, etc.\n"
+ ),
+ "korean_OCR": (
+ "You are a professional Korean to English novel translator, you must strictly output only English text and HTML tags while following these rules:\n"
+ "- Use a natural, comedy-friendly English translation style that captures both humor and readability without losing any original meaning.\n"
+ "- Include 100% of the source text - every word, phrase, and sentence must be fully translated without exception.\n"
+ "- Retain Korean honorifics and respectful speech markers in romanized form, including but not limited to: -nim, -ssi, -yang, -gun, -isiyeo, -hasoseo. For archaic/classical Korean honorific forms (like 이시여/isiyeo, 하소서/hasoseo), preserve them as-is rather than converting to modern equivalents.\n"
+ "- Always localize Korean terminology to proper English equivalents instead of literal translations (examples: 마왕 = Demon King; 마술 = magic).\n"
+ "- When translating Korean's pronoun-dropping style, insert pronouns in English only where needed for clarity: prioritize original pronouns as implied or according to the glossary, and only use they/them as a last resort, use I/me for first-person narration, and maintain natural English flow without overusing pronouns just because they're omitted in Korean.\n"
+ "- All Korean profanity must be translated to English profanity.\n"
+ "- Preserve original intent, and speech tone.\n"
+ "- Retain onomatopoeia in Romaji.\n"
+ "- Keep original Korean quotation marks (" ", ' ', 「」, 『』) as-is without converting to English quotes.\n"
+ "- Every Korean/Chinese/Japanese character must be converted to its English meaning. Examples: The character 생 means 'life/living', 활 means 'active', 관 means 'hall/building' - together 생활관 means Dormitory.\n"
+ "- Add HTML tags for proper formatting as expected of a novel.\n"
+ "- Wrap every paragraph in
tags; do not insert any literal tabs or spaces.\n"
+ ),
+ "japanese_OCR": (
+ "You are a professional Japanese to English novel translator, you must strictly output only English text and HTML tags while following these rules:\n"
+ "- Use a natural, comedy-friendly English translation style that captures both humor and readability without losing any original meaning.\n"
+ "- Include 100% of the source text - every word, phrase, and sentence must be fully translated without exception.\n"
+ "- Retain Japanese honorifics and respectful speech markers in romanized form, including but not limited to: -san, -sama, -chan, -kun, -dono, -sensei, -senpai, -kouhai. For archaic/classical Japanese honorific forms, preserve them as-is rather than converting to modern equivalents.\n"
+ "- Always localize Japanese terminology to proper English equivalents instead of literal translations (examples: 魔王 = Demon King; 魔術 = magic).\n"
+ "- When translating Japanese's pronoun-dropping style, insert pronouns in English only where needed for clarity: prioritize original pronouns as implied or according to the glossary, and only use they/them as a last resort, use I/me for first-person narration while reflecting the Japanese pronoun's nuance (私/僕/俺/etc.) through speech patterns rather than the pronoun itself, and maintain natural English flow without overusing pronouns just because they're omitted in Japanese.\n"
+ "- All Japanese profanity must be translated to English profanity.\n"
+ "- Preserve original intent, and speech tone.\n"
+ "- Retain onomatopoeia in Romaji.\n"
+ "- Keep original Japanese quotation marks (「」 and 『』) as-is without converting to English quotes.\n"
+ "- Every Korean/Chinese/Japanese character must be converted to its English meaning. Examples: The character 生 means 'life/living', 活 means 'active', 館 means 'hall/building' - together 生活館 means Dormitory.\n"
+ "- Add HTML tags for proper formatting as expected of a novel.\n"
+ "- Wrap every paragraph in
tags; do not insert any literal tabs or spaces.\n"
+ ),
+ "chinese_OCR": (
+ "You are a professional Chinese to English novel translator, you must strictly output only English text and HTML tags while following these rules:\n"
+ "- Use a natural, comedy-friendly English translation style that captures both humor and readability without losing any original meaning.\n"
+ "- Include 100% of the source text - every word, phrase, and sentence must be fully translated without exception.\n"
+ "- Retain Chinese titles and respectful forms of address in romanized form, including but not limited to: laoban, laoshi, shifu, xiaojie, xiansheng, taitai, daren, qianbei. For archaic/classical Chinese respectful forms, preserve them as-is rather than converting to modern equivalents.\n"
+ "- Always localize Chinese terminology to proper English equivalents instead of literal translations (examples: 魔王 = Demon King; 法术 = magic).\n"
+ "- When translating Chinese's flexible pronoun usage, insert pronouns in English only where needed for clarity: prioritize original pronouns as implied or according to the glossary, and only use they/them as a last resort, use I/me for first-person narration while reflecting the pronoun's nuance (我/吾/咱/人家/etc.) through speech patterns and formality level rather than the pronoun itself, and since Chinese pronouns don't indicate gender in speech (他/她/它 all sound like 'tā'), rely on context or glossary rather than assuming gender.\n"
+ "- All Chinese profanity must be translated to English profanity.\n"
+ "- Preserve original intent, and speech tone.\n"
+ "- Retain onomatopoeia in Romaji.\n"
+ "- Keep original Chinese quotation marks (「」 for dialogue, 《》 for titles) as-is without converting to English quotes.\n"
+ "- Every Korean/Chinese/Japanese character must be converted to its English meaning. Examples: The character 生 means 'life/living', 活 means 'active', 館 means 'hall/building' - together 生活館 means Dormitory.\n"
+ "- Add HTML tags for proper formatting as expected of a novel.\n"
+ "- Wrap every paragraph in
tags; do not insert any literal tabs or spaces.\n"
+ ),
+ "korean_TXT": (
+ "You are a professional Korean to English novel translator, you must strictly output only English text while following these rules:\n"
+ "- Use a natural, comedy-friendly English translation style that captures both humor and readability without losing any original meaning.\n"
+ "- Include 100% of the source text - every word, phrase, and sentence must be fully translated without exception.\n"
+ "- Retain Korean honorifics and respectful speech markers in romanized form, including but not limited to: -nim, -ssi, -yang, -gun, -isiyeo, -hasoseo. For archaic/classical Korean honorific forms (like 이시여/isiyeo, 하소서/hasoseo), preserve them as-is rather than converting to modern equivalents.\n"
+ "- Always localize Korean terminology to proper English equivalents instead of literal translations (examples: 마왕 = Demon King; 마술 = magic).\n"
+ "- When translating Korean's pronoun-dropping style, insert pronouns in English only where needed for clarity: prioritize original pronouns as implied or according to the glossary, and only use they/them as a last resort, use I/me for first-person narration, and maintain natural English flow without overusing pronouns just because they're omitted in Korean.\n"
+ "- All Korean profanity must be translated to English profanity.\n"
+ "- Preserve original intent, and speech tone.\n"
+ "- Retain onomatopoeia in Romaji.\n"
+ "- Keep original Korean quotation marks (" ", ' ', 「」, 『』) as-is without converting to English quotes.\n"
+ "- Every Korean/Chinese/Japanese character must be converted to its English meaning. Examples: The character 생 means 'life/living', 활 means 'active', 관 means 'hall/building' - together 생활관 means Dormitory.\n"
+ "- Use line breaks for proper formatting as expected of a novel.\n"
+ ),
+ "japanese_TXT": (
+ "You are a professional Japanese to English novel translator, you must strictly output only English text while following these rules:\n"
+ "- Use a natural, comedy-friendly English translation style that captures both humor and readability without losing any original meaning.\n"
+ "- Include 100% of the source text - every word, phrase, and sentence must be fully translated without exception.\n"
+ "- Retain Japanese honorifics and respectful speech markers in romanized form, including but not limited to: -san, -sama, -chan, -kun, -dono, -sensei, -senpai, -kouhai. For archaic/classical Japanese honorific forms, preserve them as-is rather than converting to modern equivalents.\n"
+ "- Always localize Japanese terminology to proper English equivalents instead of literal translations (examples: 魔王 = Demon King; 魔術 = magic).\n"
+ "- When translating Japanese's pronoun-dropping style, insert pronouns in English only where needed for clarity: prioritize original pronouns as implied or according to the glossary, and only use they/them as a last resort, use I/me for first-person narration while reflecting the Japanese pronoun's nuance (私/僕/俺/etc.) through speech patterns rather than the pronoun itself, and maintain natural English flow without overusing pronouns just because they're omitted in Japanese.\n"
+ "- All Japanese profanity must be translated to English profanity.\n"
+ "- Preserve original intent, and speech tone.\n"
+ "- Retain onomatopoeia in Romaji.\n"
+ "- Keep original Japanese quotation marks (「」 and 『』) as-is without converting to English quotes.\n"
+ "- Every Korean/Chinese/Japanese character must be converted to its English meaning. Examples: The character 生 means 'life/living', 活 means 'active', 館 means 'hall/building' - together 生活館 means Dormitory.\n"
+ "- Use line breaks for proper formatting as expected of a novel.\n"
+ ),
+ "chinese_TXT": (
+ "You are a professional Chinese to English novel translator, you must strictly output only English text while following these rules:\n"
+ "- Use a natural, comedy-friendly English translation style that captures both humor and readability without losing any original meaning.\n"
+ "- Include 100% of the source text - every word, phrase, and sentence must be fully translated without exception.\n"
+ "- Retain Chinese titles and respectful forms of address in romanized form, including but not limited to: laoban, laoshi, shifu, xiaojie, xiansheng, taitai, daren, qianbei. For archaic/classical Chinese respectful forms, preserve them as-is rather than converting to modern equivalents.\n"
+ "- Always localize Chinese terminology to proper English equivalents instead of literal translations (examples: 魔王 = Demon King; 法术 = magic).\n"
+ "- When translating Chinese's flexible pronoun usage, insert pronouns in English only where needed for clarity: prioritize original pronouns as implied or according to the glossary, and only use they/them as a last resort, use I/me for first-person narration while reflecting the pronoun's nuance (我/吾/咱/人家/etc.) through speech patterns and formality level rather than the pronoun itself, and since Chinese pronouns don't indicate gender in speech (他/她/它 all sound like 'tā'), rely on context or glossary rather than assuming gender.\n"
+ "- All Chinese profanity must be translated to English profanity.\n"
+ "- Preserve original intent, and speech tone.\n"
+ "- Retain onomatopoeia in Romaji.\n"
+ "- Keep original Chinese quotation marks (「」 for dialogue, 《》 for titles) as-is without converting to English quotes.\n"
+ "- Every Korean/Chinese/Japanese character must be converted to its English meaning. Examples: The character 生 means 'life/living', 活 means 'active', 館 means 'hall/building' - together 生活館 means Dormitory.\n"
+ "- Use line breaks for proper formatting as expected of a novel.\n"
+ ),
+ "Manga_JP": (
+ "You are a professional Japanese to English Manga translator.\n"
+ "You have both the image of the Manga panel and the extracted text to work with.\n"
+ "Output only English text while following these rules: \n\n"
+
+ "VISUAL CONTEXT:\n"
+ "- Analyze the character’s facial expressions and body language in the image.\n"
+ "- Consider the scene’s mood and atmosphere.\n"
+ "- Note any action or movement depicted.\n"
+ "- Use visual cues to determine the appropriate tone and emotion.\n"
+ "- USE THE IMAGE to inform your translation choices. The image is not decorative - it contains essential context for accurate translation.\n\n"
+
+ "DIALOGUE REQUIREMENTS:\n"
+ "- Match the translation tone to the character’s expression.\n"
+ "- If a character looks angry, use appropriately intense language.\n"
+ "- If a character looks shy or embarrassed, reflect that in the translation.\n"
+ "- Keep speech patterns consistent with the character’s appearance and demeanor.\n"
+ "- Retain honorifics and onomatopoeia in Romaji.\n\n"
+
+ "IMPORTANT: Use both the visual context and text to create the most accurate and natural-sounding translation.\n"
+ ),
+ "Manga_KR": (
+ "You are a professional Korean to English Manhwa translator.\n"
+ "You have both the image of the Manhwa panel and the extracted text to work with.\n"
+ "Output only English text while following these rules: \n\n"
+
+ "VISUAL CONTEXT:\n"
+ "- Analyze the character’s facial expressions and body language in the image.\n"
+ "- Consider the scene’s mood and atmosphere.\n"
+ "- Note any action or movement depicted.\n"
+ "- Use visual cues to determine the appropriate tone and emotion.\n"
+ "- USE THE IMAGE to inform your translation choices. The image is not decorative - it contains essential context for accurate translation.\n\n"
+
+ "DIALOGUE REQUIREMENTS:\n"
+ "- Match the translation tone to the character’s expression.\n"
+ "- If a character looks angry, use appropriately intense language.\n"
+ "- If a character looks shy or embarrassed, reflect that in the translation.\n"
+ "- Keep speech patterns consistent with the character’s appearance and demeanor.\n"
+ "- Retain honorifics and onomatopoeia in Romaji.\n\n"
+
+ "IMPORTANT: Use both the visual context and text to create the most accurate and natural-sounding translation.\n"
+ ),
+ "Manga_CN": (
+ "You are a professional Chinese to English Manga translator.\n"
+ "You have both the image of the Manga panel and the extracted text to work with.\n"
+ "Output only English text while following these rules: \n\n"
+
+ "VISUAL CONTEXT:\n"
+ "- Analyze the character’s facial expressions and body language in the image.\n"
+ "- Consider the scene’s mood and atmosphere.\n"
+ "- Note any action or movement depicted.\n"
+ "- Use visual cues to determine the appropriate tone and emotion.\n"
+ "- USE THE IMAGE to inform your translation choices. The image is not decorative - it contains essential context for accurate translation.\n"
+
+ "DIALOGUE REQUIREMENTS:\n"
+ "- Match the translation tone to the character’s expression.\n"
+ "- If a character looks angry, use appropriately intense language.\n"
+ "- If a character looks shy or embarrassed, reflect that in the translation.\n"
+ "- Keep speech patterns consistent with the character’s appearance and demeanor.\n"
+ "- Retain honorifics and onomatopoeia in Romaji.\n\n"
+
+ "IMPORTANT: Use both the visual context and text to create the most accurate and natural-sounding translation.\n"
+ ),
+ "Glossary_Editor": (
+ "I have a messy character glossary from a Korean web novel that needs to be cleaned up and restructured. Please Output only JSON entries while creating a clean JSON glossary with the following requirements:\n"
+ "1. Merge duplicate character entries - Some characters appear multiple times (e.g., Noah, Ichinose family members).\n"
+ "2. Separate mixed character data - Some entries incorrectly combine multiple characters' information.\n"
+ "3. Use 'Korean = English' format - Replace all parentheses with equals signs (e.g., '이로한 = Lee Rohan' instead of '이로한 (Lee Rohan)').\n"
+ "4. Merge original_name fields - Combine original Korean names with English names in the name field.\n"
+ "5. Remove empty fields - Don't include empty arrays or objects.\n"
+ "6. Fix gender inconsistencies - Correct based on context from aliases.\n"
+
+ ),
+ "Original": "Return everything exactly as seen on the source."
+ }
+
+ self._init_default_prompts()
+ self._init_variables()
+ self._setup_gui()
+ self.metadata_batch_ui = MetadataBatchTranslatorUI(self)
+
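+ # Keys written by save_config() carry an 'ENC:' prefix; any key without it
+ # is a plaintext key from an older config that still needs encrypting.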
+ try:
+ needs_encryption = False
+ if 'api_key' in self.config and self.config['api_key']:
+ if not self.config['api_key'].startswith('ENC:'):
+ needs_encryption = True
+ if 'replicate_api_key' in self.config and self.config['replicate_api_key']:
+ if not self.config['replicate_api_key'].startswith('ENC:'):
+ needs_encryption = True
+
+ if needs_encryption:
+ # Auto-migrate to encrypted format
+ print("Auto-encrypting API keys...")
+ self.save_config(show_message=False)
+ print("API keys encrypted successfully!")
+ except Exception as e:
+ print(f"Auto-encryption check failed: {e}")
+
+ def _check_updates_on_startup(self):
+ """Check for updates on startup with debug logging (async)"""
+ print("[DEBUG] Running startup update check...")
+ if self.update_manager:
+ try:
+ self.update_manager.check_for_updates_async(silent=True)
+ print(f"[DEBUG] Update check dispatched asynchronously")
+ except Exception as e:
+ print(f"[DEBUG] Update check failed to dispatch: {e}")
+ else:
+ print("[DEBUG] Update manager is None")
+
+ def check_for_updates_manual(self):
+ """Manually check for updates from the Other Settings dialog with loading animation"""
+ if hasattr(self, 'update_manager') and self.update_manager:
+ self._show_update_loading_and_check()
+ else:
+ messagebox.showerror("Update Check",
+ "Update manager is not available.\n"
+ "Please check the GitHub releases page manually:\n"
+ "https://github.com/Shirochi-stack/Glossarion/releases")
+
+ def _show_update_loading_and_check(self):
+ """Show animated loading dialog while checking for updates"""
+ import tkinter as tk
+ import tkinter.ttk as ttk
+ from PIL import Image, ImageTk
+ import threading
+ import os
+
+ # Create loading dialog
+ loading_dialog = tk.Toplevel(self.master)
+ loading_dialog.title("Checking for Updates")
+ loading_dialog.geometry("300x150")
+ loading_dialog.resizable(False, False)
+ loading_dialog.transient(self.master)
+ loading_dialog.grab_set()
+
+ # Set the proper application icon for the dialog
+ try:
+ # Use the same icon loading method as the main application
+ load_application_icon(loading_dialog, self.base_dir)
+ except Exception as e:
+ print(f"Could not load icon for loading dialog: {e}")
+
+ # Position dialog at mouse cursor
+ try:
+ mouse_x = self.master.winfo_pointerx()
+ mouse_y = self.master.winfo_pointery()
+ # Offset slightly so dialog doesn't cover cursor
+ loading_dialog.geometry("+%d+%d" % (mouse_x + 10, mouse_y + 10))
+ except:
+ # Fallback to center of main window if mouse position fails
+ loading_dialog.geometry("+%d+%d" % (
+ self.master.winfo_rootx() + 50,
+ self.master.winfo_rooty() + 50
+ ))
+
+ # Create main frame
+ main_frame = ttk.Frame(loading_dialog, padding="20")
+ main_frame.pack(fill="both", expand=True)
+
+ # Try to load and resize the icon (same approach as main GUI)
+ icon_label = None
+ try:
+ ico_path = os.path.join(self.base_dir, 'Halgakos.ico')
+ if os.path.isfile(ico_path):
+ # Load and resize image
+ original_image = Image.open(ico_path)
+ # Resize to 48x48 for loading animation
+ resized_image = original_image.resize((48, 48), Image.Resampling.LANCZOS)
+ self.loading_icon = ImageTk.PhotoImage(resized_image)
+
+ icon_label = ttk.Label(main_frame, image=self.loading_icon)
+ icon_label.pack(pady=(0, 10))
+ except Exception as e:
+ print(f"Could not load loading icon: {e}")
+
+ # Add loading text
+ loading_text = ttk.Label(main_frame, text="Checking for updates...",
+ font=('TkDefaultFont', 11))
+ loading_text.pack()
+
+ # Add progress bar
+ progress_bar = ttk.Progressbar(main_frame, mode='indeterminate')
+ progress_bar.pack(pady=(10, 10), fill='x')
+ progress_bar.start(10) # Start animation
+
+ # Animation state
+ self.loading_animation_active = True
+ self.loading_rotation = 0
+
+ def animate_icon():
+ """Animate the loading icon by rotating it"""
+ if not self.loading_animation_active or not icon_label:
+ return
+
+ try:
+ if hasattr(self, 'loading_icon'):
+ # Simple text-based animation instead of rotation
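+ # Cycles '', '.', '..', '...': advances one step per second (10 frames at 100 ms)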
+ dots = "." * ((self.loading_rotation // 10) % 4)
+ loading_text.config(text=f"Checking for updates{dots}")
+ self.loading_rotation += 1
+
+ # Schedule next animation frame
+ loading_dialog.after(100, animate_icon)
+ except:
+ pass # Dialog might have been destroyed
+
+ # Start icon animation if we have an icon
+ if icon_label:
+ animate_icon()
+
+ def check_updates_thread():
+ """Run update check in background thread"""
+ try:
+ # Perform the actual update check
+ self.update_manager.check_for_updates(silent=False, force_show=True)
+ except Exception as e:
+ # Schedule error display on main thread; bind the message now because 'e'
+ # is cleared when the except block exits, and call the local helper below
+ loading_dialog.after(0, lambda msg=str(e): _show_update_error(msg))
+ finally:
+ # Schedule cleanup on main thread
+ loading_dialog.after(0, cleanup_loading)
+
+ def cleanup_loading():
+ """Clean up the loading dialog"""
+ try:
+ self.loading_animation_active = False
+ progress_bar.stop()
+ loading_dialog.grab_release()
+ loading_dialog.destroy()
+ except:
+ pass # Dialog might already be destroyed
+
+ def _show_update_error(error_msg):
+ """Show update check error"""
+ cleanup_loading()
+ messagebox.showerror("Update Check Failed",
+ f"Failed to check for updates:\n{error_msg}")
+
+ # Start the update check in a separate thread
+ update_thread = threading.Thread(target=check_updates_thread, daemon=True)
+ update_thread.start()
+
+ # Handle dialog close
+ def on_dialog_close():
+ self.loading_animation_active = False
+ cleanup_loading()
+
+ loading_dialog.protocol("WM_DELETE_WINDOW", on_dialog_close)
+
+ def append_log_with_api_error_detection(self, message):
+ """Enhanced log appending that detects and highlights API errors"""
+ # First append the regular log message
+ self.append_log(message)
+
+ # Check for API error patterns
+ message_lower = message.lower()
+
+ if "429" in message or "rate limit" in message_lower:
+ # Rate limit error detected
+ self.append_log("⚠️ RATE LIMIT ERROR DETECTED (HTTP 429)")
+ self.append_log(" The API is throttling your requests.")
+ self.append_log(" Please wait before continuing or increase the delay between requests.")
+ self.append_log(" You can increase 'Delay between API calls' in settings.")
+
+ elif "401" in message or "unauthorized" in message_lower:
+ # Authentication error
+ self.append_log("❌ AUTHENTICATION ERROR (HTTP 401)")
+ self.append_log(" Your API key is invalid or missing.")
+ self.append_log(" Please check your API key in the settings.")
+
+ elif "403" in message or "forbidden" in message_lower:
+ # Forbidden error
+ self.append_log("❌ ACCESS FORBIDDEN ERROR (HTTP 403)")
+ self.append_log(" You don't have permission to access this API.")
+ self.append_log(" Please check your API subscription and permissions.")
+
+ elif "400" in message or "bad request" in message_lower:
+ # Bad request error
+ self.append_log("❌ BAD REQUEST ERROR (HTTP 400)")
+ self.append_log(" The API request was malformed or invalid.")
+ self.append_log(" This might be due to unsupported model settings.")
+
+ elif "timeout" in message_lower:
+ # Timeout error
+ self.append_log("⏱️ TIMEOUT ERROR")
+ self.append_log(" The API request took too long to respond.")
+ self.append_log(" Consider increasing timeout settings or retrying.")
+
+
+ def create_glossary_backup(self, operation_name="manual"):
+ """Create a backup of the current glossary if auto-backup is enabled"""
+ # For manual backups, always proceed. For automatic backups, check the setting.
+ if operation_name != "manual" and not self.config.get('glossary_auto_backup', True):
+ return True
+
+ if not self.current_glossary_data or not self.editor_file_var.get():
+ return True
+
+ try:
+ # Get the original glossary file path
+ original_path = self.editor_file_var.get()
+ original_dir = os.path.dirname(original_path)
+ original_name = os.path.basename(original_path)
+
+ # Create backup directory
+ backup_dir = os.path.join(original_dir, "Backups")
+
+ # Create directory if it doesn't exist
+ try:
+ os.makedirs(backup_dir, exist_ok=True)
+ except Exception as e:
+ self.append_log(f"⚠️ Failed to create backup directory: {str(e)}")
+ return False
+
+ # Generate timestamp-based backup filename
+ timestamp = time.strftime("%Y%m%d_%H%M%S")
+ backup_name = f"{os.path.splitext(original_name)[0]}_{operation_name}_{timestamp}.json"
+ backup_path = os.path.join(backup_dir, backup_name)
+
+ # Try to save backup
+ with open(backup_path, 'w', encoding='utf-8') as f:
+ json.dump(self.current_glossary_data, f, ensure_ascii=False, indent=2)
+
+ self.append_log(f"💾 Backup created: {backup_name}")
+
+ # Optional: Clean old backups if more than limit
+ max_backups = self.config.get('glossary_max_backups', 50)
+ if max_backups > 0:
+ self._clean_old_backups(backup_dir, original_name, max_backups)
+
+ return True
+
+ except Exception as e:
+ # Log the actual error
+ self.append_log(f"⚠️ Backup failed: {str(e)}")
+ # Ask user if they want to continue anyway
+ return messagebox.askyesno("Backup Failed",
+ f"Failed to create backup: {str(e)}\n\nContinue anyway?")
+
+ def get_current_epub_path(self):
+ """Get the currently selected EPUB path from various sources"""
+ epub_path = None
+
+ # Try different sources in order of preference
+ sources = [
+ # Direct selection
+ lambda: getattr(self, 'selected_epub_path', None),
+ # From config
+ lambda: self.config.get('last_epub_path', None) if hasattr(self, 'config') else None,
+ # From file path variable (if it exists)
+ lambda: self.epub_file_path.get() if hasattr(self, 'epub_file_path') and self.epub_file_path.get() else None,
+ # From current translation
+ lambda: getattr(self, 'current_epub_path', None),
+ ]
+
+ for source in sources:
+ try:
+ path = source()
+ if path and os.path.exists(path):
+ epub_path = path
+ print(f"[DEBUG] Found EPUB path from source: {path}") # Debug line
+ break
+ except Exception as e:
+ print(f"[DEBUG] Error checking source: {e}") # Debug line
+ continue
+
+ if not epub_path:
+ print("[DEBUG] No EPUB path found from any source") # Debug line
+
+ return epub_path
+
+ def _clean_old_backups(self, backup_dir, original_name, max_backups):
+ """Remove old backups exceeding the limit"""
+ try:
+ # Find all backups for this glossary
+ prefix = os.path.splitext(original_name)[0]
+ backups = []
+
+ for file in os.listdir(backup_dir):
+ if file.startswith(prefix) and file.endswith('.json'):
+ file_path = os.path.join(backup_dir, file)
+ backups.append((file_path, os.path.getmtime(file_path)))
+
+ # Sort by modification time (oldest first)
+ backups.sort(key=lambda x: x[1])
+
+ # Remove oldest backups if exceeding limit
+ while len(backups) > max_backups:
+ old_backup = backups.pop(0)
+ os.remove(old_backup[0])
+ self.append_log(f"🗑️ Removed old backup: {os.path.basename(old_backup[0])}")
+
+ except Exception as e:
+ self.append_log(f"⚠️ Error cleaning old backups: {str(e)}")
+
+ def open_manga_translator(self):
+ """Open manga translator in a new window"""
+ if not MANGA_SUPPORT:
+ messagebox.showwarning("Not Available", "Manga translation modules not found.")
+ return
+
+ # Always open directly - model preloading will be handled inside the manga tab
+ self._open_manga_translator_direct()
+
+ def _open_manga_translator_direct(self):
+ """Open manga translator directly without loading screen"""
+ # Import PySide6 components for the manga translator
+ try:
+ from PySide6.QtWidgets import QApplication, QDialog, QWidget, QVBoxLayout, QScrollArea
+ from PySide6.QtCore import Qt
+ except ImportError:
+ messagebox.showerror("Missing Dependency",
+ "PySide6 is required for manga translation. Please install it:\npip install PySide6")
+ return
+
+ # Create or get QApplication instance
+ app = QApplication.instance()
+ if not app:
+ # Set DPI awareness before creating QApplication
+ try:
+ QApplication.setHighDpiScaleFactorRoundingPolicy(Qt.HighDpiScaleFactorRoundingPolicy.PassThrough)
+ except:
+ pass
+ app = QApplication(sys.argv)
+
+ # Create PySide6 dialog
+ dialog = QDialog()
+ dialog.setWindowTitle("🎌 Manga Panel Translator")
+
+ # Set icon if available
+ try:
+ icon_path = os.path.join(self.base_dir, 'Halgakos.ico')
+ if os.path.exists(icon_path):
+ from PySide6.QtGui import QIcon
+ dialog.setWindowIcon(QIcon(icon_path))
+ except Exception:
+ pass
+
+ # Size and position the dialog
+ screen = app.primaryScreen().geometry()
+ dialog_width = min(900, int(screen.width() * 0.9))
+ dialog_height = min(900, int(screen.height() * 0.95)) # Increased from 700 to 900
+ dialog.resize(dialog_width, dialog_height)
+
+ # Center the dialog
+ dialog_x = (screen.width() - dialog_width) // 2
+ dialog_y = max(20, (screen.height() - dialog_height) // 2)
+ dialog.move(dialog_x, dialog_y)
+
+ # Create scrollable content area
+ scroll_area = QScrollArea()
+ scroll_area.setWidgetResizable(True)
+ scroll_area.setHorizontalScrollBarPolicy(Qt.ScrollBarAsNeeded)
+ scroll_area.setVerticalScrollBarPolicy(Qt.ScrollBarAsNeeded)
+
+ # Create main content widget
+ content_widget = QWidget()
+ scroll_area.setWidget(content_widget)
+
+ # Set dialog layout
+ dialog_layout = QVBoxLayout(dialog)
+ dialog_layout.setContentsMargins(0, 0, 0, 0)
+ dialog_layout.addWidget(scroll_area)
+
+ # Initialize the manga translator interface with PySide6 widget
+ self.manga_translator = MangaTranslationTab(content_widget, self, dialog, scroll_area)
+
+ # Handle window close
+ def on_close():
+ try:
+ if self.manga_translator:
+ # Stop any running translations
+ if hasattr(self.manga_translator, 'stop_translation'):
+ self.manga_translator.stop_translation()
+ self.manga_translator = None
+ dialog.close()
+ except Exception as e:
+ print(f"Error closing manga translator: {e}")
+
+ dialog.finished.connect(on_close)
+
+ # Show the dialog
+ dialog.show()
+
+ # Keep reference to prevent garbage collection
+ self._manga_dialog = dialog
+
+
+ def _init_default_prompts(self):
+ """Initialize all default prompt templates"""
+ self.default_manual_glossary_prompt = """Extract character names and important terms from the following text.
+
+Output format:
+{fields}
+
+Rules:
+- Output ONLY CSV lines in the exact format shown above
+- No headers, no extra text, no JSON
+- One entry per line
+- Leave gender empty for terms (just end with comma)
+"""
+
+ self.default_auto_glossary_prompt = """You are extracting a targeted glossary from a {language} novel.
+Focus on identifying:
+1. Character names with their honorifics
+2. Important titles and ranks
+3. Frequently mentioned terms (min frequency: {min_frequency})
+
+Extract up to {max_names} character names and {max_titles} titles.
+Prioritize names that appear with honorifics or in important contexts.
+Return the glossary in a simple key-value format.
+ """
+
+ self.default_rolling_summary_system_prompt = """You are a context summarization assistant. Create concise, informative summaries that preserve key story elements for translation continuity."""
+
+ self.default_rolling_summary_user_prompt = """Analyze the recent translation exchanges and create a structured summary for context continuity.
+
+Focus on extracting and preserving:
+1. **Character Information**: Names (with original forms), relationships, roles, and important character developments
+2. **Plot Points**: Key events, conflicts, and story progression
+3. **Locations**: Important places and settings
+4. **Terminology**: Special terms, abilities, items, or concepts (with original forms)
+5. **Tone & Style**: Writing style, mood, and any notable patterns
+6. **Unresolved Elements**: Questions, mysteries, or ongoing situations
+
+Format the summary clearly with sections. Be concise but comprehensive.
+
+Recent translations to summarize:
+{translations}
+ """
+
+ def _init_variables(self):
+ """Initialize all configuration variables"""
+ # Load saved prompts
+ self.manual_glossary_prompt = self.config.get('manual_glossary_prompt', self.default_manual_glossary_prompt)
+ self.auto_glossary_prompt = self.config.get('auto_glossary_prompt', self.default_auto_glossary_prompt)
+ self.rolling_summary_system_prompt = self.config.get('rolling_summary_system_prompt', self.default_rolling_summary_system_prompt)
+ self.rolling_summary_user_prompt = self.config.get('rolling_summary_user_prompt', self.default_rolling_summary_user_prompt)
+ self.append_glossary_prompt = self.config.get('append_glossary_prompt', "- Follow this reference glossary for consistent translation (Do not output any raw entries):\n")
+ self.translation_chunk_prompt = self.config.get('translation_chunk_prompt', self.default_translation_chunk_prompt)
+ self.image_chunk_prompt = self.config.get('image_chunk_prompt', self.default_image_chunk_prompt)
+
+ self.custom_glossary_fields = self.config.get('custom_glossary_fields', [])
+ self.token_limit_disabled = self.config.get('token_limit_disabled', False)
+ self.api_key_visible = False # Default to hidden
+
+ if 'glossary_duplicate_key_mode' not in self.config:
+ self.config['glossary_duplicate_key_mode'] = 'fuzzy'
+ # Initialize fuzzy threshold variable
+ if not hasattr(self, 'fuzzy_threshold_var'):
+ self.fuzzy_threshold_var = tk.DoubleVar(value=self.config.get('glossary_fuzzy_threshold', 0.90))
+
+ # Create all config variables with helper
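+ # e.g. create_var(tk.BooleanVar, 'use_rolling_summary', False) reads the saved
+ # value from config and falls back to the default when the key is absent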
+ def create_var(var_type, key, default):
+ return var_type(value=self.config.get(key, default))
+
+ # Boolean variables
+ bool_vars = [
+ ('rolling_summary_var', 'use_rolling_summary', False),
+ ('translation_history_rolling_var', 'translation_history_rolling', False),
+ ('glossary_history_rolling_var', 'glossary_history_rolling', False),
+ ('translate_book_title_var', 'translate_book_title', True),
+ ('enable_auto_glossary_var', 'enable_auto_glossary', False),
+ ('append_glossary_var', 'append_glossary', False),
+ ('retry_truncated_var', 'retry_truncated', False),
+ ('retry_duplicate_var', 'retry_duplicate_bodies', False),
+ # NEW: QA scanning helpers
+ ('qa_auto_search_output_var', 'qa_auto_search_output', True),
+ ('scan_phase_enabled_var', 'scan_phase_enabled', False),
+ ('indefinite_rate_limit_retry_var', 'indefinite_rate_limit_retry', True),
+ # Keep existing variables intact
+ ('enable_image_translation_var', 'enable_image_translation', False),
+ ('process_webnovel_images_var', 'process_webnovel_images', True),
+ # REMOVED: ('comprehensive_extraction_var', 'comprehensive_extraction', False),
+ ('hide_image_translation_label_var', 'hide_image_translation_label', True),
+ ('retry_timeout_var', 'retry_timeout', True),
+ ('batch_translation_var', 'batch_translation', False),
+ ('conservative_batching_var', 'conservative_batching', True),
+ ('disable_epub_gallery_var', 'disable_epub_gallery', False),
+ # NEW: Disable automatic cover creation (affects extraction and EPUB cover page)
+ ('disable_automatic_cover_creation_var', 'disable_automatic_cover_creation', False),
+ # NEW: Translate cover.html (Skip Override)
+ ('translate_cover_html_var', 'translate_cover_html', False),
+ ('disable_zero_detection_var', 'disable_zero_detection', True),
+ ('use_header_as_output_var', 'use_header_as_output', False),
+ ('emergency_restore_var', 'emergency_paragraph_restore', False),
+ ('contextual_var', 'contextual', False),
+ ('REMOVE_AI_ARTIFACTS_var', 'REMOVE_AI_ARTIFACTS', False),
+ ('enable_watermark_removal_var', 'enable_watermark_removal', True),
+ ('save_cleaned_images_var', 'save_cleaned_images', False),
+ ('advanced_watermark_removal_var', 'advanced_watermark_removal', False),
+ ('enable_decimal_chapters_var', 'enable_decimal_chapters', False),
+ ('disable_gemini_safety_var', 'disable_gemini_safety', False),
+ ('single_api_image_chunks_var', 'single_api_image_chunks', False),
+
+ ]
+
+ for var_name, key, default in bool_vars:
+ setattr(self, var_name, create_var(tk.BooleanVar, key, default))
+
+ # String variables
+ str_vars = [
+ ('summary_role_var', 'summary_role', 'user'),
+ ('rolling_summary_exchanges_var', 'rolling_summary_exchanges', '5'),
+ ('rolling_summary_mode_var', 'rolling_summary_mode', 'append'),
+ # New: how many summaries to retain in append mode
+ ('rolling_summary_max_entries_var', 'rolling_summary_max_entries', '5'),
+ ('reinforcement_freq_var', 'reinforcement_frequency', '10'),
+ ('max_retry_tokens_var', 'max_retry_tokens', '16384'),
+ ('duplicate_lookback_var', 'duplicate_lookback_chapters', '5'),
+ ('glossary_min_frequency_var', 'glossary_min_frequency', '2'),
+ ('glossary_max_names_var', 'glossary_max_names', '50'),
+ ('glossary_max_titles_var', 'glossary_max_titles', '30'),
+ ('glossary_batch_size_var', 'glossary_batch_size', '50'),
+ ('webnovel_min_height_var', 'webnovel_min_height', '1000'),
+ ('max_images_per_chapter_var', 'max_images_per_chapter', '1'),
+ ('image_chunk_height_var', 'image_chunk_height', '1500'),
+ ('chunk_timeout_var', 'chunk_timeout', '900'),
+ ('batch_size_var', 'batch_size', '3'),
+ ('chapter_number_offset_var', 'chapter_number_offset', '0'),
+ ('compression_factor_var', 'compression_factor', '1.0'),
+ # NEW: scanning phase mode (quick-scan/aggressive/ai-hunter/custom)
+ ('scan_phase_mode_var', 'scan_phase_mode', 'quick-scan')
+ ]
+
+ for var_name, key, default in str_vars:
+ setattr(self, var_name, create_var(tk.StringVar, key, str(default)))
+
+ # NEW: Initialize extraction mode variable
+ self.extraction_mode_var = tk.StringVar(
+ value=self.config.get('extraction_mode', 'smart')
+ )
+
+ self.book_title_prompt = self.config.get('book_title_prompt',
+ "Translate this book title to English while retaining any acronyms:")
+ # Initialize book title system prompt
+ if 'book_title_system_prompt' not in self.config:
+ self.config['book_title_system_prompt'] = "You are a translator. Respond with only the translated text, nothing else. Do not add any explanation or additional content."
+
+ # Profiles
+ self.prompt_profiles = self.config.get('prompt_profiles', self.default_prompts.copy())
+ active = self.config.get('active_profile', next(iter(self.prompt_profiles)))
+ self.profile_var = tk.StringVar(value=active)
+ self.lang_var = self.profile_var
+
+ # Detection mode
+ self.duplicate_detection_mode_var = tk.StringVar(value=self.config.get('duplicate_detection_mode', 'basic'))
+
+ def _setup_gui(self):
+ """Initialize all GUI components"""
+ self.frame = tb.Frame(self.master, padding=10)
+ self.frame.pack(fill=tk.BOTH, expand=True)
+
+ # Configure grid
+ for i in range(5):
+ self.frame.grid_columnconfigure(i, weight=1 if i in [1, 3] else 0)
+ for r in range(12):
+ self.frame.grid_rowconfigure(r, weight=1 if r in [9, 10] else 0, minsize=200 if r == 9 else 150 if r == 10 else 0)
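+ # Rows 9 and 10 (the system prompt and log text areas) absorb extra vertical
+ # space and get minimum heights; all other rows stay fixed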
+
+ # Create UI elements using helper methods
+ self.create_file_section()
+ self._create_model_section()
+ self._create_profile_section()
+ self._create_settings_section()
+ self._create_api_section()
+ self._create_prompt_section()
+ self._create_log_section()
+ self._make_bottom_toolbar()
+
+ # Apply token limit state
+ if self.token_limit_disabled:
+ self.token_limit_entry.config(state=tk.DISABLED)
+ self.toggle_token_btn.config(text="Enable Input Token Limit", bootstyle="success-outline")
+
+ self.on_profile_select()
+ self.append_log("🚀 Glossarion v4.8.5 - Ready to use!")
+ self.append_log("💡 Click any function button to load modules automatically")
+
+ # Restore last selected input files if available
+ try:
+ last_files = self.config.get('last_input_files', []) if hasattr(self, 'config') else []
+ if isinstance(last_files, list) and last_files:
+ existing = [p for p in last_files if isinstance(p, str) and os.path.exists(p)]
+ if existing:
+ # Populate the entry and internal state using shared handler
+ self._handle_file_selection(existing)
+ self.append_log(f"📁 Restored last selection: {len(existing)} file(s)")
+ except Exception:
+ pass
+
+ def create_file_section(self):
+ """Create file selection section with multi-file support"""
+ # Initialize file selection variables
+ self.selected_files = []
+ self.current_file_index = 0
+
+ # File label
+ tb.Label(self.frame, text="Input File(s):").grid(row=0, column=0, sticky=tk.W, padx=5, pady=5)
+
+ # File entry
+ self.entry_epub = tb.Entry(self.frame, width=50)
+ self.entry_epub.grid(row=0, column=1, columnspan=3, sticky=tk.EW, padx=5, pady=5)
+ self.entry_epub.insert(0, "No file selected")
+
+ # Create browse menu
+ self.browse_menu = tk.Menu(self.master, tearoff=0, font=('Arial', 12))
+ self.browse_menu.add_command(label="📄 Select Files", command=self.browse_files)
+ self.browse_menu.add_command(label="📁 Select Folder", command=self.browse_folder)
+ self.browse_menu.add_separator()
+ self.browse_menu.add_command(label="🗑️ Clear Selection", command=self.clear_file_selection)
+
+ # Create browse menu button
+ self.btn_browse_menu = tb.Menubutton(
+ self.frame,
+ text="Browse ▼",
+ menu=self.browse_menu,
+ width=12,
+ bootstyle="primary"
+ )
+ self.btn_browse_menu.grid(row=0, column=4, sticky=tk.EW, padx=5, pady=5)
+
+ # File selection status label (shows file count and details)
+ self.file_status_label = tb.Label(
+ self.frame,
+ text="",
+ font=('Arial', 9),
+ bootstyle="info"
+ )
+ self.file_status_label.grid(row=1, column=1, columnspan=3, sticky=tk.W, padx=5, pady=(0, 5))
+
+ # Google Cloud Credentials button
+ self.gcloud_button = tb.Button(
+ self.frame,
+ text="GCloud Creds",
+ command=self.select_google_credentials,
+ width=12,
+ state=tk.DISABLED,
+ bootstyle="secondary"
+ )
+ self.gcloud_button.grid(row=2, column=4, sticky=tk.EW, padx=5, pady=5)
+
+ # Vertex AI Location text entry
+ self.vertex_location_var = tk.StringVar(value=self.config.get('vertex_ai_location', 'us-east5'))
+ self.vertex_location_entry = tb.Entry(
+ self.frame,
+ textvariable=self.vertex_location_var,
+ width=12
+ )
+ self.vertex_location_entry.grid(row=3, column=4, sticky=tk.EW, padx=5, pady=5)
+
+ # Hide by default
+ self.vertex_location_entry.grid_remove()
+
+ # Status label for credentials
+ self.gcloud_status_label = tb.Label(
+ self.frame,
+ text="",
+ font=('Arial', 9),
+ bootstyle="secondary"
+ )
+ self.gcloud_status_label.grid(row=2, column=1, columnspan=3, sticky=tk.W, padx=5, pady=(0, 5))
+
+ # Optional: Add checkbox for enhanced functionality
+ options_frame = tb.Frame(self.frame)
+ options_frame.grid(row=1, column=4, columnspan=1, sticky=tk.EW, padx=5, pady=5)
+
+ # Deep scan option for folders
+ self.deep_scan_var = tk.BooleanVar(value=False)
+ self.deep_scan_check = tb.Checkbutton(
+ options_frame,
+ text="include subfolders",
+ variable=self.deep_scan_var,
+ bootstyle="round-toggle"
+ )
+ self.deep_scan_check.pack(side='left')
+
+ def select_google_credentials(self):
+ """Select Google Cloud credentials JSON file"""
+ filename = filedialog.askopenfilename(
+ title="Select Google Cloud Credentials JSON",
+ filetypes=[("JSON files", "*.json"), ("All files", "*.*")]
+ )
+
+ if filename:
+ try:
+ # Validate it's a valid Google Cloud credentials file
+ with open(filename, 'r') as f:
+ creds_data = json.load(f)
+ if 'type' in creds_data and 'project_id' in creds_data:
+ # Save to config
+ self.config['google_cloud_credentials'] = filename
+ self.save_config()
+
+ # Update UI
+ self.gcloud_status_label.config(
+ text=f"✓ Credentials: {os.path.basename(filename)} (Project: {creds_data.get('project_id', 'Unknown')})",
+ foreground='green'
+ )
+
+ # Set environment variable for child processes
+ os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = filename
+
+ self.append_log(f"Google Cloud credentials loaded: {os.path.basename(filename)}")
+ else:
+ messagebox.showerror(
+ "Error",
+ "Invalid Google Cloud credentials file. Please select a valid service account JSON file."
+ )
+ except Exception as e:
+ messagebox.showerror("Error", f"Failed to load credentials: {str(e)}")
+
+ def on_model_change(self, event=None):
+ """Handle model selection change from dropdown or manual input"""
+ # Get the current model value (from dropdown or manually typed)
+ model = self.model_var.get()
+
+ # Show Google Cloud Credentials button for Vertex AI models AND Google Translate
+ needs_google_creds = False
+
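+ # '@' matches Vertex-style versioned model ids (model@version,
+ # e.g. something like 'claude-sonnet-4@20250514')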
+ if '@' in model or model.startswith('vertex/') or model.startswith('vertex_ai/'):
+ needs_google_creds = True
+ self.vertex_location_entry.grid() # Show location selector for Vertex
+ elif model == 'google-translate':
+ needs_google_creds = True
+ self.vertex_location_entry.grid_remove() # Hide location selector for Google Translate
+
+ if needs_google_creds:
+ self.gcloud_button.config(state=tk.NORMAL)
+
+ # Check if credentials are already loaded
+ if self.config.get('google_cloud_credentials'):
+ creds_path = self.config['google_cloud_credentials']
+ if os.path.exists(creds_path):
+ try:
+ with open(creds_path, 'r') as f:
+ creds_data = json.load(f)
+ project_id = creds_data.get('project_id', 'Unknown')
+
+ # Different status messages for different services
+ if model == 'google-translate':
+ status_text = f"✓ Google Translate ready (Project: {project_id})"
+ else:
+ status_text = f"✓ Credentials: {os.path.basename(creds_path)} (Project: {project_id})"
+
+ self.gcloud_status_label.config(
+ text=status_text,
+ foreground='green'
+ )
+ except:
+ self.gcloud_status_label.config(
+ text="⚠ Error reading credentials",
+ foreground='red'
+ )
+ else:
+ self.gcloud_status_label.config(
+ text="⚠ Credentials file not found",
+ foreground='red'
+ )
+ else:
+ # Different prompts for different services
+ if model == 'google-translate':
+ warning_text = "⚠ Google Cloud credentials needed for Translate API"
+ else:
+ warning_text = "⚠ No Google Cloud credentials selected"
+
+ self.gcloud_status_label.config(
+ text=warning_text,
+ foreground='orange'
+ )
+ else:
+ # Not a Google service, hide everything
+ self.gcloud_button.config(state=tk.DISABLED)
+ self.vertex_location_entry.grid_remove()
+ self.gcloud_status_label.config(text="")
+
+ # Also add this to bind manual typing events to the combobox
+ def setup_model_combobox_bindings(self):
+ """Setup bindings for manual model input in combobox with autocomplete"""
+ # Bind to key release events for live filtering and autofill
+ self.model_combo.bind('<KeyRelease>', self._on_model_combo_keyrelease)
+ # Commit best match on Enter
+ self.model_combo.bind('<Return>', self._commit_model_autocomplete)
+ # Also bind to FocusOut to catch when user clicks away after typing
+ self.model_combo.bind('<FocusOut>', self.on_model_change)
+ # Keep the existing binding for dropdown selection
+ self.model_combo.bind('<<ComboboxSelected>>', self.on_model_change)
+
+ def _on_model_combo_keyrelease(self, event=None):
+ """Combobox type-to-search without filtering values.
+ - Keeps the full model list intact; does not replace the Combobox values.
+ - Finds the best match and, if the dropdown is open, scrolls/highlights to it.
+ - Does NOT auto-fill text on deletion or mid-string edits (and by default avoids autofill).
+ - Calls on_model_change only when the entry text actually changes.
+ """
+ try:
+ combo = self.model_combo
+ typed = combo.get()
+ prev = getattr(self, '_model_prev_text', '')
+ keysym = (getattr(event, 'keysym', '') or '').lower()
+
+ # Navigation/commit keys: don't interfere; Combobox handles selection events
+ if keysym in {'up', 'down', 'left', 'right', 'return', 'escape', 'tab'}:
+ return
+
+ # Ensure we have the full source list
+ if not hasattr(self, '_model_all_values') or not self._model_all_values:
+ try:
+ self._model_all_values = list(combo['values'])
+ except Exception:
+ self._model_all_values = []
+
+ source = self._model_all_values
+
+ # Compute match set without altering combobox values
+ first_match = None
+ if typed:
+ lowered = typed.lower()
+ pref = [v for v in source if v.lower().startswith(lowered)]
+ cont = [v for v in source if lowered in v.lower()] if not pref else []
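+ # Prefix matches take priority; substring matches are only considered when no prefix match exists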
+ if pref:
+ first_match = pref[0]
+ elif cont:
+ first_match = cont[0]
+
+ # Decide whether to perform any autofill: default to no text autofill
+ grew = len(typed) > len(prev) and typed.startswith(prev)
+ is_deletion = keysym in {'backspace', 'delete'} or len(typed) < len(prev)
+ try:
+ at_end = combo.index(tk.INSERT) == len(typed)
+ except Exception:
+ at_end = True
+ try:
+ has_selection = combo.selection_present()
+ except Exception:
+ has_selection = False
+
+ # Gentle autofill only when appending at the end (not on delete or mid-string edits)
+ do_autofill_text = first_match is not None and grew and at_end and not has_selection and not is_deletion
+
+ if do_autofill_text:
+ # Only complete if it's a true prefix match to avoid surprising jumps
+ if first_match.lower().startswith(typed.lower()) and first_match != typed:
+ combo.set(first_match)
+ try:
+ combo.icursor(len(typed))
+ combo.selection_range(len(typed), len(first_match))
+ except Exception:
+ pass
+
+ # If we have a match and the dropdown is open, scroll/highlight it (values intact)
+ if first_match:
+ self._scroll_model_list_to_value(first_match)
+
+ # Remember current text for next event
+ self._model_prev_text = typed
+
+ # Only trigger change logic when the text actually changed
+ if typed != prev:
+ self.on_model_change()
+ except Exception as e:
+ try:
+ logging.debug(f"Model combobox autocomplete error: {e}")
+ except Exception:
+ pass
+
+ def _commit_model_autocomplete(self, event=None):
+ """On Enter, commit to the best matching model (prefix preferred, then contains)."""
+ try:
+ combo = self.model_combo
+ typed = combo.get()
+ source = getattr(self, '_model_all_values', []) or list(combo['values'])
+ match = None
+ if typed:
+ lowered = typed.lower()
+ pref = [v for v in source if v.lower().startswith(lowered)]
+ cont = [v for v in source if lowered in v.lower()] if not pref else []
+ match = pref[0] if pref else (cont[0] if cont else None)
+ if match and match != typed:
+ combo.set(match)
+ # Move cursor to end and clear any selection
+ try:
+ combo.icursor('end')
+ try:
+ combo.selection_clear()
+ except Exception:
+ combo.selection_range(0, 0)
+ except Exception:
+ pass
+ # Update prev text and trigger change
+ self._model_prev_text = combo.get()
+ self.on_model_change()
+ except Exception as e:
+ try:
+ logging.debug(f"Model combobox enter-commit error: {e}")
+ except Exception:
+ pass
+ return "break"
+
+ def _ensure_model_dropdown_open(self):
+ """Open the combobox dropdown if it isn't already visible."""
+ try:
+ tkobj = self.model_combo.tk
+ popdown = tkobj.eval(f'ttk::combobox::PopdownWindow {self.model_combo._w}')
+ viewable = int(tkobj.eval(f'winfo viewable {popdown}'))
+ if not viewable:
+ # Prefer internal Post proc
+ tkobj.eval(f'ttk::combobox::Post {self.model_combo._w}')
+ except Exception:
+ # Fallback: try keyboard event to open
+ try:
+ self.model_combo.event_generate('<Down>')
+ except Exception:
+ pass
+
+ def _scroll_model_list_to_value(self, value: str):
+ """If the combobox dropdown is open, scroll to and highlight the given value.
+ Uses Tk internals for ttk::combobox to access the popdown listbox.
+ Safe no-op if anything fails.
+ """
+ try:
+ values = getattr(self, '_model_all_values', []) or list(self.model_combo['values'])
+ if value not in values:
+ return
+ index = values.index(value)
+ # Resolve the internal popdown listbox for this combobox
+ popdown = self.model_combo.tk.eval(f'ttk::combobox::PopdownWindow {self.model_combo._w}')
+ listbox = f'{popdown}.f.l'
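+ # '.f.l' is the Tk-internal path of the listbox inside the combobox popdown frame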
+ tkobj = self.model_combo.tk
+ # Scroll and highlight the item
+ tkobj.call(listbox, 'see', index)
+ tkobj.call(listbox, 'selection', 'clear', 0, 'end')
+ tkobj.call(listbox, 'selection', 'set', index)
+ tkobj.call(listbox, 'activate', index)
+ except Exception:
+ # Dropdown may be closed or internals unavailable; ignore
+ pass
+ def _create_model_section(self):
+ """Create model selection section"""
+ tb.Label(self.frame, text="Model:").grid(row=1, column=0, sticky=tk.W, padx=5, pady=5)
+ default_model = self.config.get('model', 'gemini-2.0-flash')
+ self.model_var = tk.StringVar(value=default_model)
+ models = get_model_options()
+ self._model_all_values = models
+ self.model_combo = tb.Combobox(self.frame, textvariable=self.model_var, values=models, state="normal")
+ self.model_combo.grid(row=1, column=1, columnspan=2, sticky=tk.EW, padx=5, pady=5)
+
+ # Track previous text to make autocomplete less aggressive
+ self._model_prev_text = self.model_var.get()
+
+ self.model_combo.bind('<<ComboboxSelected>>', self.on_model_change)
+ self.setup_model_combobox_bindings()
+ self.model_var.trace('w', self._check_poe_model)
+ self.on_model_change()
+
+ def _create_profile_section(self):
+ """Create profile/profile section"""
+ tb.Label(self.frame, text="Profile:").grid(row=2, column=0, sticky=tk.W, padx=5, pady=5)
+ self.profile_menu = tb.Combobox(self.frame, textvariable=self.profile_var,
+ values=list(self.prompt_profiles.keys()), state="normal")
+ self.profile_menu.grid(row=2, column=1, sticky=tk.EW, padx=5, pady=5)
+ self.profile_menu.bind("<>", self.on_profile_select)
+ self.profile_menu.bind("", self.on_profile_select)
+ tb.Button(self.frame, text="Save Profile", command=self.save_profile, width=14).grid(row=2, column=2, sticky=tk.W, padx=5, pady=5)
+ tb.Button(self.frame, text="Delete Profile", command=self.delete_profile, width=14).grid(row=2, column=3, sticky=tk.W, padx=5, pady=5)
+
+ def _create_settings_section(self):
+ """Create all settings controls"""
+ # Threading delay (with extra spacing at top)
+ tb.Label(self.frame, text="Threading delay (s):").grid(row=3, column=0, sticky=tk.W, padx=5, pady=(15, 5)) # (top, bottom)
+ self.thread_delay_entry = tb.Entry(self.frame, textvariable=self.thread_delay_var, width=8)
+ self.thread_delay_entry.grid(row=3, column=1, sticky=tk.W, padx=5, pady=(15, 5)) # Match the label padding
+
+ # API delay (left side)
+ tb.Label(self.frame, text="API call delay (s):").grid(row=4, column=0, sticky=tk.W, padx=5, pady=5)
+ self.delay_entry = tb.Entry(self.frame, width=8)
+ self.delay_entry.insert(0, str(self.config.get('delay', 2)))
+ self.delay_entry.grid(row=4, column=1, sticky=tk.W, padx=5, pady=5)
+
+ # Help text for the threading delay setting
+ tb.Label(self.frame, text="(0 = simultaneous)",
+ font=('TkDefaultFont', 8), foreground='gray').grid(row=3, column=2, sticky=tk.W, padx=5, pady=(15, 5))
+
+ # Chapter Range
+ tb.Label(self.frame, text="Chapter range (e.g., 5-10):").grid(row=5, column=0, sticky=tk.W, padx=5, pady=5)
+ self.chapter_range_entry = tb.Entry(self.frame, width=12)
+ self.chapter_range_entry.insert(0, self.config.get('chapter_range', ''))
+ self.chapter_range_entry.grid(row=5, column=1, sticky=tk.W, padx=5, pady=5)
+
+ # Token limit
+ tb.Label(self.frame, text="Input Token limit:").grid(row=6, column=0, sticky=tk.W, padx=5, pady=5)
+ self.token_limit_entry = tb.Entry(self.frame, width=8)
+ self.token_limit_entry.insert(0, str(self.config.get('token_limit', 200000)))
+ self.token_limit_entry.grid(row=6, column=1, sticky=tk.W, padx=5, pady=5)
+
+ self.toggle_token_btn = tb.Button(self.frame, text="Disable Input Token Limit",
+ command=self.toggle_token_limit, bootstyle="danger-outline", width=21)
+ self.toggle_token_btn.grid(row=7, column=1, sticky=tk.W, padx=5, pady=5)
+
+ # Contextual Translation (right side, row 3) - with extra padding on top
+ tb.Checkbutton(self.frame, text="Contextual Translation", variable=self.contextual_var,
+ command=self._on_contextual_toggle).grid(
+ row=3, column=2, columnspan=2, sticky=tk.W, padx=5, pady=(25, 5)) # Added extra top padding
+
+ # Translation History Limit (row 4)
+ self.trans_history_label = tb.Label(self.frame, text="Translation History Limit:")
+ self.trans_history_label.grid(row=4, column=2, sticky=tk.W, padx=5, pady=5)
+ self.trans_history = tb.Entry(self.frame, width=6)
+ self.trans_history.insert(0, str(self.config.get('translation_history_limit', 2)))
+ self.trans_history.grid(row=4, column=3, sticky=tk.W, padx=5, pady=5)
+
+ # Rolling History (row 5)
+ self.rolling_checkbox = tb.Checkbutton(self.frame, text="Rolling History Window", variable=self.translation_history_rolling_var,
+ bootstyle="round-toggle")
+ self.rolling_checkbox.grid(row=5, column=2, sticky=tk.W, padx=5, pady=5)
+ self.rolling_history_desc = tk.Label(self.frame, text="(Keep recent history instead of purging)",
+ font=('TkDefaultFont', 11), fg='gray')
+ self.rolling_history_desc.grid(row=5, column=3, sticky=tk.W, padx=5, pady=5)
+
+ # Temperature (row 6)
+ tb.Label(self.frame, text="Temperature:").grid(row=6, column=2, sticky=tk.W, padx=5, pady=5)
+ self.trans_temp = tb.Entry(self.frame, width=6)
+ self.trans_temp.insert(0, str(self.config.get('translation_temperature', 0.3)))
+ self.trans_temp.grid(row=6, column=3, sticky=tk.W, padx=5, pady=5)
+
+ # Batch Translation (row 7)
+ self.batch_checkbox = tb.Checkbutton(self.frame, text="Batch Translation", variable=self.batch_translation_var,
+ bootstyle="round-toggle")
+ self.batch_checkbox.grid(row=7, column=2, sticky=tk.W, padx=5, pady=5)
+ self.batch_size_entry = tb.Entry(self.frame, width=6, textvariable=self.batch_size_var)
+ self.batch_size_entry.grid(row=7, column=3, sticky=tk.W, padx=5, pady=5)
+
+ # Set batch entry state
+ self.batch_size_entry.config(state=tk.NORMAL if self.batch_translation_var.get() else tk.DISABLED)
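+ # trace('w') fires on every write to the variable, keeping the entry's state in sync with the checkbox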
+ self.batch_translation_var.trace('w', lambda *args: self.batch_size_entry.config(
+ state=tk.NORMAL if self.batch_translation_var.get() else tk.DISABLED))
+
+ # Hidden entries for compatibility
+ self.title_trim = tb.Entry(self.frame, width=6)
+ self.title_trim.insert(0, str(self.config.get('title_trim_count', 1)))
+ self.group_trim = tb.Entry(self.frame, width=6)
+ self.group_trim.insert(0, str(self.config.get('group_affiliation_trim_count', 1)))
+ self.traits_trim = tb.Entry(self.frame, width=6)
+ self.traits_trim.insert(0, str(self.config.get('traits_trim_count', 1)))
+ self.refer_trim = tb.Entry(self.frame, width=6)
+ self.refer_trim.insert(0, str(self.config.get('refer_trim_count', 1)))
+ self.loc_trim = tb.Entry(self.frame, width=6)
+ self.loc_trim.insert(0, str(self.config.get('locations_trim_count', 1)))
+
+ # Set initial state based on contextual translation
+ self._on_contextual_toggle()
+
+ def _on_contextual_toggle(self):
+ """Handle contextual translation toggle - enable/disable related controls"""
+ is_contextual = self.contextual_var.get()
+
+ # Enable controls when contextual is ON, disable when OFF
+ state = tk.NORMAL if is_contextual else tk.DISABLED
+
+ # Disable/enable translation history limit entry and gray out label
+ self.trans_history.config(state=state)
+ self.trans_history_label.config(foreground='white' if is_contextual else 'gray')
+
+ # Disable/enable rolling history checkbox and gray out description
+ self.rolling_checkbox.config(state=state)
+ self.rolling_history_desc.config(foreground='gray' if is_contextual else '#404040')
+
+ def _create_api_section(self):
+ """Create API key section"""
+ self.api_key_label = tb.Label(self.frame, text="OpenAI/Gemini/... API Key:")
+ self.api_key_label.grid(row=8, column=0, sticky=tk.W, padx=5, pady=5)
+ self.api_key_entry = tb.Entry(self.frame, show='*')
+ self.api_key_entry.grid(row=8, column=1, columnspan=3, sticky=tk.EW, padx=5, pady=5)
+ initial_key = self.config.get('api_key', '')
+ if initial_key:
+ self.api_key_entry.insert(0, initial_key)
+ tb.Button(self.frame, text="Show", command=self.toggle_api_visibility, width=12).grid(row=8, column=4, sticky=tk.EW, padx=5, pady=5)
+
+ # Other Settings button
+ tb.Button(self.frame, text="⚙️ Other Setting", command=self.open_other_settings,
+ bootstyle="info-outline", width=15).grid(row=7, column=4, sticky=tk.EW, padx=5, pady=5)
+
+ # Remove AI Artifacts
+ tb.Checkbutton(self.frame, text="Remove AI Artifacts", variable=self.REMOVE_AI_ARTIFACTS_var,
+ bootstyle="round-toggle").grid(row=7, column=0, columnspan=5, sticky=tk.W, padx=5, pady=(0,5))
+
+ def _create_prompt_section(self):
+ """Create system prompt section with UIHelper"""
+ tb.Label(self.frame, text="System Prompt:").grid(row=9, column=0, sticky=tk.NW, padx=5, pady=5)
+
+ # Use UIHelper to create text widget with undo/redo
+ self.prompt_text = self.ui.setup_scrollable_text(
+ self.frame,
+ height=5,
+ width=60,
+ wrap='word'
+ )
+ self.prompt_text.grid(row=9, column=1, columnspan=3, sticky=tk.NSEW, padx=5, pady=5)
+
+ # Output Token Limit button
+ self.output_btn = tb.Button(self.frame, text=f"Output Token Limit: {self.max_output_tokens}",
+ command=self.prompt_custom_token_limit, bootstyle="info", width=22)
+ self.output_btn.grid(row=9, column=0, sticky=tk.W, padx=5, pady=5)
+
+ # Run Translation button
+ self.run_button = tb.Button(self.frame, text="Run Translation", command=self.run_translation_thread,
+ bootstyle="success", width=14)
+ self.run_button.grid(row=9, column=4, sticky=tk.N+tk.S+tk.EW, padx=5, pady=5)
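+ # Flush pending geometry so winfo_width/winfo_height below return the rendered size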
+ self.master.update_idletasks()
+ self.run_base_w = self.run_button.winfo_width()
+ self.run_base_h = self.run_button.winfo_height()
+
+ # Setup resize handler
+ self._resize_handler = self.ui.create_button_resize_handler(
+ self.run_button,
+ self.run_base_w,
+ self.run_base_h,
+ self.master,
+ BASE_WIDTH,
+ BASE_HEIGHT
+ )
+
+ def _create_log_section(self):
+ """Create log text area with UIHelper"""
+ self.log_text = scrolledtext.ScrolledText(self.frame, wrap=tk.WORD)
+ self.log_text.grid(row=10, column=0, columnspan=5, sticky=tk.NSEW, padx=5, pady=5)
+
+ # Use UIHelper to block editing
+ self.ui.block_text_editing(self.log_text)
+
+ # Setup context menu
+ self.log_text.bind("", self._show_context_menu)
+ if sys.platform == "darwin":
+ self.log_text.bind("", self._show_context_menu)
+
+ def _check_poe_model(self, *args):
+ """Automatically show POE helper when POE model is selected"""
+ model = self.model_var.get().lower()
+
+ # Check if POE model is selected
+ if model.startswith('poe/'):
+ current_key = self.api_key_entry.get().strip()
+
+ # Only show helper if no valid POE cookie is set
+ if not current_key.startswith('p-b:'):
+ # Use a flag to prevent showing multiple times in same session
+ if not getattr(self, '_poe_helper_shown', False):
+ self._poe_helper_shown = True
+ # Defer slightly so the dialog opens after the model change settles
+ self.master.after(100, self._show_poe_setup_dialog)
+ else:
+ # Reset flag when switching away from POE
+ self._poe_helper_shown = False
+
+ def _show_poe_setup_dialog(self):
+ """Show POE cookie setup dialog"""
+ # Create dialog using WindowManager
+ dialog, scrollable_frame, canvas = self.wm.setup_scrollable(
+ self.master,
+ "POE Authentication Required",
+ width=650,
+ height=450,
+ max_width_ratio=0.8,
+ max_height_ratio=0.85
+ )
+
+ # Header
+ header_frame = tk.Frame(scrollable_frame)
+ header_frame.pack(fill='x', padx=20, pady=(20, 10))
+
+ tk.Label(header_frame, text="POE Cookie Authentication",
+ font=('TkDefaultFont', 12, 'bold')).pack()
+
+ # Important notice
+ notice_frame = tk.Frame(scrollable_frame)
+ notice_frame.pack(fill='x', padx=20, pady=(0, 20))
+
+ tk.Label(notice_frame,
+ text="⚠️ POE uses HttpOnly cookies that cannot be accessed by JavaScript",
+ foreground='red', font=('TkDefaultFont', 10, 'bold')).pack()
+
+ tk.Label(notice_frame,
+ text="You must manually copy the cookie from Developer Tools",
+ foreground='gray').pack()
+
+ # Instructions
+ self._create_poe_manual_instructions(scrollable_frame)
+
+ # Button
+ button_frame = tk.Frame(scrollable_frame)
+ button_frame.pack(fill='x', padx=20, pady=(10, 20))
+
+ def close_dialog():
+ dialog.destroy()
+ # Check if user added a cookie
+ current_key = self.api_key_entry.get().strip()
+ if model := self.model_var.get().lower():
+ if model.startswith('poe/') and not current_key.startswith('p-b:'):
+ self.append_log("⚠️ POE models require cookie authentication. Please add your p-b cookie to the API key field.")
+
+ tb.Button(button_frame, text="Close", command=close_dialog,
+ bootstyle="secondary").pack()
+
+ # Auto-resize and show
+ self.wm.auto_resize_dialog(dialog, canvas)
+
+ def _create_poe_manual_instructions(self, parent):
+ """Create manual instructions for getting POE cookie"""
+ frame = tk.LabelFrame(parent, text="How to Get Your POE Cookie")
+ frame.pack(fill='both', expand=True, padx=20, pady=10)
+
+ # Step-by-step with visual formatting
+ steps = [
+ ("1.", "Go to poe.com and LOG IN to your account", None),
+ ("2.", "Press F12 to open Developer Tools", None),
+ ("3.", "Navigate to:", None),
+ ("", "• Chrome/Edge: Application → Cookies → https://poe.com", "indent"),
+ ("", "• Firefox: Storage → Cookies → https://poe.com", "indent"),
+ ("", "• Safari: Storage → Cookies → poe.com", "indent"),
+ ("4.", "Find the cookie named 'p-b'", None),
+ ("5.", "Double-click its Value to select it", None),
+ ("6.", "Copy the value (Ctrl+C or right-click → Copy)", None),
+ ("7.", "In Glossarion's API key field, type: p-b:", None),
+ ("8.", "Paste the cookie value after p-b:", None)
+ ]
+
+ for num, text, style in steps:
+ step_frame = tk.Frame(frame)
+ step_frame.pack(anchor='w', padx=20, pady=2)
+
+ if style == "indent":
+ tk.Label(step_frame, text=" ").pack(side='left')
+
+ if num:
+ tk.Label(step_frame, text=num, font=('TkDefaultFont', 10, 'bold'),
+ width=3).pack(side='left')
+
+ tk.Label(step_frame, text=text).pack(side='left')
+
+ # Example
+ example_frame = tk.LabelFrame(parent, text="Example API Key Format")
+ example_frame.pack(fill='x', padx=20, pady=(10, 0))
+
+ example_entry = tk.Entry(example_frame, font=('Consolas', 11))
+ example_entry.pack(padx=10, pady=10, fill='x')
+ example_entry.insert(0, "p-b:RyP5ORQXFO8qXbiTBKD2vA%3D%3D")
+ example_entry.config(state='readonly')
+
+ # Additional info
+ info_frame = tk.Frame(parent)
+ info_frame.pack(fill='x', padx=20, pady=(10, 0))
+
+ info_text = """Note: The cookie value is usually a long string ending with %3D%3D
+ If you see multiple p-b cookies, use the one with the longest value."""
+
+ tk.Label(info_frame, text=info_text, foreground='gray',
+ justify='left').pack(anchor='w')
+
+ def open_async_processing(self):
+ """Open the async processing dialog"""
+ # Check if translation is running
+ if hasattr(self, 'translation_thread') and self.translation_thread and self.translation_thread.is_alive():
+ self.append_log("⚠️ Cannot open async processing while translation is in progress.")
+ messagebox.showwarning("Process Running", "Please wait for the current translation to complete.")
+ return
+
+ # Check if glossary extraction is running
+ if hasattr(self, 'glossary_thread') and self.glossary_thread and self.glossary_thread.is_alive():
+ self.append_log("⚠️ Cannot open async processing while glossary extraction is in progress.")
+ messagebox.showwarning("Process Running", "Please wait for glossary extraction to complete.")
+ return
+
+ # Check if file is selected
+ if not hasattr(self, 'file_path') or not self.file_path:
+ self.append_log("⚠️ Please select a file before opening async processing.")
+ messagebox.showwarning("No File Selected", "Please select an EPUB or TXT file first.")
+ return
+
+ try:
+ # Lazy import the async processor
+ if not hasattr(self, '_async_processor_imported'):
+ self.append_log("Loading async processing module...")
+ from async_api_processor import show_async_processing_dialog
+ self._async_processor_imported = True
+ self._show_async_processing_dialog = show_async_processing_dialog
+
+ # Show the dialog
+ self.append_log("Opening async processing dialog...")
+ self._show_async_processing_dialog(self.master, self)
+
+ except ImportError as e:
+ self.append_log(f"❌ Failed to load async processing module: {e}")
+ messagebox.showerror(
+ "Module Not Found",
+ "The async processing module could not be loaded.\n"
+ "Please ensure async_api_processor.py is in the same directory."
+ )
+ except Exception as e:
+ self.append_log(f"❌ Error opening async processing: {e}")
+ messagebox.showerror("Error", f"Failed to open async processing: {str(e)}")
+
+ def _lazy_load_modules(self, splash_callback=None):
+ """Load heavy modules only when needed - Enhanced with thread safety, retry logic, and progress tracking"""
+ # Quick return if already loaded (unchanged for compatibility)
+ if self._modules_loaded:
+ return True
+
+ # Enhanced thread safety with timeout protection
+ if self._modules_loading:
+ timeout_start = time.time()
+ timeout_duration = 30.0 # 30 second timeout to prevent infinite waiting
+
+ while self._modules_loading and not self._modules_loaded:
+ # Check for timeout to prevent infinite loops
+ if time.time() - timeout_start > timeout_duration:
+ self.append_log("⚠️ Module loading timeout - resetting loading state")
+ self._modules_loading = False
+ break
+ time.sleep(0.1)
+ return self._modules_loaded
+
+ # Set loading flag with enhanced error handling
+ self._modules_loading = True
+ loading_start_time = time.time()
+
+ try:
+ if splash_callback:
+ splash_callback("Loading translation modules...")
+
+ # Initialize global variables to None FIRST to avoid NameError
+ global translation_main, translation_stop_flag, translation_stop_check
+ global glossary_main, glossary_stop_flag, glossary_stop_check
+ global fallback_compile_epub, scan_html_folder
+
+ # Set all to None initially in case imports fail
+ translation_main = None
+ translation_stop_flag = None
+ translation_stop_check = None
+ glossary_main = None
+ glossary_stop_flag = None
+ glossary_stop_check = None
+ fallback_compile_epub = None
+ scan_html_folder = None
+
+ # Enhanced module configuration with validation and retry info
+ modules = [
+ {
+ 'name': 'TransateKRtoEN',
+ 'display_name': 'translation engine',
+ 'imports': ['main', 'set_stop_flag', 'is_stop_requested'],
+ 'global_vars': ['translation_main', 'translation_stop_flag', 'translation_stop_check'],
+ 'critical': True,
+ 'retry_count': 0,
+ 'max_retries': 2
+ },
+ {
+ 'name': 'extract_glossary_from_epub',
+ 'display_name': 'glossary extractor',
+ 'imports': ['main', 'set_stop_flag', 'is_stop_requested'],
+ 'global_vars': ['glossary_main', 'glossary_stop_flag', 'glossary_stop_check'],
+ 'critical': True,
+ 'retry_count': 0,
+ 'max_retries': 2
+ },
+ {
+ 'name': 'epub_converter',
+ 'display_name': 'EPUB converter',
+ 'imports': ['fallback_compile_epub'],
+ 'global_vars': ['fallback_compile_epub'],
+ 'critical': False,
+ 'retry_count': 0,
+ 'max_retries': 1
+ },
+ {
+ 'name': 'scan_html_folder',
+ 'display_name': 'QA scanner',
+ 'imports': ['scan_html_folder'],
+ 'global_vars': ['scan_html_folder'],
+ 'critical': False,
+ 'retry_count': 0,
+ 'max_retries': 1
+ }
+ ]
+
+ success_count = 0
+ total_modules = len(modules)
+ failed_modules = []
+
+ # Enhanced module loading with progress tracking and retry logic
+ for i, module_info in enumerate(modules):
+ module_name = module_info['name']
+ display_name = module_info['display_name']
+ max_retries = module_info['max_retries']
+
+ # Progress callback with detailed information
+ if splash_callback:
+ progress_percent = int((i / total_modules) * 100)
+ splash_callback(f"Loading {display_name}... ({progress_percent}%)")
+
+ # Retry logic for robust loading
+ loaded_successfully = False
+
+ for retry_attempt in range(max_retries + 1):
+ try:
+ if retry_attempt > 0:
+ # Add small delay between retries
+ time.sleep(0.2)
+ if splash_callback:
+ splash_callback(f"Retrying {display_name}... (attempt {retry_attempt + 1})")
+
+ # Enhanced import logic with specific error handling
+ if module_name == 'TransateKRtoEN':
+ # Validate the module before importing critical functions
+ import TransateKRtoEN
+ # Verify the module has required functions
+ if hasattr(TransateKRtoEN, 'main') and hasattr(TransateKRtoEN, 'set_stop_flag'):
+ translation_main = TransateKRtoEN.main
+ translation_stop_flag = TransateKRtoEN.set_stop_flag
+ translation_stop_check = TransateKRtoEN.is_stop_requested if hasattr(TransateKRtoEN, 'is_stop_requested') else None
+ else:
+ raise ImportError("TransateKRtoEN module missing required functions")
+
+ elif module_name == 'extract_glossary_from_epub':
+ # Validate the module before importing critical functions
+ import extract_glossary_from_epub
+ if hasattr(extract_glossary_from_epub, 'main') and hasattr(extract_glossary_from_epub, 'set_stop_flag'):
+ glossary_main = extract_glossary_from_epub.main
+ glossary_stop_flag = extract_glossary_from_epub.set_stop_flag
+ glossary_stop_check = extract_glossary_from_epub.is_stop_requested if hasattr(extract_glossary_from_epub, 'is_stop_requested') else None
+ else:
+ raise ImportError("extract_glossary_from_epub module missing required functions")
+
+ elif module_name == 'epub_converter':
+ # Validate the module before importing
+ import epub_converter
+ if hasattr(epub_converter, 'fallback_compile_epub'):
+ fallback_compile_epub = epub_converter.fallback_compile_epub
+ else:
+ raise ImportError("epub_converter module missing fallback_compile_epub function")
+
+ elif module_name == 'scan_html_folder':
+ # Validate the module before importing
+ import scan_html_folder as scan_module
+ if hasattr(scan_module, 'scan_html_folder'):
+ scan_html_folder = scan_module.scan_html_folder
+ else:
+ raise ImportError("scan_html_folder module missing scan_html_folder function")
+
+ # If we reach here, import was successful
+ loaded_successfully = True
+ success_count += 1
+ break
+
+ except ImportError as e:
+ module_info['retry_count'] = retry_attempt + 1
+ error_msg = str(e)
+
+ # Log retry attempts
+ if retry_attempt < max_retries:
+ if hasattr(self, 'append_log'):
+ self.append_log(f"⚠️ Failed to load {display_name} (attempt {retry_attempt + 1}): {error_msg}")
+ else:
+ # Final failure
+ print(f"Warning: Could not import {module_name} after {max_retries + 1} attempts: {error_msg}")
+ failed_modules.append({
+ 'name': module_name,
+ 'display_name': display_name,
+ 'error': error_msg,
+ 'critical': module_info['critical']
+ })
+ break
+
+ except Exception as e:
+ # Handle unexpected errors
+ error_msg = f"Unexpected error: {str(e)}"
+ print(f"Warning: Unexpected error loading {module_name}: {error_msg}")
+ failed_modules.append({
+ 'name': module_name,
+ 'display_name': display_name,
+ 'error': error_msg,
+ 'critical': module_info['critical']
+ })
+ break
+
+ # Enhanced progress feedback
+ if loaded_successfully and splash_callback:
+ progress_percent = int(((i + 1) / total_modules) * 100)
+ splash_callback(f"✅ {display_name} loaded ({progress_percent}%)")
+
+ # Calculate loading time for performance monitoring
+ loading_time = time.time() - loading_start_time
+
+ # Enhanced success/failure reporting
+ if splash_callback:
+ if success_count == total_modules:
+ splash_callback(f"Loaded {success_count}/{total_modules} modules successfully in {loading_time:.1f}s")
+ else:
+ splash_callback(f"Loaded {success_count}/{total_modules} modules ({len(failed_modules)} failed)")
+
+ # Enhanced logging with module status details
+ if hasattr(self, 'append_log'):
+ if success_count == total_modules:
+ self.append_log(f"✅ Loaded {success_count}/{total_modules} modules successfully in {loading_time:.1f}s")
+ else:
+ self.append_log(f"⚠️ Loaded {success_count}/{total_modules} modules successfully ({len(failed_modules)} failed)")
+
+ # Report critical failures
+ critical_failures = [f for f in failed_modules if f['critical']]
+ if critical_failures:
+ for failure in critical_failures:
+ self.append_log(f"❌ Critical module failed: {failure['display_name']} - {failure['error']}")
+
+ # Report non-critical failures
+ non_critical_failures = [f for f in failed_modules if not f['critical']]
+ if non_critical_failures:
+ for failure in non_critical_failures:
+ self.append_log(f"⚠️ Optional module failed: {failure['display_name']} - {failure['error']}")
+
+ # Store references to imported modules in instance variables for later use
+ self._translation_main = translation_main
+ self._translation_stop_flag = translation_stop_flag
+ self._translation_stop_check = translation_stop_check
+ self._glossary_main = glossary_main
+ self._glossary_stop_flag = glossary_stop_flag
+ self._glossary_stop_check = glossary_stop_check
+ self._fallback_compile_epub = fallback_compile_epub
+ self._scan_html_folder = scan_html_folder
+
+ # Final module state update with enhanced error checking
+ self._modules_loaded = True
+ self._modules_loading = False
+
+ # Enhanced module availability checking with better integration
+ if hasattr(self, 'master'):
+ self.master.after(0, self._check_modules)
+
+ # Return success status - maintain compatibility by returning True if any modules loaded
+ # But also check for critical module failures
+ critical_failures = [f for f in failed_modules if f['critical']]
+ if critical_failures and success_count == 0:
+ # Complete failure case
+ if hasattr(self, 'append_log'):
+ self.append_log("❌ Critical module loading failed - some functionality may be unavailable")
+ return False
+
+ return True
+
+ except Exception as unexpected_error:
+ # Enhanced error recovery for unexpected failures
+ error_msg = f"Unexpected error during module loading: {str(unexpected_error)}"
+ print(f"Critical error: {error_msg}")
+
+ if hasattr(self, 'append_log'):
+ self.append_log(f"❌ Module loading failed: {error_msg}")
+
+ # Reset states for retry possibility
+ self._modules_loaded = False
+ self._modules_loading = False
+
+ if splash_callback:
+ splash_callback(f"Module loading failed: {str(unexpected_error)}")
+
+ return False
+
+ finally:
+ # Enhanced cleanup - ensure loading flag is always reset
+ if self._modules_loading:
+ self._modules_loading = False
+
+ def _check_modules(self):
+ """Check which modules are available and disable buttons if needed"""
+ if not self._modules_loaded:
+ return
+
+ # Use the stored instance variables instead of globals
+ button_checks = [
+ (self._translation_main if hasattr(self, '_translation_main') else None, 'run_button', "Translation"),
+ (self._glossary_main if hasattr(self, '_glossary_main') else None, 'glossary_button', "Glossary extraction"),
+ (self._fallback_compile_epub if hasattr(self, '_fallback_compile_epub') else None, 'epub_button', "EPUB converter"),
+ (self._scan_html_folder if hasattr(self, '_scan_html_folder') else None, 'qa_button', "QA scanner")
+ ]
+
+ for module, button_attr, name in button_checks:
+ if module is None and hasattr(self, button_attr):
+ button = getattr(self, button_attr, None)
+ if button:
+ button.config(state='disabled')
+ self.append_log(f"⚠️ {name} module not available")
+
+ def configure_title_prompt(self):
+ """Configure the book title translation prompt"""
+ dialog = self.wm.create_simple_dialog(
+ self.master,
+ "Configure Book Title Translation",
+ width=950,
+ height=850 # Increased height for two prompts
+ )
+
+ main_frame = tk.Frame(dialog, padx=20, pady=20)
+ main_frame.pack(fill=tk.BOTH, expand=True)
+
+ # System Prompt Section
+ tk.Label(main_frame, text="System Prompt (AI Instructions)",
+ font=('TkDefaultFont', 12, 'bold')).pack(anchor=tk.W, pady=(0, 5))
+
+ tk.Label(main_frame, text="This defines how the AI should behave when translating titles:",
+ font=('TkDefaultFont', 10), fg='gray').pack(anchor=tk.W, pady=(0, 10))
+
+ self.title_system_prompt_text = self.ui.setup_scrollable_text(
+ main_frame, height=4, wrap=tk.WORD
+ )
+ self.title_system_prompt_text.pack(fill=tk.BOTH, expand=True, pady=(0, 15))
+ self.title_system_prompt_text.insert('1.0', self.config.get('book_title_system_prompt',
+ "You are a translator. Respond with only the translated text, nothing else. Do not add any explanation or additional content."))
+
+ # User Prompt Section
+ tk.Label(main_frame, text="User Prompt (Translation Request)",
+ font=('TkDefaultFont', 12, 'bold')).pack(anchor=tk.W, pady=(10, 5))
+
+ tk.Label(main_frame, text="This prompt will be used when translating book titles.\n"
+ "The book title will be appended after this prompt.",
+ font=('TkDefaultFont', 10), fg='gray').pack(anchor=tk.W, pady=(0, 10))
+
+ self.title_prompt_text = self.ui.setup_scrollable_text(
+ main_frame, height=6, wrap=tk.WORD
+ )
+ self.title_prompt_text.pack(fill=tk.BOTH, expand=True, pady=(0, 10))
+ self.title_prompt_text.insert('1.0', self.book_title_prompt)
+
+ lang_frame = tk.Frame(main_frame)
+ lang_frame.pack(fill=tk.X, pady=(10, 0))
+
+ tk.Label(lang_frame, text="💡 Tip: Modify the prompts above to translate to other languages",
+ font=('TkDefaultFont', 10), fg='blue').pack(anchor=tk.W)
+
+ example_frame = tk.LabelFrame(main_frame, text="Example Prompts", padx=10, pady=10)
+ example_frame.pack(fill=tk.X, pady=(10, 0))
+
+ examples = [
+ ("Spanish", "Traduce este título de libro al español manteniendo los acrónimos:"),
+ ("French", "Traduisez ce titre de livre en français en conservant les acronymes:"),
+ ("German", "Übersetzen Sie diesen Buchtitel ins Deutsche und behalten Sie Akronyme bei:"),
+ ("Keep Original", "Return the title exactly as provided without any translation:")
+ ]
+
+ for lang, prompt in examples:
+ btn = tb.Button(example_frame, text=f"Use {lang}",
+ command=lambda p=prompt: self.title_prompt_text.replace('1.0', tk.END, p),
+ bootstyle="secondary-outline", width=15)
+ btn.pack(side=tk.LEFT, padx=2, pady=2)
+
+ button_frame = tk.Frame(main_frame)
+ button_frame.pack(fill=tk.X, pady=(20, 0))
+
+ def save_title_prompt():
+ self.book_title_prompt = self.title_prompt_text.get('1.0', tk.END).strip()
+ self.config['book_title_prompt'] = self.book_title_prompt
+
+ # Save the system prompt too
+ self.config['book_title_system_prompt'] = self.title_system_prompt_text.get('1.0', tk.END).strip()
+
+ #messagebox.showinfo("Success", "Book title prompts saved!")
+ dialog.destroy()
+
+ def reset_title_prompt():
+ if messagebox.askyesno("Reset Prompts", "Reset both prompts to defaults?"):
+ # Reset system prompt
+ default_system = "You are a translator. Respond with only the translated text, nothing else. Do not add any explanation or additional content."
+ self.title_system_prompt_text.delete('1.0', tk.END)
+ self.title_system_prompt_text.insert('1.0', default_system)
+
+ # Reset user prompt
+ default_prompt = "Translate this book title to English while retaining any acronyms:"
+ self.title_prompt_text.delete('1.0', tk.END)
+ self.title_prompt_text.insert('1.0', default_prompt)
+
+ tb.Button(button_frame, text="Save", command=save_title_prompt,
+ bootstyle="success", width=15).pack(side=tk.LEFT, padx=5)
+ tb.Button(button_frame, text="Reset to Default", command=reset_title_prompt,
+ bootstyle="warning", width=15).pack(side=tk.LEFT, padx=5)
+ tb.Button(button_frame, text="Cancel", command=dialog.destroy,
+ bootstyle="secondary", width=15).pack(side=tk.LEFT, padx=5)
+
+ dialog.deiconify()
+
+ def detect_novel_numbering_unified(self, output_dir, progress_data):
+ """
+ Use the backend's detect_novel_numbering function for consistent detection
+ """
+ try:
+ # Try to load the backend detection function
+ if not self._lazy_load_modules():
+ # Fallback to current GUI logic if modules not loaded
+ return self._detect_novel_numbering_gui_fallback(output_dir, progress_data)
+
+ # Import the detection function from backend
+ from TransateKRtoEN import detect_novel_numbering
+
+ # Build a chapters list from progress data to pass to backend function
+ chapters = []
+ for chapter_key, chapter_info in progress_data.get("chapters", {}).items():
+ # Get the output file, handling None values
+ output_file = chapter_info.get('output_file', '')
+
+ chapter_dict = {
+ 'original_basename': chapter_info.get('original_basename', ''),
+ 'filename': output_file or '', # Ensure it's never None
+ 'num': chapter_info.get('chapter_num', 0)
+ }
+
+ # Only add the output file path if it exists and is not empty
+ if output_file and output_file.strip():
+ chapter_dict['filename'] = os.path.join(output_dir, output_file)
+ else:
+ # If no output file, try to discover a file based on original basename or chapter number
+ retain = os.getenv('RETAIN_SOURCE_EXTENSION', '0') == '1' or self.config.get('retain_source_extension', False)
+ allowed_exts = ('.html', '.xhtml', '.htm')
+ discovered = None
+
+ if chapter_dict['original_basename']:
+ base = chapter_dict['original_basename']
+ # Scan output_dir for either response_{base}.* or {base}.*
+ try:
+ for f in os.listdir(output_dir):
+ f_low = f.lower()
+ if f_low.endswith(allowed_exts):
+ name_no_ext = os.path.splitext(f)[0]
+ if name_no_ext.startswith('response_'):
+ candidate_base = name_no_ext[9:]
+ else:
+ candidate_base = name_no_ext
+ if candidate_base == base:
+ discovered = f
+ break
+ except Exception:
+ pass
+
+ if not discovered:
+ # Fall back to expected naming per mode
+ if retain:
+ # Default to original basename with .html
+ discovered = f"{base}.html"
+ else:
+ discovered = f"response_{base}.html"
+ else:
+ # Last resort: use chapter number pattern
+ chapter_num = chapter_info.get('actual_num', chapter_info.get('chapter_num', 0))
+ num_str = f"{int(chapter_num):04d}" if isinstance(chapter_num, (int, float)) else str(chapter_num)
+ try:
+ for f in os.listdir(output_dir):
+ f_low = f.lower()
+ if f_low.endswith(allowed_exts):
+ name_no_ext = os.path.splitext(f)[0]
+ # Remove optional response_ prefix
+ core = name_no_ext[9:] if name_no_ext.startswith('response_') else name_no_ext
+ if core.startswith(num_str):
+ discovered = f
+ break
+ except Exception:
+ pass
+
+ if not discovered:
+ if retain:
+ discovered = f"{num_str}.html"
+ else:
+ discovered = f"response_{num_str}.html"
+
+ chapter_dict['filename'] = os.path.join(output_dir, discovered)
+
+ chapters.append(chapter_dict)
+
+ # Use the backend detection logic
+ uses_zero_based = detect_novel_numbering(chapters)
+
+ print(f"[GUI] Unified detection result: {'0-based' if uses_zero_based else '1-based'}")
+ return uses_zero_based
+
+ except Exception as e:
+ print(f"[GUI] Error in unified detection: {e}")
+ # Fallback to GUI logic on error
+ return self._detect_novel_numbering_gui_fallback(output_dir, progress_data)
+
+ def _detect_novel_numbering_gui_fallback(self, output_dir, progress_data):
+ """
+ Fallback detection logic (current GUI implementation)
+ """
+ uses_zero_based = False
+
+ for chapter_key, chapter_info in progress_data.get("chapters", {}).items():
+ if chapter_info.get("status") == "completed":
+ output_file = chapter_info.get("output_file", "")
+ stored_chapter_num = chapter_info.get("chapter_num", 0)
+ if output_file:
+ # Allow filenames with or without 'response_' prefix
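+ # e.g. 'response_0005.html' and '0005.html' both yield file_num 5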
+ match = re.search(r'(?:^response_)?(\d+)', output_file)
+ if match:
+ file_num = int(match.group(1))
+ if file_num == stored_chapter_num - 1:
+ uses_zero_based = True
+ break
+ elif file_num == stored_chapter_num:
+ uses_zero_based = False
+ break
+
+ if not uses_zero_based:
+ try:
+ for file in os.listdir(output_dir):
+ if re.search(r'_0+[_\.]', file):
+ uses_zero_based = True
+ break
+ except Exception:
+ pass
+
+ return uses_zero_based
+
+ def force_retranslation(self):
+ """Force retranslation of specific chapters or images with improved display"""
+
+ # Check for multiple file selection first
+ if hasattr(self, 'selected_files') and len(self.selected_files) > 1:
+ self._force_retranslation_multiple_files()
+ return
+
+ # Check if it's a folder selection (for images)
+ if hasattr(self, 'selected_files') and len(self.selected_files) > 0:
+ # Check if the first selected file is actually a folder
+ first_item = self.selected_files[0]
+ if os.path.isdir(first_item):
+ self._force_retranslation_images_folder(first_item)
+ return
+
+ # Original logic for single files
+ input_path = self.entry_epub.get()
+ if not input_path or not os.path.isfile(input_path):
+ messagebox.showerror("Error", "Please select a valid EPUB, text file, or image folder first.")
+ return
+
+ # Check if it's an image file
+ image_extensions = ('.png', '.jpg', '.jpeg', '.gif', '.bmp', '.webp')
+ if input_path.lower().endswith(image_extensions):
+ self._force_retranslation_single_image(input_path)
+ return
+
+ # For EPUB/text files, use the shared logic
+ self._force_retranslation_epub_or_text(input_path)
+
+
+ def _force_retranslation_epub_or_text(self, file_path, parent_dialog=None, tab_frame=None):
+ """
+ Shared logic for force retranslation of EPUB/text files with OPF support
+ Can be used standalone or embedded in a tab
+
+ Args:
+ file_path: Path to the EPUB/text file
+ parent_dialog: If provided, won't create its own dialog
+ tab_frame: If provided, will render into this frame instead of creating dialog
+
+ Returns:
+ dict: Contains all the UI elements and data for external access
+ """
+
+ epub_base = os.path.splitext(os.path.basename(file_path))[0]
+ output_dir = epub_base
+
+ if not os.path.exists(output_dir):
+ if not parent_dialog:
+ messagebox.showinfo("Info", "No translation output found for this file.")
+ return None
+
+ progress_file = os.path.join(output_dir, "translation_progress.json")
+ if not os.path.exists(progress_file):
+ if not parent_dialog:
+ messagebox.showinfo("Info", "No progress tracking found.")
+ return None
+
+ with open(progress_file, 'r', encoding='utf-8') as f:
+ prog = json.load(f)
+
+ # =====================================================
+ # PARSE CONTENT.OPF FOR CHAPTER MANIFEST
+ # =====================================================
+
+ spine_chapters = []
+ opf_chapter_order = {}
+ is_epub = file_path.lower().endswith('.epub')
+
+ if is_epub and os.path.exists(file_path):
+ try:
+ import xml.etree.ElementTree as ET
+ import zipfile
+
+ with zipfile.ZipFile(file_path, 'r') as zf:
+ # Find content.opf file
+ opf_path = None
+ opf_content = None
+
+ # First try to find via container.xml
+ try:
+ container_content = zf.read('META-INF/container.xml')
+ container_root = ET.fromstring(container_content)
+ rootfile = container_root.find('.//{urn:oasis:names:tc:opendocument:xmlns:container}rootfile')
+ if rootfile is not None:
+ opf_path = rootfile.get('full-path')
+ except Exception:
+ pass
+
+ # Fallback: search for content.opf
+ if not opf_path:
+ for name in zf.namelist():
+ if name.endswith('content.opf'):
+ opf_path = name
+ break
+
+ if opf_path:
+ opf_content = zf.read(opf_path)
+
+ # Parse OPF
+ root = ET.fromstring(opf_content)
+
+ # Handle namespaces
+ ns = {'opf': 'http://www.idpf.org/2007/opf'}
+ if root.tag.startswith('{'):
+ default_ns = root.tag[1:root.tag.index('}')]
+ ns = {'opf': default_ns}
+
+ # Get manifest - all chapter files
+ manifest_chapters = {}
+
+ for item in root.findall('.//opf:manifest/opf:item', ns):
+ item_id = item.get('id')
+ href = item.get('href')
+ media_type = item.get('media-type', '')
+
+ if item_id and href and ('html' in media_type.lower() or href.endswith(('.html', '.xhtml', '.htm'))):
+ filename = os.path.basename(href)
+
+ # Skip navigation, toc, and cover files
+ if not any(skip in filename.lower() for skip in ['nav.', 'toc.', 'cover.']):
+ manifest_chapters[item_id] = {
+ 'filename': filename,
+ 'href': href,
+ 'media_type': media_type
+ }
+
+ # Get spine order - the reading order
+ spine = root.find('.//opf:spine', ns)
+
+ if spine is not None:
+ for itemref in spine.findall('opf:itemref', ns):
+ idref = itemref.get('idref')
+ if idref and idref in manifest_chapters:
+ chapter_info = manifest_chapters[idref]
+ filename = chapter_info['filename']
+
+ # Skip navigation, toc, and cover files
+ if not any(skip in filename.lower() for skip in ['nav.', 'toc.', 'cover.']):
+ # Extract chapter number from filename
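+ # Use the last number in the filename as the chapter number (e.g. 'chapter_12.xhtml' -> 12)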
+ matches = re.findall(r'(\d+)', filename)
+ if matches:
+ file_chapter_num = int(matches[-1])
+ else:
+ file_chapter_num = len(spine_chapters)
+
+ spine_chapters.append({
+ 'id': idref,
+ 'filename': filename,
+ 'position': len(spine_chapters),
+ 'file_chapter_num': file_chapter_num,
+ 'status': 'unknown', # Will be updated
+ 'output_file': None # Will be updated
+ })
+
+ # Store the order for later use
+ opf_chapter_order[filename] = len(spine_chapters) - 1
+
+ # Also store without extension for matching
+ filename_noext = os.path.splitext(filename)[0]
+ opf_chapter_order[filename_noext] = len(spine_chapters) - 1
+
+ except Exception as e:
+ print(f"Warning: Could not parse OPF: {e}")
+
+ # =====================================================
+ # MATCH OPF CHAPTERS WITH TRANSLATION PROGRESS
+ # =====================================================
+
+ # Build a map of original basenames to progress entries
+ basename_to_progress = {}
+ for chapter_key, chapter_info in prog.get("chapters", {}).items():
+ original_basename = chapter_info.get("original_basename", "")
+ if original_basename:
+ if original_basename not in basename_to_progress:
+ basename_to_progress[original_basename] = []
+ basename_to_progress[original_basename].append((chapter_key, chapter_info))
+
+ # Also build a map of response files
+ response_file_to_progress = {}
+ for chapter_key, chapter_info in prog.get("chapters", {}).items():
+ output_file = chapter_info.get("output_file", "")
+ if output_file:
+ if output_file not in response_file_to_progress:
+ response_file_to_progress[output_file] = []
+ response_file_to_progress[output_file].append((chapter_key, chapter_info))
+
+ # Update spine chapters with translation status
+ for spine_ch in spine_chapters:
+ filename = spine_ch['filename']
+ chapter_num = spine_ch['file_chapter_num']
+
+ # Find the actual response file that exists
+ base_name = os.path.splitext(filename)[0]
+ expected_response = None
+
+ # Handle .htm.html -> .html conversion
+ stripped_base_name = base_name
+ if base_name.endswith('.htm'):
+ stripped_base_name = base_name[:-4] # Remove .htm suffix
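+ # e.g. an OPF entry 'chapter_02.htm.html' gives base 'chapter_02.htm', stripped to 'chapter_02'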
+
+ # Look for translated file matching base name, with or without 'response_' and with allowed extensions
+ allowed_exts = ('.html', '.xhtml', '.htm')
+ for file in os.listdir(output_dir):
+ f_low = file.lower()
+ if f_low.endswith(allowed_exts):
+ name_no_ext = os.path.splitext(file)[0]
+ core = name_no_ext[9:] if name_no_ext.startswith('response_') else name_no_ext
+ # Accept matches for:
+ # - OPF filename without last extension (base_name)
+ # - Stripped base for .htm cases
+ # - OPF filename as-is (e.g., 'chapter_02.htm') when the output file is 'chapter_02.htm.xhtml'
+ if core == base_name or core == stripped_base_name or core == filename:
+ expected_response = file
+ break
+
+ # Fallback - per mode, prefer OPF filename when retain mode is on
+ if not expected_response:
+ retain = os.getenv('RETAIN_SOURCE_EXTENSION', '0') == '1' or self.config.get('retain_source_extension', False)
+ if retain:
+ expected_response = filename
+ else:
+ expected_response = f"response_{stripped_base_name}.html"
+
+ response_path = os.path.join(output_dir, expected_response)
+
+ # Check various ways to find the translation progress info
+ matched_info = None
+
+ # Method 1: Check by original basename
+ if filename in basename_to_progress:
+ entries = basename_to_progress[filename]
+ if entries:
+ _, chapter_info = entries[0]
+ matched_info = chapter_info
+
+ # Method 2: Check by response file (with corrected extension)
+ if not matched_info and expected_response in response_file_to_progress:
+ entries = response_file_to_progress[expected_response]
+ if entries:
+ _, chapter_info = entries[0]
+ matched_info = chapter_info
+
+ # Method 3: Search through all progress entries for matching output file
+ if not matched_info:
+ for chapter_key, chapter_info in prog.get("chapters", {}).items():
+ if chapter_info.get('output_file') == expected_response:
+ matched_info = chapter_info
+ break
+
+ # Method 4: CRUCIAL - Match by chapter number (actual_num vs file_chapter_num)
+ if not matched_info:
+ for chapter_key, chapter_info in prog.get("chapters", {}).items():
+ actual_num = chapter_info.get('actual_num')
+ # Also check 'chapter_num' as fallback
+ if actual_num is None:
+ actual_num = chapter_info.get('chapter_num')
+
+ if actual_num is not None and actual_num == chapter_num:
+ matched_info = chapter_info
+ break
+
+ # Determine if translation file exists
+ file_exists = os.path.exists(response_path)
+
+ # Set status and output file based on findings
+ if matched_info:
+ # We found progress tracking info - use its status
+ spine_ch['status'] = matched_info.get('status', 'unknown')
+ spine_ch['output_file'] = matched_info.get('output_file', expected_response)
+ spine_ch['progress_entry'] = matched_info
+
+ # Handle null output_file (common for failed/in_progress chapters)
+ if not spine_ch['output_file']:
+ spine_ch['output_file'] = expected_response
+
+ # Keep original extension (html/xhtml/htm) as written on disk
+
+ # Verify file actually exists for completed status
+ if spine_ch['status'] == 'completed':
+ output_path = os.path.join(output_dir, spine_ch['output_file'])
+ if not os.path.exists(output_path):
+ spine_ch['status'] = 'file_missing'
+
+ elif file_exists:
+ # File exists but no progress tracking - mark as completed
+ spine_ch['status'] = 'completed'
+ spine_ch['output_file'] = expected_response
+
+ else:
+ # No file and no progress tracking - not translated
+ spine_ch['status'] = 'not_translated'
+ spine_ch['output_file'] = expected_response
+
+ # =====================================================
+ # BUILD DISPLAY INFO
+ # =====================================================
+
+ chapter_display_info = []
+
+ if spine_chapters:
+ # Use OPF order
+ for spine_ch in spine_chapters:
+ display_info = {
+ 'key': spine_ch.get('filename', ''),
+ 'num': spine_ch['file_chapter_num'],
+ 'info': spine_ch.get('progress_entry', {}),
+ 'output_file': spine_ch['output_file'],
+ 'status': spine_ch['status'],
+ 'duplicate_count': 1,
+ 'entries': [],
+ 'opf_position': spine_ch['position'],
+ 'original_filename': spine_ch['filename']
+ }
+ chapter_display_info.append(display_info)
+ else:
+ # Fallback to original logic if no OPF
+ files_to_entries = {}
+ for chapter_key, chapter_info in prog.get("chapters", {}).items():
+ output_file = chapter_info.get("output_file", "")
+ if output_file:
+ if output_file not in files_to_entries:
+ files_to_entries[output_file] = []
+ files_to_entries[output_file].append((chapter_key, chapter_info))
+
+ for output_file, entries in files_to_entries.items():
+ chapter_key, chapter_info = entries[0]
+
+ # Extract chapter number
+ matches = re.findall(r'(\d+)', output_file)
+ if matches:
+ chapter_num = int(matches[-1])
+ else:
+ chapter_num = 999999
+
+ # Override with stored values if available
+ if 'actual_num' in chapter_info and chapter_info['actual_num'] is not None:
+ chapter_num = chapter_info['actual_num']
+ elif 'chapter_num' in chapter_info and chapter_info['chapter_num'] is not None:
+ chapter_num = chapter_info['chapter_num']
+
+ status = chapter_info.get("status", "unknown")
+ if status == "completed_empty":
+ status = "completed"
+
+ # Check file existence
+ if status == "completed":
+ output_path = os.path.join(output_dir, output_file)
+ if not os.path.exists(output_path):
+ status = "file_missing"
+
+ chapter_display_info.append({
+ 'key': chapter_key,
+ 'num': chapter_num,
+ 'info': chapter_info,
+ 'output_file': output_file,
+ 'status': status,
+ 'duplicate_count': len(entries),
+ 'entries': entries
+ })
+
+ # Sort by chapter number
+ chapter_display_info.sort(key=lambda x: x['num'] if x['num'] is not None else 999999)
+
+ # =====================================================
+ # CREATE UI
+ # =====================================================
+
+ # If no parent dialog or tab frame, create standalone dialog
+ if not parent_dialog and not tab_frame:
+ dialog = self.wm.create_simple_dialog(
+ self.master,
+ "Force Retranslation - OPF Based" if spine_chapters else "Force Retranslation",
+ width=1000,
+ height=700
+ )
+ container = dialog
+ else:
+ container = tab_frame or parent_dialog
+ dialog = parent_dialog
+
+ # Title
+ title_text = "Chapters from content.opf (in reading order):" if spine_chapters else "Select chapters to retranslate:"
+ tk.Label(container, text=title_text,
+ font=('Arial', 12 if not tab_frame else 11, 'bold')).pack(pady=5)
+
+ # Statistics if OPF is available
+ if spine_chapters:
+ stats_frame = tk.Frame(container)
+ stats_frame.pack(pady=5)
+
+ total_chapters = len(spine_chapters)
+ completed = sum(1 for ch in spine_chapters if ch['status'] == 'completed')
+ missing = sum(1 for ch in spine_chapters if ch['status'] == 'not_translated')
+ failed = sum(1 for ch in spine_chapters if ch['status'] in ['failed', 'qa_failed'])
+ file_missing = sum(1 for ch in spine_chapters if ch['status'] == 'file_missing')
+
+ tk.Label(stats_frame, text=f"Total: {total_chapters} | ", font=('Arial', 10)).pack(side=tk.LEFT)
+ tk.Label(stats_frame, text=f"✅ Completed: {completed} | ", font=('Arial', 10), fg='green').pack(side=tk.LEFT)
+ tk.Label(stats_frame, text=f"❌ Missing: {missing} | ", font=('Arial', 10), fg='red').pack(side=tk.LEFT)
+ tk.Label(stats_frame, text=f"⚠️ Failed: {failed} | ", font=('Arial', 10), fg='orange').pack(side=tk.LEFT)
+ tk.Label(stats_frame, text=f"📁 File Missing: {file_missing}", font=('Arial', 10), fg='purple').pack(side=tk.LEFT)
+
+ # Main frame for listbox
+ main_frame = tk.Frame(container)
+ main_frame.pack(fill=tk.BOTH, expand=True, padx=10 if not tab_frame else 5, pady=5)
+
+ # Create scrollbars and listbox
+ h_scrollbar = ttk.Scrollbar(main_frame, orient=tk.HORIZONTAL)
+ h_scrollbar.pack(side=tk.BOTTOM, fill=tk.X)
+
+ v_scrollbar = ttk.Scrollbar(main_frame, orient=tk.VERTICAL)
+ v_scrollbar.pack(side=tk.RIGHT, fill=tk.Y)
+
+ listbox = tk.Listbox(
+ main_frame,
+ selectmode=tk.EXTENDED,
+ yscrollcommand=v_scrollbar.set,
+ xscrollcommand=h_scrollbar.set,
+ width=120,
+ font=('Courier', 10) # Fixed-width font for better alignment
+ )
+ listbox.pack(side=tk.LEFT, fill=tk.BOTH, expand=True)
+
+ v_scrollbar.config(command=listbox.yview)
+ h_scrollbar.config(command=listbox.xview)
+
+ # Populate listbox
+ status_icons = {
+ 'completed': '✅',
+ 'failed': '❌',
+ 'qa_failed': '❌',
+ 'file_missing': '⚠️',
+ 'in_progress': '🔄',
+ 'not_translated': '❌',
+ 'unknown': '❓'
+ }
+
+ status_labels = {
+ 'completed': 'Completed',
+ 'failed': 'Failed',
+ 'qa_failed': 'QA Failed',
+ 'file_missing': 'File Missing',
+ 'in_progress': 'In Progress',
+ 'not_translated': 'Not Translated',
+ 'unknown': 'Unknown'
+ }
+
+ for info in chapter_display_info:
+ chapter_num = info['num']
+ status = info['status']
+ output_file = info['output_file']
+ icon = status_icons.get(status, '❓')
+ status_label = status_labels.get(status, status)
+
+ # Format display with OPF info if available
+ if 'opf_position' in info:
+ # OPF-based display
+ original_file = info.get('original_filename', '')
+ opf_pos = info['opf_position'] + 1 # 1-based for display
+
+ # Format: [OPF Position] Chapter Number | Status | Original File -> Response File
+ if isinstance(chapter_num, float) and chapter_num.is_integer():
+ display = f"[{opf_pos:03d}] Ch.{int(chapter_num):03d} | {icon} {status_label:15s} | {original_file:30s} -> {output_file}"
+ else:
+ display = f"[{opf_pos:03d}] Ch.{chapter_num:03d} | {icon} {status_label:15s} | {original_file:30s} -> {output_file}"
+ else:
+ # Original format
+ if isinstance(chapter_num, float) and chapter_num.is_integer():
+ display = f"Chapter {int(chapter_num):03d} | {icon} {status_label:15s} | {output_file}"
+ elif isinstance(chapter_num, float):
+ display = f"Chapter {chapter_num:06.1f} | {icon} {status_label:15s} | {output_file}"
+ else:
+ display = f"Chapter {chapter_num:03d} | {icon} {status_label:15s} | {output_file}"
+
+ if info.get('duplicate_count', 1) > 1:
+ display += f" | ({info['duplicate_count']} entries)"
+
+ listbox.insert(tk.END, display)
+
+ # Color code based on status
+ if status == 'completed':
+ listbox.itemconfig(tk.END, fg='green')
+ elif status in ['failed', 'qa_failed', 'not_translated']:
+ listbox.itemconfig(tk.END, fg='red')
+ elif status == 'file_missing':
+ listbox.itemconfig(tk.END, fg='purple')
+ elif status == 'in_progress':
+ listbox.itemconfig(tk.END, fg='orange')
+
+ # Selection count label
+ selection_count_label = tk.Label(container, text="Selected: 0",
+ font=('Arial', 10 if not tab_frame else 9))
+ selection_count_label.pack(pady=(5, 10) if not tab_frame else 2)
+
+ def update_selection_count(*args):
+ count = len(listbox.curselection())
+ selection_count_label.config(text=f"Selected: {count}")
+
+ listbox.bind('<<ListboxSelect>>', update_selection_count)
+
+ # Return data structure for external access
+ result = {
+ 'file_path': file_path,
+ 'output_dir': output_dir,
+ 'progress_file': progress_file,
+ 'prog': prog,
+ 'spine_chapters': spine_chapters,
+ 'opf_chapter_order': opf_chapter_order,
+ 'chapter_display_info': chapter_display_info,
+ 'listbox': listbox,
+ 'selection_count_label': selection_count_label,
+ 'dialog': dialog,
+ 'container': container
+ }
+
+ # If standalone (no parent), add buttons
+ if not parent_dialog or tab_frame:
+ self._add_retranslation_buttons_opf(result)
+
+ return result
+
+
+ def _add_retranslation_buttons_opf(self, data, button_frame=None):
+ """Add the standard button set for retranslation dialogs with OPF support"""
+
+ if not button_frame:
+ button_frame = tk.Frame(data['container'])
+ button_frame.pack(pady=10)
+
+ # Configure column weights
+ for i in range(5):
+ button_frame.columnconfigure(i, weight=1)
+
+ # Helper functions that work with the data dict
+ def select_all():
+ data['listbox'].select_set(0, tk.END)
+ data['selection_count_label'].config(text=f"Selected: {data['listbox'].size()}")
+
+ def clear_selection():
+ data['listbox'].select_clear(0, tk.END)
+ data['selection_count_label'].config(text="Selected: 0")
+
+ def select_status(status_to_select):
+ data['listbox'].select_clear(0, tk.END)
+ for idx, info in enumerate(data['chapter_display_info']):
+ if status_to_select == 'failed':
+ if info['status'] in ['failed', 'qa_failed']:
+ data['listbox'].select_set(idx)
+ elif status_to_select == 'missing':
+ if info['status'] in ['not_translated', 'file_missing']:
+ data['listbox'].select_set(idx)
+ else:
+ if info['status'] == status_to_select:
+ data['listbox'].select_set(idx)
+ count = len(data['listbox'].curselection())
+ data['selection_count_label'].config(text=f"Selected: {count}")
+
+ def remove_qa_failed_mark():
+ selected = data['listbox'].curselection()
+ if not selected:
+ messagebox.showwarning("No Selection", "Please select at least one chapter.")
+ return
+
+ selected_chapters = [data['chapter_display_info'][i] for i in selected]
+ qa_failed_chapters = [ch for ch in selected_chapters if ch['status'] == 'qa_failed']
+
+ if not qa_failed_chapters:
+ messagebox.showwarning("No QA Failed Chapters",
+ "None of the selected chapters have 'qa_failed' status.")
+ return
+
+ count = len(qa_failed_chapters)
+ if not messagebox.askyesno("Confirm Remove QA Failed Mark",
+ f"Remove QA failed mark from {count} chapters?"):
+ return
+
+ # Remove marks
+ cleared_count = 0
+ for info in qa_failed_chapters:
+ # Find the actual numeric key in progress by matching output_file
+ target_output_file = info['output_file']
+ chapter_key = None
+
+ # Search through all chapters to find the one with matching output_file
+ for key, ch_info in data['prog']["chapters"].items():
+ if ch_info.get('output_file') == target_output_file:
+ chapter_key = key
+ break
+
+ # Update the chapter status if we found the key
+ if chapter_key and chapter_key in data['prog']["chapters"]:
+ print(f"Updating chapter key {chapter_key} (output file: {target_output_file})")
+ data['prog']["chapters"][chapter_key]["status"] = "completed"
+
+ # Remove all QA-related fields
+ fields_to_remove = ["qa_issues", "qa_timestamp", "qa_issues_found", "duplicate_confidence"]
+ for field in fields_to_remove:
+ if field in data['prog']["chapters"][chapter_key]:
+ del data['prog']["chapters"][chapter_key][field]
+
+ cleared_count += 1
+ else:
+ print(f"WARNING: Could not find chapter key for output file: {target_output_file}")
+
+ # Save the updated progress
+ with open(data['progress_file'], 'w', encoding='utf-8') as f:
+ json.dump(data['prog'], f, ensure_ascii=False, indent=2)
+
+ messagebox.showinfo("Success", f"Removed QA failed mark from {cleared_count} chapters.")
+ if data.get('dialog'):
+ data['dialog'].destroy()
+
+ def retranslate_selected():
+ selected = data['listbox'].curselection()
+ if not selected:
+ messagebox.showwarning("No Selection", "Please select at least one chapter.")
+ return
+
+ selected_chapters = [data['chapter_display_info'][i] for i in selected]
+
+ # Count different types
+ missing_count = sum(1 for ch in selected_chapters if ch['status'] == 'not_translated')
+ existing_count = sum(1 for ch in selected_chapters if ch['status'] != 'not_translated')
+
+ count = len(selected)
+ if count > 10:
+ if missing_count > 0 and existing_count > 0:
+ confirm_msg = f"This will:\n• Mark {missing_count} missing chapters for translation\n• Delete and retranslate {existing_count} existing chapters\n\nTotal: {count} chapters\n\nContinue?"
+ elif missing_count > 0:
+ confirm_msg = f"This will mark {missing_count} missing chapters for translation.\n\nContinue?"
+ else:
+ confirm_msg = f"This will delete {existing_count} translated chapters and mark them for retranslation.\n\nContinue?"
+ else:
+ chapters = [f"Ch.{ch['num']}" for ch in selected_chapters]
+ confirm_msg = f"This will process:\n\n{', '.join(chapters)}\n\n"
+ if missing_count > 0:
+ confirm_msg += f"• {missing_count} missing chapters will be marked for translation\n"
+ if existing_count > 0:
+ confirm_msg += f"• {existing_count} existing chapters will be deleted and retranslated\n"
+ confirm_msg += "\nContinue?"
+
+ if not messagebox.askyesno("Confirm Retranslation", confirm_msg):
+ return
+
+ # Process chapters - DELETE FILES AND UPDATE PROGRESS
+ deleted_count = 0
+ marked_count = 0
+ status_reset_count = 0
+ progress_updated = False
+
+ for ch_info in selected_chapters:
+ output_file = ch_info['output_file']
+
+ if ch_info['status'] != 'not_translated':
+ # Delete existing file
+ if output_file:
+ output_path = os.path.join(data['output_dir'], output_file)
+ try:
+ if os.path.exists(output_path):
+ os.remove(output_path)
+ deleted_count += 1
+ print(f"Deleted: {output_path}")
+ except Exception as e:
+ print(f"Failed to delete {output_path}: {e}")
+
+ # Reset status for any completed or qa_failed chapters
+ if ch_info['status'] in ['completed', 'qa_failed']:
+ target_output_file = ch_info['output_file']
+ chapter_key = None
+
+ # Search through all chapters to find the one with matching output_file
+ for key, ch_data in data['prog']["chapters"].items():
+ if ch_data.get('output_file') == target_output_file:
+ chapter_key = key
+ break
+
+ # Update the chapter status if we found the key
+ if chapter_key and chapter_key in data['prog']["chapters"]:
+ old_status = ch_info['status']
+ print(f"Resetting {old_status} status to pending for chapter key {chapter_key} (output file: {target_output_file})")
+
+ # Reset status to pending for retranslation
+ data['prog']["chapters"][chapter_key]["status"] = "pending"
+
+ # Remove completion-related fields if they exist
+ fields_to_remove = []
+ if old_status == 'qa_failed':
+ # Remove QA-related fields for qa_failed chapters
+ fields_to_remove = ["qa_issues", "qa_timestamp", "qa_issues_found", "duplicate_confidence"]
+ elif old_status == 'completed':
+ # Remove completion-related fields if any exist for completed chapters
+ fields_to_remove = ["completion_timestamp", "final_word_count", "translation_quality_score"]
+
+ for field in fields_to_remove:
+ if field in data['prog']["chapters"][chapter_key]:
+ del data['prog']["chapters"][chapter_key][field]
+
+ status_reset_count += 1
+ progress_updated = True
+ else:
+ print(f"WARNING: Could not find chapter key for {old_status} output file: {target_output_file}")
+ else:
+ # Just marking for translation (no file to delete)
+ marked_count += 1
+
+ # Save the updated progress if we made changes
+ if progress_updated:
+ try:
+ with open(data['progress_file'], 'w', encoding='utf-8') as f:
+ json.dump(data['prog'], f, ensure_ascii=False, indent=2)
+ print(f"Updated progress tracking file - reset {status_reset_count} chapter statuses to pending")
+ except Exception as e:
+ print(f"Failed to update progress file: {e}")
+
+ # Build success message
+ success_parts = []
+ if deleted_count > 0:
+ success_parts.append(f"Deleted {deleted_count} files")
+ if marked_count > 0:
+ success_parts.append(f"marked {marked_count} missing chapters for translation")
+ if status_reset_count > 0:
+ success_parts.append(f"reset {status_reset_count} chapter statuses to pending")
+
+ if success_parts:
+ success_msg = "Successfully " + ", ".join(success_parts) + "."
+ if deleted_count > 0 or marked_count > 0:
+ success_msg += f"\n\nTotal {len(selected)} chapters ready for translation."
+ messagebox.showinfo("Success", success_msg)
+ else:
+ messagebox.showinfo("Info", "No changes made.")
+
+ if data.get('dialog'):
+ data['dialog'].destroy()
+
+ # Add buttons - First row
+ tb.Button(button_frame, text="Select All", command=select_all,
+ bootstyle="info").grid(row=0, column=0, padx=5, pady=5, sticky="ew")
+ tb.Button(button_frame, text="Clear", command=clear_selection,
+ bootstyle="secondary").grid(row=0, column=1, padx=5, pady=5, sticky="ew")
+ tb.Button(button_frame, text="Select Completed", command=lambda: select_status('completed'),
+ bootstyle="success").grid(row=0, column=2, padx=5, pady=5, sticky="ew")
+ tb.Button(button_frame, text="Select Missing", command=lambda: select_status('missing'),
+ bootstyle="danger").grid(row=0, column=3, padx=5, pady=5, sticky="ew")
+ tb.Button(button_frame, text="Select Failed", command=lambda: select_status('failed'),
+ bootstyle="warning").grid(row=0, column=4, padx=5, pady=5, sticky="ew")
+
+ # Second row
+ tb.Button(button_frame, text="Retranslate Selected", command=retranslate_selected,
+ bootstyle="warning").grid(row=1, column=0, columnspan=2, padx=5, pady=10, sticky="ew")
+ tb.Button(button_frame, text="Remove QA Failed Mark", command=remove_qa_failed_mark,
+ bootstyle="success").grid(row=1, column=2, columnspan=1, padx=5, pady=10, sticky="ew")
+ tb.Button(button_frame, text="Cancel", command=lambda: data['dialog'].destroy() if data.get('dialog') else None,
+ bootstyle="secondary").grid(row=1, column=3, columnspan=2, padx=5, pady=10, sticky="ew")
+
+
+ def _force_retranslation_multiple_files(self):
+ """Handle force retranslation when multiple files are selected - now uses shared logic"""
+
+ # First, check if all selected files are images from the same folder
+ # This handles the case where folder selection results in individual file selections
+ if len(self.selected_files) > 1:
+ all_images = True
+ parent_dirs = set()
+
+ image_extensions = ('.png', '.jpg', '.jpeg', '.gif', '.bmp', '.webp')
+
+ for file_path in self.selected_files:
+ if os.path.isfile(file_path) and file_path.lower().endswith(image_extensions):
+ parent_dirs.add(os.path.dirname(file_path))
+ else:
+ all_images = False
+ break
+
+ # If all files are images from the same directory, treat it as a folder selection
+ if all_images and len(parent_dirs) == 1:
+ folder_path = parent_dirs.pop()
+ print(f"[DEBUG] Detected {len(self.selected_files)} images from same folder: {folder_path}")
+ print(f"[DEBUG] Treating as folder selection")
+ self._force_retranslation_images_folder(folder_path)
+ return
+
+ # Otherwise, continue with normal categorization
+ epub_files = []
+ text_files = []
+ image_files = []
+ folders = []
+
+ image_extensions = ('.png', '.jpg', '.jpeg', '.gif', '.bmp', '.webp')
+
+ for file_path in self.selected_files:
+ if os.path.isdir(file_path):
+ folders.append(file_path)
+ elif file_path.lower().endswith('.epub'):
+ epub_files.append(file_path)
+ elif file_path.lower().endswith('.txt'):
+ text_files.append(file_path)
+ elif file_path.lower().endswith(image_extensions):
+ image_files.append(file_path)
+
+ # Build summary
+ summary_parts = []
+ if epub_files:
+ summary_parts.append(f"{len(epub_files)} EPUB file(s)")
+ if text_files:
+ summary_parts.append(f"{len(text_files)} text file(s)")
+ if image_files:
+ summary_parts.append(f"{len(image_files)} image file(s)")
+ if folders:
+ summary_parts.append(f"{len(folders)} folder(s)")
+
+ if not summary_parts:
+ messagebox.showinfo("Info", "No valid files selected.")
+ return
+
+ # Create main dialog
+ dialog = self.wm.create_simple_dialog(
+ self.master,
+ "Force Retranslation - Multiple Files",
+ width=950,
+ height=700
+ )
+
+ # Summary label
+ tk.Label(dialog, text=f"Selected: {', '.join(summary_parts)}",
+ font=('Arial', 12, 'bold')).pack(pady=10)
+
+ # Create notebook
+ notebook = ttk.Notebook(dialog)
+ notebook.pack(fill=tk.BOTH, expand=True, padx=10, pady=5)
+
+ # Track all tab data
+ tab_data = []
+ tabs_created = False
+
+ # Create tabs for EPUB/text files using shared logic
+ for file_path in epub_files + text_files:
+ file_base = os.path.splitext(os.path.basename(file_path))[0]
+
+ # Quick check: the output folder (named after the source file) must exist in the working directory
+ if not os.path.exists(file_base):
+ continue
+
+ # Create tab
+ tab_frame = tk.Frame(notebook)
+ tab_name = file_base[:20] + "..." if len(file_base) > 20 else file_base
+ notebook.add(tab_frame, text=tab_name)
+ tabs_created = True
+
+ # Use shared logic to populate the tab
+ tab_result = self._force_retranslation_epub_or_text(
+ file_path,
+ parent_dialog=dialog,
+ tab_frame=tab_frame
+ )
+
+ if tab_result:
+ tab_data.append(tab_result)
+
+ # Create tabs for image folders (keeping existing logic for now)
+ for folder_path in folders:
+ folder_result = self._create_image_folder_tab(
+ folder_path,
+ notebook,
+ dialog
+ )
+ if folder_result:
+ tab_data.append(folder_result)
+ tabs_created = True
+
+ # If only individual image files selected and no tabs created yet
+ if image_files and not tabs_created:
+ # Create a single tab for all individual images
+ image_tab_result = self._create_individual_images_tab(
+ image_files,
+ notebook,
+ dialog
+ )
+ if image_tab_result:
+ tab_data.append(image_tab_result)
+ tabs_created = True
+
+ # If no tabs were created, show error
+ if not tabs_created:
+ messagebox.showinfo("Info",
+ "No translation output found for any of the selected files.\n\n"
+ "Make sure the output folders exist in your script directory.")
+ dialog.destroy()
+ return
+
+ # Add unified button bar that works across all tabs
+ self._add_multi_file_buttons(dialog, notebook, tab_data)
+
+ def _add_multi_file_buttons(self, dialog, notebook, tab_data):
+ """Add a simple cancel button at the bottom of the dialog"""
+ button_frame = tk.Frame(dialog)
+ button_frame.pack(side=tk.BOTTOM, fill=tk.X, padx=10, pady=10)
+
+ tb.Button(button_frame, text="Close All", command=dialog.destroy,
+ bootstyle="secondary").pack(side=tk.RIGHT, padx=5)
+
+ def _create_individual_images_tab(self, image_files, notebook, parent_dialog):
+ """Create a tab for individual image files"""
+ # Create tab
+ tab_frame = tk.Frame(notebook)
+ notebook.add(tab_frame, text="Individual Images")
+
+ # Instructions
+ tk.Label(tab_frame, text=f"Selected {len(image_files)} individual image(s):",
+ font=('Arial', 11)).pack(pady=5)
+
+ # Main frame
+ main_frame = tk.Frame(tab_frame)
+ main_frame.pack(fill=tk.BOTH, expand=True, padx=5, pady=5)
+
+ # Scrollbars and listbox
+ h_scrollbar = ttk.Scrollbar(main_frame, orient=tk.HORIZONTAL)
+ h_scrollbar.pack(side=tk.BOTTOM, fill=tk.X)
+
+ v_scrollbar = ttk.Scrollbar(main_frame, orient=tk.VERTICAL)
+ v_scrollbar.pack(side=tk.RIGHT, fill=tk.Y)
+
+ listbox = tk.Listbox(
+ main_frame,
+ selectmode=tk.EXTENDED,
+ yscrollcommand=v_scrollbar.set,
+ xscrollcommand=h_scrollbar.set,
+ width=100
+ )
+ listbox.pack(side=tk.LEFT, fill=tk.BOTH, expand=True)
+
+ v_scrollbar.config(command=listbox.yview)
+ h_scrollbar.config(command=listbox.xview)
+
+ # File info
+ file_info = []
+ script_dir = os.getcwd()
+
+ # Check each image for translations
+ for img_path in sorted(image_files):
+ img_name = os.path.basename(img_path)
+ base_name = os.path.splitext(img_name)[0]
+
+ # Look for translations in various possible locations
+ found_translations = []
+
+ # Check in script directory with base name
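+ # e.g. for "cover.png" this checks "cover" and "cover_translated"; the
+ # script_dir-joined and bare relative entries resolve to the same paths
+ # because script_dir is os.getcwd()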
+ possible_dirs = [
+ os.path.join(script_dir, base_name),
+ os.path.join(script_dir, f"{base_name}_translated"),
+ base_name,
+ f"{base_name}_translated"
+ ]
+
+ for output_dir in possible_dirs:
+ if os.path.exists(output_dir) and os.path.isdir(output_dir):
+ # Look for HTML files
+ for file in os.listdir(output_dir):
+ if file.lower().endswith(('.html', '.xhtml', '.htm')) and base_name in file:
+ found_translations.append((output_dir, file))
+
+ if found_translations:
+ for output_dir, html_file in found_translations:
+ display = f"📄 {img_name} → {html_file} | ✅ Translated"
+ listbox.insert(tk.END, display)
+
+ file_info.append({
+ 'type': 'translated',
+ 'source_image': img_path,
+ 'output_dir': output_dir,
+ 'file': html_file,
+ 'path': os.path.join(output_dir, html_file)
+ })
+ else:
+ display = f"🖼️ {img_name} | ❌ No translation found"
+ listbox.insert(tk.END, display)
+
+ # Selection count
+ selection_count_label = tk.Label(tab_frame, text="Selected: 0", font=('Arial', 9))
+ selection_count_label.pack(pady=2)
+
+ def update_selection_count(*args):
+ count = len(listbox.curselection())
+ selection_count_label.config(text=f"Selected: {count}")
+
+ listbox.bind('<<ListboxSelect>>', update_selection_count)
+
+ return {
+ 'type': 'individual_images',
+ 'listbox': listbox,
+ 'file_info': file_info,
+ 'selection_count_label': selection_count_label
+ }
+
+
+ def _create_image_folder_tab(self, folder_path, notebook, parent_dialog):
+ """Create a tab for image folder retranslation"""
+ folder_name = os.path.basename(folder_path)
+ output_dir = f"{folder_name}_translated"
+
+ if not os.path.exists(output_dir):
+ return None
+
+ # Create tab
+ tab_frame = tk.Frame(notebook)
+ tab_name = "📁 " + (folder_name[:17] + "..." if len(folder_name) > 17 else folder_name)
+ notebook.add(tab_frame, text=tab_name)
+
+ # Instructions
+ tk.Label(tab_frame, text="Select images to retranslate:", font=('Arial', 11)).pack(pady=5)
+
+ # Main frame
+ main_frame = tk.Frame(tab_frame)
+ main_frame.pack(fill=tk.BOTH, expand=True, padx=5, pady=5)
+
+ # Scrollbars and listbox
+ h_scrollbar = ttk.Scrollbar(main_frame, orient=tk.HORIZONTAL)
+ h_scrollbar.pack(side=tk.BOTTOM, fill=tk.X)
+
+ v_scrollbar = ttk.Scrollbar(main_frame, orient=tk.VERTICAL)
+ v_scrollbar.pack(side=tk.RIGHT, fill=tk.Y)
+
+ listbox = tk.Listbox(
+ main_frame,
+ selectmode=tk.EXTENDED,
+ yscrollcommand=v_scrollbar.set,
+ xscrollcommand=h_scrollbar.set,
+ width=100
+ )
+ listbox.pack(side=tk.LEFT, fill=tk.BOTH, expand=True)
+
+ v_scrollbar.config(command=listbox.yview)
+ h_scrollbar.config(command=listbox.xview)
+
+ # Find files
+ file_info = []
+
+ # Add HTML files
+ for file in os.listdir(output_dir):
+ if file.startswith('response_'):
+ # Allow response_{index}_{name}.html and compound extensions like .html.xhtml
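+ # e.g. "response_001_cover.html" or "response_001_cover.html.xhtml"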
+ match = re.match(r'^response_(\d+)_([^\.]*)\.(?:html?|xhtml|htm)(?:\.xhtml)?$', file, re.IGNORECASE)
+ if match:
+ index = match.group(1)
+ base_name = match.group(2)
+ display = f"📄 Image {index} | {base_name} | ✅ Translated"
+ else:
+ display = f"📄 {file} | ✅ Translated"
+
+ listbox.insert(tk.END, display)
+ file_info.append({
+ 'type': 'translated',
+ 'file': file,
+ 'path': os.path.join(output_dir, file)
+ })
+
+ # Add cover images
+ images_dir = os.path.join(output_dir, "images")
+ if os.path.exists(images_dir):
+ for file in sorted(os.listdir(images_dir)):
+ if file.lower().endswith(('.png', '.jpg', '.jpeg', '.gif', '.bmp', '.webp')):
+ display = f"🖼️ Cover | {file} | ⏭️ Skipped"
+ listbox.insert(tk.END, display)
+ file_info.append({
+ 'type': 'cover',
+ 'file': file,
+ 'path': os.path.join(images_dir, file)
+ })
+
+ # Selection count
+ selection_count_label = tk.Label(tab_frame, text="Selected: 0", font=('Arial', 9))
+ selection_count_label.pack(pady=2)
+
+ def update_selection_count(*args):
+ count = len(listbox.curselection())
+ selection_count_label.config(text=f"Selected: {count}")
+
+ listbox.bind('<<ListboxSelect>>', update_selection_count)
+
+ return {
+ 'type': 'image_folder',
+ 'folder_path': folder_path,
+ 'output_dir': output_dir,
+ 'listbox': listbox,
+ 'file_info': file_info,
+ 'selection_count_label': selection_count_label
+ }
+
+
+ def _force_retranslation_images_folder(self, folder_path):
+ """Handle force retranslation for image folders"""
+ folder_name = os.path.basename(folder_path)
+
+ # Look for output folder in the SCRIPT'S directory, not relative to the selected folder
+ script_dir = os.getcwd() # Current working directory where the script is running
+
+ # Check multiple possible output folder patterns IN THE SCRIPT DIRECTORY
+ possible_output_dirs = [
+ os.path.join(script_dir, folder_name), # Script dir + folder name
+ os.path.join(script_dir, f"{folder_name}_translated"), # Script dir + folder_translated
+ folder_name, # Just the folder name in current directory
+ f"{folder_name}_translated", # folder_translated in current directory
+ ]
+
+ output_dir = None
+ for possible_dir in possible_output_dirs:
+ print(f"Checking: {possible_dir}")
+ if os.path.exists(possible_dir):
+ # Check if it has translation_progress.json or HTML files
+ if os.path.exists(os.path.join(possible_dir, "translation_progress.json")):
+ output_dir = possible_dir
+ print(f"Found output directory with progress tracker: {output_dir}")
+ break
+ # Check if it has any HTML files
+ elif os.path.isdir(possible_dir):
+ try:
+ files = os.listdir(possible_dir)
+ if any(f.lower().endswith(('.html', '.xhtml', '.htm')) for f in files):
+ output_dir = possible_dir
+ print(f"Found output directory with HTML files: {output_dir}")
+ break
+ except OSError:
+ # directory unreadable; try the next candidate
+ pass
+
+ if not output_dir:
+ messagebox.showinfo("Info",
+ f"No translation output found for '{folder_name}'.\n\n"
+ f"Selected folder: {folder_path}\n"
+ f"Script directory: {script_dir}\n\n"
+ f"Checked locations:\n" + "\n".join(f"- {d}" for d in possible_output_dirs))
+ return
+
+ print(f"Using output directory: {output_dir}")
+
+ # Check for progress tracking file
+ progress_file = os.path.join(output_dir, "translation_progress.json")
+ has_progress_tracking = os.path.exists(progress_file)
+
+ print(f"Progress tracking: {has_progress_tracking} at {progress_file}")
+
+ # Find all HTML files in the output directory
+ html_files = []
+ image_files = []
+ progress_data = None
+
+ if has_progress_tracking:
+ # Load progress data for image translations
+ try:
+ with open(progress_file, 'r', encoding='utf-8') as f:
+ progress_data = json.load(f)
+ print(f"Loaded progress data with {len(progress_data)} entries")
+
+ # Extract files from progress data
+ # The structure appears to use hash keys at the root level
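+ # e.g. {"<content_hash>": {"output_file": "response_001_cover.html", ...}, ...}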
+ for key, value in progress_data.items():
+ if isinstance(value, dict) and 'output_file' in value:
+ output_file = value['output_file']
+ # Handle both forward and backslashes in paths
+ output_file = output_file.replace('\\', '/')
+ if '/' in output_file:
+ output_file = os.path.basename(output_file)
+ html_files.append(output_file)
+ print(f"Found tracked file: {output_file}")
+ except Exception as e:
+ print(f"Error loading progress file: {e}")
+ import traceback
+ traceback.print_exc()
+ has_progress_tracking = False
+
+ # Also scan directory for any HTML files not in progress
+ try:
+ for file in os.listdir(output_dir):
+ file_path = os.path.join(output_dir, file)
+ if os.path.isfile(file_path) and file.lower().endswith(('.html', '.xhtml', '.htm')) and file not in html_files:
+ html_files.append(file)
+ print(f"Found untracked HTML file: {file}")
+ except Exception as e:
+ print(f"Error scanning directory: {e}")
+
+ # Check for images subdirectory (cover images)
+ images_dir = os.path.join(output_dir, "images")
+ if os.path.exists(images_dir):
+ try:
+ for file in os.listdir(images_dir):
+ if file.lower().endswith(('.png', '.jpg', '.jpeg', '.gif', '.bmp', '.webp')):
+ image_files.append(file)
+ except Exception as e:
+ print(f"Error scanning images directory: {e}")
+
+ print(f"Total files found: {len(html_files)} HTML, {len(image_files)} images")
+
+ if not html_files and not image_files:
+ messagebox.showinfo("Info",
+ f"No translated files found in: {output_dir}\n\n"
+ f"Progress tracking: {'Yes' if has_progress_tracking else 'No'}")
+ return
+
+ # Create dialog
+ dialog = self.wm.create_simple_dialog(
+ self.master,
+ "Force Retranslation - Images",
+ width=800,
+ height=600
+ )
+
+ # Add instructions with more detail
+ instruction_text = f"Output folder: {output_dir}\n"
+ instruction_text += f"Found {len(html_files)} translated images and {len(image_files)} cover images"
+ if has_progress_tracking:
+ instruction_text += " (with progress tracking)"
+ tk.Label(dialog, text=instruction_text, font=('Arial', 11), justify=tk.LEFT).pack(pady=10)
+
+ # Create main frame for listbox and scrollbars
+ main_frame = tk.Frame(dialog)
+ main_frame.pack(fill=tk.BOTH, expand=True, padx=10, pady=5)
+
+ # Create scrollbars
+ h_scrollbar = ttk.Scrollbar(main_frame, orient=tk.HORIZONTAL)
+ h_scrollbar.pack(side=tk.BOTTOM, fill=tk.X)
+
+ v_scrollbar = ttk.Scrollbar(main_frame, orient=tk.VERTICAL)
+ v_scrollbar.pack(side=tk.RIGHT, fill=tk.Y)
+
+ # Create listbox
+ listbox = tk.Listbox(
+ main_frame,
+ selectmode=tk.EXTENDED,
+ yscrollcommand=v_scrollbar.set,
+ xscrollcommand=h_scrollbar.set,
+ width=100
+ )
+ listbox.pack(side=tk.LEFT, fill=tk.BOTH, expand=True)
+
+ # Configure scrollbars
+ v_scrollbar.config(command=listbox.yview)
+ h_scrollbar.config(command=listbox.xview)
+
+ # Keep track of file info
+ file_info = []
+
+ # Add translated HTML files
+ for html_file in sorted(set(html_files)): # Use set to avoid duplicates
+ # Extract original image name from HTML filename
+ # Expected format: response_001_imagename.html (compound extensions like .html.xhtml also accepted)
+ match = re.match(r'^response_(\d+)_(.+?)\.(?:html?|xhtml|htm)(?:\.xhtml)?$', html_file, re.IGNORECASE)
+ if match:
+ index = match.group(1)
+ base_name = match.group(2)
+ display = f"📄 Image {index} | {base_name} | ✅ Translated"
+ else:
+ display = f"📄 {html_file} | ✅ Translated"
+
+ listbox.insert(tk.END, display)
+
+ # Find the hash key for this file if progress tracking exists
+ hash_key = None
+ if progress_data:
+ for key, value in progress_data.items():
+ if isinstance(value, dict) and 'output_file' in value:
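+ # substring match: 'output_file' may hold a full path while html_file is a bare name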
+ if html_file in value['output_file']:
+ hash_key = key
+ break
+
+ file_info.append({
+ 'type': 'translated',
+ 'file': html_file,
+ 'path': os.path.join(output_dir, html_file),
+ 'hash_key': hash_key,
+ 'output_dir': output_dir # Store for later use
+ })
+
+ # Add cover images
+ for img_file in sorted(image_files):
+ display = f"🖼️ Cover | {img_file} | ⏭️ Skipped (cover)"
+ listbox.insert(tk.END, display)
+ file_info.append({
+ 'type': 'cover',
+ 'file': img_file,
+ 'path': os.path.join(images_dir, img_file),
+ 'hash_key': None,
+ 'output_dir': output_dir
+ })
+
+ # Selection count label
+ selection_count_label = tk.Label(dialog, text="Selected: 0", font=('Arial', 10))
+ selection_count_label.pack(pady=(5, 10))
+
+ def update_selection_count(*args):
+ count = len(listbox.curselection())
+ selection_count_label.config(text=f"Selected: {count}")
+
+ listbox.bind('<<ListboxSelect>>', update_selection_count)
+
+ # Button frame
+ button_frame = tk.Frame(dialog)
+ button_frame.pack(pady=10)
+
+ # Configure grid columns
+ for i in range(4):
+ button_frame.columnconfigure(i, weight=1)
+
+ def select_all():
+ listbox.select_set(0, tk.END)
+ update_selection_count()
+
+ def clear_selection():
+ listbox.select_clear(0, tk.END)
+ update_selection_count()
+
+ def select_translated():
+ listbox.select_clear(0, tk.END)
+ for idx, info in enumerate(file_info):
+ if info['type'] == 'translated':
+ listbox.select_set(idx)
+ update_selection_count()
+
+ def mark_as_skipped():
+ """Move selected images to the images folder to be skipped"""
+ selected = listbox.curselection()
+ if not selected:
+ messagebox.showwarning("No Selection", "Please select at least one image to mark as skipped.")
+ return
+
+ # Get all selected items
+ selected_items = [(i, file_info[i]) for i in selected]
+
+ # Filter out items already in images folder (covers)
+ items_to_move = [(i, item) for i, item in selected_items if item['type'] != 'cover']
+
+ if not items_to_move:
+ messagebox.showinfo("Info", "Selected items are already in the images folder (skipped).")
+ return
+
+ count = len(items_to_move)
+ if not messagebox.askyesno("Confirm Mark as Skipped",
+ f"Move {count} translated image(s) to the images folder?\n\n"
+ "This will:\n"
+ "• Delete the translated HTML files\n"
+ "• Copy source images to the images folder\n"
+ "• Skip these images in future translations"):
+ return
+
+ # Create images directory if it doesn't exist
+ images_dir = os.path.join(output_dir, "images")
+ os.makedirs(images_dir, exist_ok=True)
+
+ import shutil  # imported once here rather than inside the copy loop below
+
+ moved_count = 0
+ failed_count = 0
+
+ for idx, item in items_to_move:
+ try:
+ # Extract the original image name from the HTML filename
+ # Expected format: response_001_imagename.html (also accept compound extensions)
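+ # e.g. "response_001_cover.html" -> base name "cover"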
+ html_file = item['file']
+ match = re.match(r'^response_\d+_([^\.]*)\.(?:html?|xhtml|htm)(?:\.xhtml)?$', html_file, re.IGNORECASE)
+
+ if match:
+ base_name = match.group(1)
+ # Try to find the original image with common extensions
+ original_found = False
+
+ for ext in ['.png', '.jpg', '.jpeg', '.gif', '.bmp', '.webp']:
+ # Check in the parent folder (where source images are)
+ possible_source = os.path.join(folder_path, base_name + ext)
+ if os.path.exists(possible_source):
+ # Copy to images folder
+ dest_path = os.path.join(images_dir, base_name + ext)
+ if not os.path.exists(dest_path):
+ shutil.copy2(possible_source, dest_path)
+ print(f"Copied {base_name + ext} to images folder")
+ original_found = True
+ break
+
+ if not original_found:
+ print(f"Warning: Could not find original image for {html_file}")
+
+ # Delete the HTML translation file
+ if os.path.exists(item['path']):
+ os.remove(item['path'])
+ print(f"Deleted translation: {item['path']}")
+
+ # Remove from progress tracking if applicable
+ if progress_data and item.get('hash_key') and item['hash_key'] in progress_data:
+ del progress_data[item['hash_key']]
+
+ # Update the listbox display
+ display = f"🖼️ Skipped | {base_name if match else item['file']} | ⏭️ Moved to images folder"
+ listbox.delete(idx)
+ listbox.insert(idx, display)
+
+ # Update file_info (factor out the new filename so 'file' and 'path' stay in sync)
+ new_name = base_name + ext if match and original_found else item['file']
+ file_info[idx] = {
+ 'type': 'cover',  # Treat as cover type since it's in images folder
+ 'file': new_name,
+ 'path': os.path.join(images_dir, new_name),
+ 'hash_key': None,
+ 'output_dir': output_dir
+ }
+
+ moved_count += 1
+
+ except Exception as e:
+ print(f"Failed to process {item['file']}: {e}")
+ failed_count += 1
+
+ # Save updated progress if modified
+ if progress_data:
+ try:
+ with open(progress_file, 'w', encoding='utf-8') as f:
+ json.dump(progress_data, f, ensure_ascii=False, indent=2)
+ print(f"Updated progress tracking file")
+ except Exception as e:
+ print(f"Failed to update progress file: {e}")
+
+ # Update selection count
+ update_selection_count()
+
+ # Show result
+ if failed_count > 0:
+ messagebox.showwarning("Partial Success",
+ f"Moved {moved_count} image(s) to be skipped.\n"
+ f"Failed to process {failed_count} item(s).")
+ else:
+ messagebox.showinfo("Success",
+ f"Moved {moved_count} image(s) to the images folder.\n"
+ "They will be skipped in future translations.")
+
+ def retranslate_selected():
+ selected = listbox.curselection()
+ if not selected:
+ messagebox.showwarning("No Selection", "Please select at least one file.")
+ return
+
+ # Count types
+ translated_count = sum(1 for i in selected if file_info[i]['type'] == 'translated')
+ cover_count = sum(1 for i in selected if file_info[i]['type'] == 'cover')
+
+ # Build confirmation message
+ msg_parts = []
+ if translated_count > 0:
+ msg_parts.append(f"{translated_count} translated image(s)")
+ if cover_count > 0:
+ msg_parts.append(f"{cover_count} cover image(s)")
+
+ confirm_msg = f"This will delete {' and '.join(msg_parts)}.\n\nContinue?"
+
+ if not messagebox.askyesno("Confirm Deletion", confirm_msg):
+ return
+
+ # Delete selected files
+ deleted_count = 0
+ progress_updated = False
+
+ for idx in selected:
+ info = file_info[idx]
+ try:
+ if os.path.exists(info['path']):
+ os.remove(info['path'])
+ deleted_count += 1
+ print(f"Deleted: {info['path']}")
+
+ # Remove from progress tracking if applicable
+ if progress_data and info['hash_key'] and info['hash_key'] in progress_data:
+ del progress_data[info['hash_key']]
+ progress_updated = True
+
+ except Exception as e:
+ print(f"Failed to delete {info['path']}: {e}")
+
+ # Save updated progress if modified
+ if progress_updated and progress_data:
+ try:
+ with open(progress_file, 'w', encoding='utf-8') as f:
+ json.dump(progress_data, f, ensure_ascii=False, indent=2)
+ print(f"Updated progress tracking file")
+ except Exception as e:
+ print(f"Failed to update progress file: {e}")
+
+ messagebox.showinfo("Success",
+ f"Deleted {deleted_count} file(s).\n\n"
+ "They will be retranslated on the next run.")
+
+ dialog.destroy()
+
+ # Add buttons in grid layout (similar to EPUB/text retranslation)
+ # Row 0: Selection buttons
+ tb.Button(button_frame, text="Select All", command=select_all,
+ bootstyle="info").grid(row=0, column=0, padx=5, pady=5, sticky="ew")
+ tb.Button(button_frame, text="Clear Selection", command=clear_selection,
+ bootstyle="secondary").grid(row=0, column=1, padx=5, pady=5, sticky="ew")
+ tb.Button(button_frame, text="Select Translated", command=select_translated,
+ bootstyle="success").grid(row=0, column=2, padx=5, pady=5, sticky="ew")
+ tb.Button(button_frame, text="Mark as Skipped", command=mark_as_skipped,
+ bootstyle="warning").grid(row=0, column=3, padx=5, pady=5, sticky="ew")
+
+ # Row 1: Action buttons
+ tb.Button(button_frame, text="Delete Selected", command=retranslate_selected,
+ bootstyle="danger").grid(row=1, column=0, columnspan=2, padx=5, pady=10, sticky="ew")
+ tb.Button(button_frame, text="Cancel", command=dialog.destroy,
+ bootstyle="secondary").grid(row=1, column=2, columnspan=2, padx=5, pady=10, sticky="ew")
+
+ def glossary_manager(self):
+ """Open comprehensive glossary management dialog"""
+ # Create scrollable dialog (stays hidden)
+ dialog, scrollable_frame, canvas = self.wm.setup_scrollable(
+ self.master,
+ "Glossary Manager",
+ width=0, # Will be auto-sized
+ height=None,
+ max_width_ratio=0.9,
+ max_height_ratio=0.85
+ )
+
+ # Create notebook for tabs
+ notebook = ttk.Notebook(scrollable_frame)
+ notebook.pack(fill=tk.BOTH, expand=True, padx=10, pady=10)
+
+ # Create and add tabs
+ tabs = [
+ ("Manual Glossary Extraction", self._setup_manual_glossary_tab),
+ ("Automatic Glossary Generation", self._setup_auto_glossary_tab),
+ ("Glossary Editor", self._setup_glossary_editor_tab)
+ ]
+
+ for tab_name, setup_method in tabs:
+ frame = ttk.Frame(notebook)
+ notebook.add(frame, text=tab_name)
+ setup_method(frame)
+
+ # Dialog Controls
+ control_frame = tk.Frame(dialog)
+ control_frame.pack(fill=tk.X, padx=10, pady=10)
+
+ def save_glossary_settings():
+ try:
+ # Update prompts from text widgets
+ self.update_glossary_prompts()
+
+ # Save custom fields
+ self.config['custom_glossary_fields'] = self.custom_glossary_fields
+
+ # Update enabled status from checkboxes
+ if hasattr(self, 'type_enabled_vars'):
+ for type_name, var in self.type_enabled_vars.items():
+ if type_name in self.custom_entry_types:
+ self.custom_entry_types[type_name]['enabled'] = var.get()
+
+ # Save custom entry types
+ self.config['custom_entry_types'] = self.custom_entry_types
+
+ # Save all glossary-related settings
+ self.config['enable_auto_glossary'] = self.enable_auto_glossary_var.get()
+ self.config['append_glossary'] = self.append_glossary_var.get()
+ self.config['glossary_min_frequency'] = int(self.glossary_min_frequency_var.get())
+ self.config['glossary_max_names'] = int(self.glossary_max_names_var.get())
+ self.config['glossary_max_titles'] = int(self.glossary_max_titles_var.get())
+ self.config['glossary_batch_size'] = int(self.glossary_batch_size_var.get())
+ self.config['glossary_format_instructions'] = getattr(self, 'glossary_format_instructions', '')
+ self.config['glossary_max_text_size'] = self.glossary_max_text_size_var.get()
+ self.config['glossary_max_sentences'] = int(self.glossary_max_sentences_var.get())
+
+
+ # Honorifics and other settings
+ if hasattr(self, 'strip_honorifics_var'):
+ self.config['strip_honorifics'] = self.strip_honorifics_var.get()
+ if hasattr(self, 'disable_honorifics_var'):
+ self.config['glossary_disable_honorifics_filter'] = self.disable_honorifics_var.get()
+
+ # Save format preference
+ if hasattr(self, 'use_legacy_csv_var'):
+ self.config['glossary_use_legacy_csv'] = self.use_legacy_csv_var.get()
+
+ # Temperature and context limit
+ try:
+ self.config['manual_glossary_temperature'] = float(self.manual_temp_var.get())
+ self.config['manual_context_limit'] = int(self.manual_context_var.get())
+ except ValueError:
+ messagebox.showwarning("Invalid Input",
+ "Please enter valid numbers for temperature and context limit")
+ return
+
+ # Fuzzy matching threshold
+ self.config['glossary_fuzzy_threshold'] = self.fuzzy_threshold_var.get()
+
+ # Save prompts
+ self.config['manual_glossary_prompt'] = self.manual_glossary_prompt
+ self.config['auto_glossary_prompt'] = self.auto_glossary_prompt
+ self.config['append_glossary_prompt'] = self.append_glossary_prompt
+ self.config['glossary_translation_prompt'] = getattr(self, 'glossary_translation_prompt', '')
+
+ # Update environment variables for immediate use
+ os.environ['GLOSSARY_SYSTEM_PROMPT'] = self.manual_glossary_prompt
+ os.environ['AUTO_GLOSSARY_PROMPT'] = self.auto_glossary_prompt
+ os.environ['GLOSSARY_DISABLE_HONORIFICS_FILTER'] = '1' if self.disable_honorifics_var.get() else '0'
+ os.environ['GLOSSARY_STRIP_HONORIFICS'] = '1' if self.strip_honorifics_var.get() else '0'
+ os.environ['GLOSSARY_FUZZY_THRESHOLD'] = str(self.fuzzy_threshold_var.get())
+ os.environ['GLOSSARY_TRANSLATION_PROMPT'] = getattr(self, 'glossary_translation_prompt', '')
+ os.environ['GLOSSARY_FORMAT_INSTRUCTIONS'] = getattr(self, 'glossary_format_instructions', '')
+ os.environ['GLOSSARY_USE_LEGACY_CSV'] = '1' if self.use_legacy_csv_var.get() else '0'
+ os.environ['GLOSSARY_MAX_SENTENCES'] = str(self.glossary_max_sentences_var.get())
+
+ # Set custom entry types and fields as environment variables
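+ # e.g. GLOSSARY_CUSTOM_ENTRY_TYPES='{"character": {"enabled": true, "has_gender": true}, ...}'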
+ os.environ['GLOSSARY_CUSTOM_ENTRY_TYPES'] = json.dumps(self.custom_entry_types)
+ if self.custom_glossary_fields:
+ os.environ['GLOSSARY_CUSTOM_FIELDS'] = json.dumps(self.custom_glossary_fields)
+
+ # Save config using the main save_config method to ensure encryption
+ self.save_config(show_message=False)
+
+ self.append_log("✅ Glossary settings saved successfully")
+
+ # Check if any types are enabled
+ enabled_types = [t for t, cfg in self.custom_entry_types.items() if cfg.get('enabled', True)]
+ if not enabled_types:
+ messagebox.showwarning("Warning", "No entry types selected! The glossary extraction will not find any entries.")
+ else:
+ self.append_log(f"📑 Enabled types: {', '.join(enabled_types)}")
+
+ messagebox.showinfo("Success", "Glossary settings saved!")
+ dialog.destroy()
+
+ except Exception as e:
+ messagebox.showerror("Error", f"Failed to save settings: {e}")
+ self.append_log(f"❌ Failed to save glossary settings: {e}")
+
+ # Create button container
+ button_container = tk.Frame(control_frame)
+ button_container.pack(expand=True)
+
+ # Add buttons
+ tb.Button(
+ button_container,
+ text="Save All Settings",
+ command=save_glossary_settings,
+ bootstyle="success",
+ width=20
+ ).pack(side=tk.LEFT, padx=5)
+
+ tb.Button(
+ button_container,
+ text="Cancel",
+ command=lambda: [dialog._cleanup_scrolling(), dialog.destroy()],
+ bootstyle="secondary",
+ width=20
+ ).pack(side=tk.LEFT, padx=5)
+
+ # Auto-resize and show
+ self.wm.auto_resize_dialog(dialog, canvas, max_width_ratio=0.9, max_height_ratio=1.5)
+
+ dialog.protocol("WM_DELETE_WINDOW",
+ lambda: [dialog._cleanup_scrolling(), dialog.destroy()])
+
+ def _setup_manual_glossary_tab(self, parent):
+ """Setup manual glossary tab - simplified for new format"""
+ manual_container = tk.Frame(parent)
+ manual_container.pack(fill=tk.BOTH, expand=True, padx=10, pady=10)
+
+ # Type filtering section with custom types
+ type_filter_frame = tk.LabelFrame(manual_container, text="Entry Type Configuration", padx=10, pady=10)
+ type_filter_frame.pack(fill=tk.X, pady=(0, 10))
+
+ # Initialize custom entry types if not exists
+ if not hasattr(self, 'custom_entry_types'):
+ # Default types with their enabled status
+ self.custom_entry_types = self.config.get('custom_entry_types', {
+ 'character': {'enabled': True, 'has_gender': True},
+ 'term': {'enabled': True, 'has_gender': False}
+ })
+
+ # Main container with grid for better control
+ type_main_container = tk.Frame(type_filter_frame)
+ type_main_container.pack(fill=tk.X)
+ type_main_container.grid_columnconfigure(0, weight=3) # Left side gets 3/5 of space
+ type_main_container.grid_columnconfigure(1, weight=2) # Right side gets 2/5 of space
+
+ # Left side - type list with checkboxes
+ type_list_frame = tk.Frame(type_main_container)
+ type_list_frame.grid(row=0, column=0, sticky="nsew", padx=(0, 15))
+
+ tk.Label(type_list_frame, text="Active Entry Types:",
+ font=('TkDefaultFont', 10, 'bold')).pack(anchor=tk.W)
+
+ # Scrollable frame for type checkboxes
+ type_scroll_frame = tk.Frame(type_list_frame)
+ type_scroll_frame.pack(fill=tk.BOTH, expand=True, pady=(5, 0))
+
+ type_canvas = tk.Canvas(type_scroll_frame, height=150)
+ type_scrollbar = ttk.Scrollbar(type_scroll_frame, orient="vertical", command=type_canvas.yview)
+ self.type_checkbox_frame = tk.Frame(type_canvas)
+
+ type_canvas.configure(yscrollcommand=type_scrollbar.set)
+ type_canvas_window = type_canvas.create_window((0, 0), window=self.type_checkbox_frame, anchor="nw")
+
+ type_canvas.pack(side=tk.LEFT, fill=tk.BOTH, expand=True)
+ type_scrollbar.pack(side=tk.RIGHT, fill=tk.Y)
+
+ # Store checkbox variables
+ self.type_enabled_vars = {}
+
+ def update_type_checkboxes():
+ """Rebuild the checkbox list"""
+ # Clear existing checkboxes
+ for widget in self.type_checkbox_frame.winfo_children():
+ widget.destroy()
+
+ # Sort types: built-in first, then custom alphabetically
+ sorted_types = sorted(self.custom_entry_types.items(),
+ key=lambda x: (x[0] not in ['character', 'term'], x[0]))
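+ # key tuples: (False, name) for built-ins, (True, name) for custom types;
+ # False sorts before True, so built-ins come first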
+
+ # Create checkboxes for each type
+ for type_name, type_config in sorted_types:
+ var = tk.BooleanVar(value=type_config.get('enabled', True))
+ self.type_enabled_vars[type_name] = var
+
+ frame = tk.Frame(self.type_checkbox_frame)
+ frame.pack(fill=tk.X, pady=2)
+
+ # Checkbox
+ cb = tb.Checkbutton(frame, text=type_name, variable=var,
+ bootstyle="round-toggle")
+ cb.pack(side=tk.LEFT)
+
+ # Add gender indicator for types that support it
+ if type_config.get('has_gender', False):
+ tk.Label(frame, text="(has gender field)",
+ font=('TkDefaultFont', 9), fg='gray').pack(side=tk.LEFT, padx=(10, 0))
+
+ # Delete button for custom types
+ if type_name not in ['character', 'term']:
+ tb.Button(frame, text="×", command=lambda t=type_name: remove_type(t),
+ bootstyle="danger", width=3).pack(side=tk.RIGHT, padx=(5, 0))
+
+ # Update canvas scroll region
+ self.type_checkbox_frame.update_idletasks()
+ type_canvas.configure(scrollregion=type_canvas.bbox("all"))
+
+ # Right side - controls for adding custom types
+ type_control_frame = tk.Frame(type_main_container)
+ type_control_frame.grid(row=0, column=1, sticky="nsew")
+
+ tk.Label(type_control_frame, text="Add Custom Type:",
+ font=('TkDefaultFont', 10, 'bold')).pack(anchor=tk.W)
+
+ # Entry for new type field
+ new_type_frame = tk.Frame(type_control_frame)
+ new_type_frame.pack(fill=tk.X, pady=(5, 0))
+
+ tk.Label(new_type_frame, text="Type Field:").pack(anchor=tk.W)
+ new_type_entry = tb.Entry(new_type_frame)
+ new_type_entry.pack(fill=tk.X, pady=(2, 0))
+
+ # Checkbox for gender field
+ has_gender_var = tk.BooleanVar(value=False)
+ tb.Checkbutton(new_type_frame, text="Include gender field",
+ variable=has_gender_var).pack(anchor=tk.W, pady=(5, 0))
+
+ def add_custom_type():
+ type_name = new_type_entry.get().strip().lower()
+ if not type_name:
+ messagebox.showwarning("Invalid Input", "Please enter a type name")
+ return
+
+ if type_name in self.custom_entry_types:
+ messagebox.showwarning("Duplicate Type", f"Type '{type_name}' already exists")
+ return
+
+ # Add the new type
+ self.custom_entry_types[type_name] = {
+ 'enabled': True,
+ 'has_gender': has_gender_var.get()
+ }
+
+ # Clear inputs
+ new_type_entry.delete(0, tk.END)
+ has_gender_var.set(False)
+
+ # Update display
+ update_type_checkboxes()
+ self.append_log(f"✅ Added custom type: {type_name}")
+
+ def remove_type(type_name):
+ if type_name in ['character', 'term']:
+ messagebox.showwarning("Cannot Remove", "Built-in types cannot be removed")
+ return
+
+ if messagebox.askyesno("Confirm Removal", f"Remove type '{type_name}'?"):
+ del self.custom_entry_types[type_name]
+ if type_name in self.type_enabled_vars:
+ del self.type_enabled_vars[type_name]
+ update_type_checkboxes()
+ self.append_log(f"🗑️ Removed custom type: {type_name}")
+
+ tb.Button(new_type_frame, text="Add Type", command=add_custom_type,
+ bootstyle="success").pack(fill=tk.X, pady=(10, 0))
+
+ # Initialize checkboxes
+ update_type_checkboxes()
+
+ # Custom fields section
+ custom_frame = tk.LabelFrame(manual_container, text="Custom Fields (Additional Columns)", padx=10, pady=10)
+ custom_frame.pack(fill=tk.X, pady=(0, 10))
+
+ custom_list_frame = tk.Frame(custom_frame)
+ custom_list_frame.pack(fill=tk.X)
+
+ tk.Label(custom_list_frame, text="Additional fields to extract (will be added as extra columns):").pack(anchor=tk.W)
+
+ custom_scroll = ttk.Scrollbar(custom_list_frame)
+ custom_scroll.pack(side=tk.RIGHT, fill=tk.Y)
+
+ self.custom_fields_listbox = tk.Listbox(custom_list_frame, height=4,
+ yscrollcommand=custom_scroll.set)
+ self.custom_fields_listbox.pack(side=tk.LEFT, fill=tk.BOTH, expand=True)
+ custom_scroll.config(command=self.custom_fields_listbox.yview)
+
+ # Initialize custom_glossary_fields if not exists
+ if not hasattr(self, 'custom_glossary_fields'):
+ self.custom_glossary_fields = self.config.get('custom_glossary_fields', [])
+
+ for field in self.custom_glossary_fields:
+ self.custom_fields_listbox.insert(tk.END, field)
+
+ custom_controls = tk.Frame(custom_frame)
+ custom_controls.pack(fill=tk.X, pady=(5, 0))
+
+ self.custom_field_entry = tb.Entry(custom_controls, width=30)
+ self.custom_field_entry.pack(side=tk.LEFT, padx=(0, 5))
+
+ def add_custom_field():
+ field = self.custom_field_entry.get().strip()
+ if field and field not in self.custom_glossary_fields:
+ self.custom_glossary_fields.append(field)
+ self.custom_fields_listbox.insert(tk.END, field)
+ self.custom_field_entry.delete(0, tk.END)
+
+ def remove_custom_field():
+ selection = self.custom_fields_listbox.curselection()
+ if selection:
+ idx = selection[0]
+ field = self.custom_fields_listbox.get(idx)
+ self.custom_glossary_fields.remove(field)
+ self.custom_fields_listbox.delete(idx)
+
+ tb.Button(custom_controls, text="Add", command=add_custom_field, width=10).pack(side=tk.LEFT, padx=2)
+ tb.Button(custom_controls, text="Remove", command=remove_custom_field, width=10).pack(side=tk.LEFT, padx=2)
+
+ # Duplicate Detection Settings
+ duplicate_frame = tk.LabelFrame(manual_container, text="Duplicate Detection", padx=10, pady=10)
+ duplicate_frame.pack(fill=tk.X, pady=(0, 10))
+
+ # Honorifics filter toggle
+ if not hasattr(self, 'disable_honorifics_var'):
+ self.disable_honorifics_var = tk.BooleanVar(value=self.config.get('glossary_disable_honorifics_filter', False))
+
+ tb.Checkbutton(duplicate_frame, text="Disable honorifics filtering",
+ variable=self.disable_honorifics_var,
+ bootstyle="round-toggle").pack(anchor=tk.W)
+
+ tk.Label(duplicate_frame, text="When enabled, honorifics (님, さん, 先生, etc.) will NOT be removed from raw names",
+ font=('TkDefaultFont', 9), fg='gray').pack(anchor=tk.W, padx=20, pady=(0, 5))
+
+ # Fuzzy matching slider
+ fuzzy_frame = tk.Frame(duplicate_frame)
+ fuzzy_frame.pack(fill=tk.X, pady=(10, 0))
+
+ tk.Label(fuzzy_frame, text="Fuzzy Matching Threshold:",
+ font=('TkDefaultFont', 10, 'bold')).pack(anchor=tk.W)
+
+ tk.Label(fuzzy_frame, text="Controls how similar names must be to be considered duplicates",
+ font=('TkDefaultFont', 9), fg='gray').pack(anchor=tk.W, pady=(0, 5))
+
+ # Slider frame
+ slider_frame = tk.Frame(fuzzy_frame)
+ slider_frame.pack(fill=tk.X, pady=(5, 0))
+
+ # Initialize fuzzy threshold variable
+ if not hasattr(self, 'fuzzy_threshold_var'):
+ self.fuzzy_threshold_var = tk.DoubleVar(value=self.config.get('glossary_fuzzy_threshold', 0.90))
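+ # 0.90 means two names must be at least 90% similar to be merged as duplicates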
+
+ # Slider
+ fuzzy_slider = tb.Scale(
+ slider_frame,
+ from_=0.5,
+ to=1.0,
+ orient=tk.HORIZONTAL,
+ variable=self.fuzzy_threshold_var,
+ style="info.Horizontal.TScale",
+ length=300
+ )
+ fuzzy_slider.pack(side=tk.LEFT, padx=(0, 10))
+
+ # Value label
+ self.fuzzy_value_label = tk.Label(slider_frame, text=f"{self.fuzzy_threshold_var.get():.2f}")
+ self.fuzzy_value_label.pack(side=tk.LEFT)
+
+ # Description label - CREATE THIS FIRST
+ fuzzy_desc_label = tk.Label(fuzzy_frame, text="", font=('TkDefaultFont', 9), fg='blue')
+ fuzzy_desc_label.pack(anchor=tk.W, pady=(5, 0))
+
+ # Token-efficient format toggle
+ format_frame = tk.LabelFrame(manual_container, text="Output Format", padx=10, pady=10)
+ format_frame.pack(fill=tk.X, pady=(0, 10))
+
+ # Initialize variable if not exists
+ if not hasattr(self, 'use_legacy_csv_var'):
+ self.use_legacy_csv_var = tk.BooleanVar(value=self.config.get('glossary_use_legacy_csv', False))
+
+ tb.Checkbutton(format_frame, text="Use legacy CSV format",
+ variable=self.use_legacy_csv_var,
+ bootstyle="round-toggle").pack(anchor=tk.W)
+
+ tk.Label(format_frame, text="When disabled (default): Uses token-efficient format with sections (=== CHARACTERS ===)",
+ font=('TkDefaultFont', 9), fg='gray').pack(anchor=tk.W, padx=20, pady=(0, 5))
+
+ tk.Label(format_frame, text="When enabled: Uses traditional CSV format with repeated type columns",
+ font=('TkDefaultFont', 9), fg='gray').pack(anchor=tk.W, padx=20)
+
+ # Update label when slider moves - DEFINE AFTER CREATING THE LABEL
+ def update_fuzzy_label(*args):
+ try:
+ # Check if widgets still exist before updating
+ if not fuzzy_desc_label.winfo_exists():
+ return
+ if not self.fuzzy_value_label.winfo_exists():
+ return
+
+ value = self.fuzzy_threshold_var.get()
+ self.fuzzy_value_label.config(text=f"{value:.2f}")
+
+ # Show description
+ if value >= 0.95:
+ desc = "Exact match only (strict)"
+ elif value >= 0.85:
+ desc = "Very similar names (recommended)"
+ elif value >= 0.75:
+ desc = "Moderately similar names"
+ elif value >= 0.65:
+ desc = "Loosely similar names"
+ else:
+ desc = "Very loose matching (may over-merge)"
+
+ fuzzy_desc_label.config(text=desc)
+ except tk.TclError:
+ # Widget was destroyed, ignore
+ pass
+ except Exception as e:
+ # Catch any other unexpected errors
+ print(f"Error updating fuzzy label: {e}")
+ pass
+
+ # Remove any existing trace before adding a new one
+ if hasattr(self, 'manual_fuzzy_trace_id'):
+ try:
+ self.fuzzy_threshold_var.trace_remove('write', self.manual_fuzzy_trace_id)
+ except tk.TclError:
+ pass
+
+ # Set up the trace AFTER creating the label and store the trace ID.
+ # Use trace_add('write', ...) so it pairs with trace_remove('write', ...) above;
+ # the legacy trace('w', ...) API returns a name trace_remove cannot unregister.
+ self.manual_fuzzy_trace_id = self.fuzzy_threshold_var.trace_add('write', update_fuzzy_label)
+
+ # Initialize description by calling the function
+ try:
+ update_fuzzy_label()
+ except Exception:
+ # If initialization fails, just continue
+ pass
+
+ # Prompt section (continues as before)
+ prompt_frame = tk.LabelFrame(manual_container, text="Extraction Prompt", padx=10, pady=10)
+ prompt_frame.pack(fill=tk.BOTH, expand=True)
+
+ tk.Label(prompt_frame, text="Use {fields} for field list and {chapter_text} for content placeholder",
+ font=('TkDefaultFont', 9), fg='blue').pack(anchor=tk.W, pady=(0, 5))
+
+ tk.Label(prompt_frame, text="The {fields} placeholder will be replaced with the format specification",
+ font=('TkDefaultFont', 9), fg='gray').pack(anchor=tk.W, pady=(0, 5))
+
+ self.manual_prompt_text = self.ui.setup_scrollable_text(
+ prompt_frame, height=13, wrap=tk.WORD
+ )
+ self.manual_prompt_text.pack(fill=tk.BOTH, expand=True)
+
+ # Set default prompt if not already set
+ if not hasattr(self, 'manual_glossary_prompt') or not self.manual_glossary_prompt:
+ self.manual_glossary_prompt = """Extract character names and important terms from the following text.
+
+Output format:
+{fields}
+
+Rules:
+- Output ONLY CSV lines in the exact format shown above
+- No headers, no extra text, no JSON
+- One entry per line
+- Leave gender empty for terms (just end with comma)
+ """
+
+ self.manual_prompt_text.insert('1.0', self.manual_glossary_prompt)
+ self.manual_prompt_text.edit_reset()
+
+ prompt_controls = tk.Frame(manual_container)
+ prompt_controls.pack(fill=tk.X, pady=(10, 0))
+
+ def reset_manual_prompt():
+ if messagebox.askyesno("Reset Prompt", "Reset manual glossary prompt to default?"):
+ self.manual_prompt_text.delete('1.0', tk.END)
+ default_prompt = """Extract character names and important terms from the following text.
+
+ Output format:
+ {fields}
+
+ Rules:
+ - Output ONLY CSV lines in the exact format shown above
+ - No headers, no extra text, no JSON
+ - One entry per line
+ - Leave gender empty for terms (just end with comma)
+ """
+ self.manual_prompt_text.insert('1.0', default_prompt)
+
+ tb.Button(prompt_controls, text="Reset to Default", command=reset_manual_prompt,
+ bootstyle="warning").pack(side=tk.LEFT, padx=5)
+
+ # Settings
+ settings_frame = tk.LabelFrame(manual_container, text="Extraction Settings", padx=10, pady=10)
+ settings_frame.pack(fill=tk.X, pady=(10, 0))
+
+ settings_grid = tk.Frame(settings_frame)
+ settings_grid.pack()
+
+ tk.Label(settings_grid, text="Temperature:").grid(row=0, column=0, sticky=tk.W, padx=5)
+ self.manual_temp_var = tk.StringVar(value=str(self.config.get('manual_glossary_temperature', 0.1)))
+ tb.Entry(settings_grid, textvariable=self.manual_temp_var, width=10).grid(row=0, column=1, padx=5)
+
+ tk.Label(settings_grid, text="Context Limit:").grid(row=0, column=2, sticky=tk.W, padx=5)
+ self.manual_context_var = tk.StringVar(value=str(self.config.get('manual_context_limit', 2)))
+ tb.Entry(settings_grid, textvariable=self.manual_context_var, width=10).grid(row=0, column=3, padx=5)
+
+ tk.Label(settings_grid, text="Rolling Window:").grid(row=1, column=0, sticky=tk.W, padx=5, pady=(10, 0))
+ tb.Checkbutton(settings_grid, text="Keep recent context instead of reset",
+ variable=self.glossary_history_rolling_var,
+ bootstyle="round-toggle").grid(row=1, column=1, columnspan=3, sticky=tk.W, padx=5, pady=(10, 0))
+
+ tk.Label(settings_grid, text="When context limit is reached, keep recent chapters instead of clearing all history",
+ font=('TkDefaultFont', 11), fg='gray').grid(row=2, column=0, columnspan=4, sticky=tk.W, padx=20, pady=(0, 5))
+
+ def update_glossary_prompts(self):
+ """Update glossary prompts from text widgets if they exist"""
+ try:
+ if hasattr(self, 'manual_prompt_text'):
+ self.manual_glossary_prompt = self.manual_prompt_text.get('1.0', tk.END).strip()
+
+ if hasattr(self, 'auto_prompt_text'):
+ self.auto_glossary_prompt = self.auto_prompt_text.get('1.0', tk.END).strip()
+
+ if hasattr(self, 'append_prompt_text'):
+ self.append_glossary_prompt = self.append_prompt_text.get('1.0', tk.END).strip()
+
+ if hasattr(self, 'translation_prompt_text'):
+ self.glossary_translation_prompt = self.translation_prompt_text.get('1.0', tk.END).strip()
+
+ if hasattr(self, 'format_instructions_text'):
+ self.glossary_format_instructions = self.format_instructions_text.get('1.0', tk.END).strip()
+
+ except Exception as e:
+ print(f"Error updating glossary prompts: {e}")
+
+ def _setup_auto_glossary_tab(self, parent):
+ """Setup automatic glossary tab with fully configurable prompts"""
+ auto_container = tk.Frame(parent)
+ auto_container.pack(fill=tk.BOTH, expand=True, padx=10, pady=10)
+
+ # Master toggle
+ master_toggle_frame = tk.Frame(auto_container)
+ master_toggle_frame.pack(fill=tk.X, pady=(0, 15))
+
+ tb.Checkbutton(master_toggle_frame, text="Enable Automatic Glossary Generation",
+ variable=self.enable_auto_glossary_var,
+ bootstyle="round-toggle").pack(side=tk.LEFT)
+
+ tk.Label(master_toggle_frame, text="(Automatic extraction and translation of character names/Terms)",
+ font=('TkDefaultFont', 9), fg='gray').pack(side=tk.LEFT, padx=(10, 0))
+
+ # Append glossary toggle
+ append_frame = tk.Frame(auto_container)
+ append_frame.pack(fill=tk.X, pady=(0, 15))
+
+ tb.Checkbutton(append_frame, text="Append Glossary to System Prompt",
+ variable=self.append_glossary_var,
+ bootstyle="round-toggle").pack(side=tk.LEFT)
+
+ tk.Label(append_frame, text="(Applies to ALL glossaries - manual and automatic)",
+ font=('TkDefaultFont', 10, 'italic'), fg='blue').pack(side=tk.LEFT, padx=(10, 0))
+
+ # Custom append prompt section
+ append_prompt_frame = tk.LabelFrame(auto_container, text="Glossary Append Format", padx=10, pady=10)
+ append_prompt_frame.pack(fill=tk.X, pady=(0, 15))
+
+ tk.Label(append_prompt_frame, text="This text will be added before the glossary entries:",
+ font=('TkDefaultFont', 10)).pack(anchor=tk.W, pady=(0, 5))
+
+ self.append_prompt_text = self.ui.setup_scrollable_text(
+ append_prompt_frame, height=2, wrap=tk.WORD
+ )
+ self.append_prompt_text.pack(fill=tk.X)
+
+ # Set default append prompt if not already set
+ if not hasattr(self, 'append_glossary_prompt') or not self.append_glossary_prompt:
+ self.append_glossary_prompt = "- Follow this reference glossary for consistent translation (Do not output any raw entries):\n"
+
+ self.append_prompt_text.insert('1.0', self.append_glossary_prompt)
+ self.append_prompt_text.edit_reset()
+
+ append_prompt_controls = tk.Frame(append_prompt_frame)
+ append_prompt_controls.pack(fill=tk.X, pady=(5, 0))
+
+ def reset_append_prompt():
+ if messagebox.askyesno("Reset Prompt", "Reset to default glossary append format?"):
+ self.append_prompt_text.delete('1.0', tk.END)
+ self.append_prompt_text.insert('1.0', "- Follow this reference glossary for consistent translation (Do not output any raw entries):\n")
+
+ tb.Button(append_prompt_controls, text="Reset to Default", command=reset_append_prompt,
+ bootstyle="warning").pack(side=tk.LEFT, padx=5)
+
+ # Create notebook for tabs
+ notebook = ttk.Notebook(auto_container)
+ notebook.pack(fill=tk.BOTH, expand=True)
+
+ # Tab 1: Extraction Settings
+ extraction_tab = tk.Frame(notebook)
+ notebook.add(extraction_tab, text="Extraction Settings")
+
+ # Extraction settings
+ settings_label_frame = tk.LabelFrame(extraction_tab, text="Targeted Extraction Settings", padx=10, pady=10)
+ settings_label_frame.pack(fill=tk.X, padx=10, pady=10)
+
+ extraction_grid = tk.Frame(settings_label_frame)
+ extraction_grid.pack(fill=tk.X)
+
+ # Row 1
+ tk.Label(extraction_grid, text="Min frequency:").grid(row=0, column=0, sticky=tk.W, padx=(0, 5))
+ tb.Entry(extraction_grid, textvariable=self.glossary_min_frequency_var, width=10).grid(row=0, column=1, sticky=tk.W, padx=(0, 20))
+
+ tk.Label(extraction_grid, text="Max names:").grid(row=0, column=2, sticky=tk.W, padx=(0, 5))
+ tb.Entry(extraction_grid, textvariable=self.glossary_max_names_var, width=10).grid(row=0, column=3, sticky=tk.W)
+
+ # Row 2
+ tk.Label(extraction_grid, text="Max titles:").grid(row=1, column=0, sticky=tk.W, padx=(0, 5), pady=(5, 0))
+ tb.Entry(extraction_grid, textvariable=self.glossary_max_titles_var, width=10).grid(row=1, column=1, sticky=tk.W, padx=(0, 20), pady=(5, 0))
+
+ tk.Label(extraction_grid, text="Translation batch:").grid(row=1, column=2, sticky=tk.W, padx=(0, 5), pady=(5, 0))
+ tb.Entry(extraction_grid, textvariable=self.glossary_batch_size_var, width=10).grid(row=1, column=3, sticky=tk.W, pady=(5, 0))
+
+ # Row 3 - Max text size and chapter split
+ tk.Label(extraction_grid, text="Max text size:").grid(row=3, column=0, sticky=tk.W, padx=(0, 5), pady=(5, 0))
+ tb.Entry(extraction_grid, textvariable=self.glossary_max_text_size_var, width=10).grid(row=3, column=1, sticky=tk.W, padx=(0, 20), pady=(5, 0))
+
+ tk.Label(extraction_grid, text="Chapter split threshold:").grid(row=3, column=2, sticky=tk.W, padx=(0, 5), pady=(5, 0))
+ tb.Entry(extraction_grid, textvariable=self.glossary_chapter_split_threshold_var, width=10).grid(row=3, column=3, sticky=tk.W, pady=(5, 0))
+
+ # Row 4 - Max sentences for glossary
+ tk.Label(extraction_grid, text="Max sentences:").grid(row=4, column=0, sticky=tk.W, padx=(0, 5), pady=(5, 0))
+ tb.Entry(extraction_grid, textvariable=self.glossary_max_sentences_var, width=10).grid(row=4, column=1, sticky=tk.W, padx=(0, 20), pady=(5, 0))
+
+ tk.Label(extraction_grid, text="(Limit for AI processing)", font=('TkDefaultFont', 9), fg='gray').grid(row=4, column=2, columnspan=2, sticky=tk.W, pady=(5, 0))
+
+ # Row 5 - Filter mode
+ tk.Label(extraction_grid, text="Filter mode:").grid(row=5, column=0, sticky=tk.W, padx=(0, 5), pady=(5, 0))
+ filter_frame = tk.Frame(extraction_grid)
+ filter_frame.grid(row=5, column=1, columnspan=3, sticky=tk.W, pady=(5, 0))
+
+ tb.Radiobutton(filter_frame, text="All names & terms", variable=self.glossary_filter_mode_var,
+ value="all", bootstyle="info").pack(side=tk.LEFT, padx=(0, 10))
+ tb.Radiobutton(filter_frame, text="Names with honorifics only", variable=self.glossary_filter_mode_var,
+ value="only_with_honorifics", bootstyle="info").pack(side=tk.LEFT, padx=(0, 10))
+ tb.Radiobutton(filter_frame, text="Names without honorifics & terms", variable=self.glossary_filter_mode_var,
+ value="only_without_honorifics", bootstyle="info").pack(side=tk.LEFT)
+
+ # Row 6 - Strip honorifics
+ tk.Label(extraction_grid, text="Strip honorifics:").grid(row=6, column=0, sticky=tk.W, padx=(0, 5), pady=(5, 0))
+ tb.Checkbutton(extraction_grid, text="Remove honorifics from extracted names",
+ variable=self.strip_honorifics_var,
+ bootstyle="round-toggle").grid(row=6, column=1, columnspan=3, sticky=tk.W, pady=(5, 0))
+
+ # Row 7 - Fuzzy matching threshold (reuse existing variable)
+ tk.Label(extraction_grid, text="Fuzzy threshold:").grid(row=7, column=0, sticky=tk.W, padx=(0, 5), pady=(5, 0))
+
+ fuzzy_frame = tk.Frame(extraction_grid)
+ fuzzy_frame.grid(row=7, column=1, columnspan=3, sticky=tk.W, pady=(5, 0))
+
+ # Reuse the existing fuzzy_threshold_var that's already initialized elsewhere
+ fuzzy_slider = tb.Scale(
+ fuzzy_frame,
+ from_=0.5,
+ to=1.0,
+ orient=tk.HORIZONTAL,
+ variable=self.fuzzy_threshold_var,
+ length=200,
+ bootstyle="info"
+ )
+ fuzzy_slider.pack(side=tk.LEFT, padx=(0, 10))
+
+ fuzzy_value_label = tk.Label(fuzzy_frame, text=f"{self.fuzzy_threshold_var.get():.2f}")
+ fuzzy_value_label.pack(side=tk.LEFT, padx=(0, 10))
+
+ fuzzy_desc_label = tk.Label(fuzzy_frame, text="", font=('TkDefaultFont', 9), fg='gray')
+ fuzzy_desc_label.pack(side=tk.LEFT)
+
+ # Reuse the exact same update function logic
+ def update_fuzzy_label(*args):
+ try:
+ # Check if widgets still exist before updating
+ if not fuzzy_desc_label.winfo_exists():
+ return
+ if not fuzzy_value_label.winfo_exists():
+ return
+
+ value = self.fuzzy_threshold_var.get()
+ fuzzy_value_label.config(text=f"{value:.2f}")
+
+ # Show description
+ if value >= 0.95:
+ desc = "Exact match only (strict)"
+ elif value >= 0.85:
+ desc = "Very similar names (recommended)"
+ elif value >= 0.75:
+ desc = "Moderately similar names"
+ elif value >= 0.65:
+ desc = "Loosely similar names"
+ else:
+ desc = "Very loose matching (may over-merge)"
+
+ fuzzy_desc_label.config(text=desc)
+ except tk.TclError:
+ # Widget was destroyed, ignore
+ pass
+ except Exception as e:
+ # Catch any other unexpected errors
+ print(f"Error updating auto fuzzy label: {e}")
+ pass
+
+ # Remove any existing auto trace before adding a new one
+ if hasattr(self, 'auto_fuzzy_trace_id'):
+ try:
+ self.fuzzy_threshold_var.trace_remove('write', self.auto_fuzzy_trace_id)
+ except tk.TclError:
+ pass
+
+ # Set up the trace AFTER creating the label and store the trace ID
+ # (trace_add, as above, so the trace can be removed cleanly later)
+ self.auto_fuzzy_trace_id = self.fuzzy_threshold_var.trace_add('write', update_fuzzy_label)
+
+ # Initialize description by calling the function
+ try:
+ update_fuzzy_label()
+ except Exception:
+ # If initialization fails, just continue
+ pass
+
+ # Initialize the variable if not exists
+ if not hasattr(self, 'strip_honorifics_var'):
+ self.strip_honorifics_var = tk.BooleanVar(value=True)
+
+ # Help text
+ help_frame = tk.Frame(extraction_tab)
+ help_frame.pack(fill=tk.X, padx=10, pady=(10, 0))
+
+ tk.Label(help_frame, text="💡 Settings Guide:", font=('TkDefaultFont', 12, 'bold')).pack(anchor=tk.W)
+ help_texts = [
+ "• Min frequency: How many times a name must appear (lower = more terms)",
+ "• Max names/titles: Limits to prevent huge glossaries",
+ "• Translation batch: Terms per API call (larger = faster but may reduce quality)",
+ "• Max text size: Characters to analyze (0 = entire text, 50000 = first 50k chars)",
+ "• Chapter split: Split large texts into chunks (0 = no splitting, 100000 = split at 100k chars)",
+ "• Max sentences: Maximum sentences to send to AI (default 200, increase for more context)",
+ "• Filter mode:",
+ " - All names & terms: Extract character names (with/without honorifics) + titles/terms",
+ " - Names with honorifics only: ONLY character names with honorifics (no titles/terms)",
+ " - Names without honorifics & terms: Character names without honorifics + titles/terms",
+ "• Strip honorifics: Remove suffixes from extracted names (e.g., '김' instead of '김님')",
+ "• Fuzzy threshold: How similar terms must be to match (0.9 = 90% match, 1.0 = exact match)"
+ ]
+ for txt in help_texts:
+ tk.Label(help_frame, text=txt, font=('TkDefaultFont', 11), fg='gray').pack(anchor=tk.W, padx=20)
+
+ # Tab 2: Extraction Prompt
+ extraction_prompt_tab = tk.Frame(notebook)
+ notebook.add(extraction_prompt_tab, text="Extraction Prompt")
+
+ # Auto prompt section
+ auto_prompt_frame = tk.LabelFrame(extraction_prompt_tab, text="Extraction Template (System Prompt)", padx=10, pady=10)
+ auto_prompt_frame.pack(fill=tk.BOTH, expand=True, padx=10, pady=10)
+
+ tk.Label(auto_prompt_frame, text="Available placeholders: {language}, {min_frequency}, {max_names}, {max_titles}",
+ font=('TkDefaultFont', 9), fg='blue').pack(anchor=tk.W, pady=(0, 5))
+
+ self.auto_prompt_text = self.ui.setup_scrollable_text(
+ auto_prompt_frame, height=12, wrap=tk.WORD
+ )
+ self.auto_prompt_text.pack(fill=tk.BOTH, expand=True)
+
+ # Set default extraction prompt if not set
+ if not hasattr(self, 'auto_glossary_prompt') or not self.auto_glossary_prompt:
+ self.auto_glossary_prompt = self.default_auto_glossary_prompt
+
+ self.auto_prompt_text.insert('1.0', self.auto_glossary_prompt)
+ self.auto_prompt_text.edit_reset()
+
+ auto_prompt_controls = tk.Frame(extraction_prompt_tab)
+ auto_prompt_controls.pack(fill=tk.X, padx=10, pady=(0, 10))
+
+ def reset_auto_prompt():
+ if messagebox.askyesno("Reset Prompt", "Reset automatic glossary prompt to default?"):
+ self.auto_prompt_text.delete('1.0', tk.END)
+ self.auto_prompt_text.insert('1.0', self.default_auto_glossary_prompt)
+
+ tb.Button(auto_prompt_controls, text="Reset to Default", command=reset_auto_prompt,
+ bootstyle="warning").pack(side=tk.LEFT, padx=5)
+
+ # Tab 3: Format Instructions - NEW TAB
+ format_tab = tk.Frame(notebook)
+ notebook.add(format_tab, text="Format Instructions")
+
+ # Format instructions section
+ format_prompt_frame = tk.LabelFrame(format_tab, text="Output Format Instructions (User Prompt)", padx=10, pady=10)
+ format_prompt_frame.pack(fill=tk.BOTH, expand=True, padx=10, pady=10)
+
+ tk.Label(format_prompt_frame, text="These instructions are added to your extraction prompt to specify the output format:",
+ font=('TkDefaultFont', 10)).pack(anchor=tk.W, pady=(0, 5))
+
+ tk.Label(format_prompt_frame, text="Available placeholders: {text_sample}",
+ font=('TkDefaultFont', 9), fg='blue').pack(anchor=tk.W, pady=(0, 5))
+
+ # Default format instructions, defined once so the initializer and
+ # the reset button below can't drift apart
+ default_format_instructions = """
+Return the results in EXACT CSV format with this header:
+type,raw_name,translated_name
+
+For example:
+character,김상현,Kim Sang-hyun
+character,갈편제,Gale Hardest
+character,디히릿 아데,Dihirit Ade
+
+Only include terms that actually appear in the text.
+Do not use quotes around values unless they contain commas.
+
+Text to analyze:
+{text_sample}"""
+
+ # Initialize format instructions variable and text widget
+ if not hasattr(self, 'glossary_format_instructions'):
+ self.glossary_format_instructions = default_format_instructions
+
+ self.format_instructions_text = self.ui.setup_scrollable_text(
+ format_prompt_frame, height=12, wrap=tk.WORD
+ )
+ self.format_instructions_text.pack(fill=tk.BOTH, expand=True)
+ self.format_instructions_text.insert('1.0', self.glossary_format_instructions)
+ self.format_instructions_text.edit_reset()
+
+ format_prompt_controls = tk.Frame(format_tab)
+ format_prompt_controls.pack(fill=tk.X, padx=10, pady=(0, 10))
+
+ def reset_format_instructions():
+ if messagebox.askyesno("Reset Prompt", "Reset format instructions to default?"):
+ self.format_instructions_text.delete('1.0', tk.END)
+ self.format_instructions_text.insert('1.0', default_format_instructions)
+
+ tb.Button(format_prompt_controls, text="Reset to Default", command=reset_format_instructions,
+ bootstyle="warning").pack(side=tk.LEFT, padx=5)
+
+ # Tab 4: Translation Prompt (moved from Tab 3)
+ translation_prompt_tab = tk.Frame(notebook)
+ notebook.add(translation_prompt_tab, text="Translation Prompt")
+
+ # Translation prompt section
+ trans_prompt_frame = tk.LabelFrame(translation_prompt_tab, text="Glossary Translation Template (User Prompt)", padx=10, pady=10)
+ trans_prompt_frame.pack(fill=tk.BOTH, expand=True, padx=10, pady=10)
+
+ tk.Label(trans_prompt_frame, text="This prompt is used to translate extracted terms to English:",
+ font=('TkDefaultFont', 10)).pack(anchor=tk.W, pady=(0, 5))
+
+ tk.Label(trans_prompt_frame, text="Available placeholders: {language}, {terms_list}, {batch_size}",
+ font=('TkDefaultFont', 9), fg='blue').pack(anchor=tk.W, pady=(0, 5))
+
+ # Default translation prompt, defined once and shared with the
+ # reset button below so the two can't drift apart
+ default_trans_prompt = """
+You are translating {language} character names and important terms to English.
+For character names, provide English transliterations or keep as romanized.
+Keep honorifics/suffixes only if they are integral to the name.
+
+Terms to translate:
+{terms_list}
+
+Provide translations in the same numbered format."""
+
+ # Initialize translation prompt variable and text widget
+ if not hasattr(self, 'glossary_translation_prompt'):
+ self.glossary_translation_prompt = default_trans_prompt
+
+ self.translation_prompt_text = self.ui.setup_scrollable_text(
+ trans_prompt_frame, height=12, wrap=tk.WORD
+ )
+ self.translation_prompt_text.pack(fill=tk.BOTH, expand=True)
+ self.translation_prompt_text.insert('1.0', self.glossary_translation_prompt)
+ self.translation_prompt_text.edit_reset()
+
+ trans_prompt_controls = tk.Frame(translation_prompt_tab)
+ trans_prompt_controls.pack(fill=tk.X, padx=10, pady=(0, 10))
+
+ def reset_trans_prompt():
+ if messagebox.askyesno("Reset Prompt", "Reset translation prompt to default?"):
+ self.translation_prompt_text.delete('1.0', tk.END)
+ self.translation_prompt_text.insert('1.0', default_trans_prompt)
+
+ tb.Button(trans_prompt_controls, text="Reset to Default", command=reset_trans_prompt,
+ bootstyle="warning").pack(side=tk.LEFT, padx=5)
+
+ # Update states function with proper error handling
+ def update_auto_glossary_state():
+ try:
+ if not extraction_grid.winfo_exists():
+ return
+ state = tk.NORMAL if self.enable_auto_glossary_var.get() else tk.DISABLED
+ for widget in extraction_grid.winfo_children():
+ if isinstance(widget, (tb.Entry, ttk.Entry, tb.Checkbutton, ttk.Checkbutton)):
+ widget.config(state=state)
+ # Handle frames that contain radio buttons or scales
+ elif isinstance(widget, tk.Frame):
+ for child in widget.winfo_children():
+ if isinstance(child, (tb.Radiobutton, ttk.Radiobutton, tb.Scale, ttk.Scale)):
+ child.config(state=state)
+ if self.auto_prompt_text.winfo_exists():
+ self.auto_prompt_text.config(state=state)
+ if hasattr(self, 'format_instructions_text') and self.format_instructions_text.winfo_exists():
+ self.format_instructions_text.config(state=state)
+ if hasattr(self, 'translation_prompt_text') and self.translation_prompt_text.winfo_exists():
+ self.translation_prompt_text.config(state=state)
+ for widget in auto_prompt_controls.winfo_children():
+ if isinstance(widget, (tb.Button, ttk.Button)) and widget.winfo_exists():
+ widget.config(state=state)
+ for widget in format_prompt_controls.winfo_children():
+ if isinstance(widget, (tb.Button, ttk.Button)) and widget.winfo_exists():
+ widget.config(state=state)
+ for widget in trans_prompt_controls.winfo_children():
+ if isinstance(widget, (tb.Button, ttk.Button)) and widget.winfo_exists():
+ widget.config(state=state)
+ except tk.TclError:
+ # Widget was destroyed, ignore
+ pass
+
+ def update_append_prompt_state():
+ try:
+ if not self.append_prompt_text.winfo_exists():
+ return
+ state = tk.NORMAL if self.append_glossary_var.get() else tk.DISABLED
+ self.append_prompt_text.config(state=state)
+ for widget in append_prompt_controls.winfo_children():
+ if isinstance(widget, (tb.Button, ttk.Button)) and widget.winfo_exists():
+ widget.config(state=state)
+ except tk.TclError:
+ # Widget was destroyed, ignore
+ pass
+
+ # Initialize states
+ update_auto_glossary_state()
+ update_append_prompt_state()
+
+ # Add traces
+ self.enable_auto_glossary_var.trace('w', lambda *args: update_auto_glossary_state())
+ self.append_glossary_var.trace('w', lambda *args: update_append_prompt_state())
+
+ def _setup_glossary_editor_tab(self, parent):
+ """Set up the glossary editor/trimmer tab"""
+ container = tk.Frame(parent)
+ container.pack(fill=tk.BOTH, expand=True, padx=10, pady=10)
+
+ file_frame = tk.Frame(container)
+ file_frame.pack(fill=tk.X, pady=(0, 10))
+
+ tk.Label(file_frame, text="Glossary File:").pack(side=tk.LEFT, padx=(0, 5))
+ self.editor_file_var = tk.StringVar()
+ tb.Entry(file_frame, textvariable=self.editor_file_var, state='readonly').pack(side=tk.LEFT, fill=tk.X, expand=True, padx=5)
+
+ stats_frame = tk.Frame(container)
+ stats_frame.pack(fill=tk.X, pady=(0, 5))
+ self.stats_label = tk.Label(stats_frame, text="No glossary loaded", font=('TkDefaultFont', 10, 'italic'))
+ self.stats_label.pack(side=tk.LEFT)
+
+ content_frame = tk.LabelFrame(container, text="Glossary Entries", padx=10, pady=10)
+ content_frame.pack(fill=tk.BOTH, expand=True)
+
+ tree_frame = tk.Frame(content_frame)
+ tree_frame.pack(fill=tk.BOTH, expand=True)
+
+ vsb = ttk.Scrollbar(tree_frame, orient="vertical")
+ hsb = ttk.Scrollbar(tree_frame, orient="horizontal")
+
+ self.glossary_tree = ttk.Treeview(tree_frame, show='tree headings',
+ yscrollcommand=vsb.set, xscrollcommand=hsb.set)
+
+ vsb.config(command=self.glossary_tree.yview)
+ hsb.config(command=self.glossary_tree.xview)
+
+ self.glossary_tree.grid(row=0, column=0, sticky='nsew')
+ vsb.grid(row=0, column=1, sticky='ns')
+ hsb.grid(row=1, column=0, sticky='ew')
+
+ tree_frame.grid_rowconfigure(0, weight=1)
+ tree_frame.grid_columnconfigure(0, weight=1)
+
+ self.glossary_tree.bind('<Double-1>', self._on_tree_double_click)
+
+ self.current_glossary_data = None
+ self.current_glossary_format = None
+
+ # Editor functions
+ def load_glossary_for_editing():
+ path = self.editor_file_var.get()
+ if not path or not os.path.exists(path):
+ messagebox.showerror("Error", "Please select a valid glossary file")
+ return
+
+ try:
+ # Try CSV first
+ if path.endswith('.csv'):
+ import csv
+ entries = []
+ with open(path, 'r', encoding='utf-8') as f:
+ reader = csv.reader(f)
+ for row in reader:
+ # Skip blank rows and the optional "type,raw_name,..." header
+ if not row or row[0].strip().lower() == 'type':
+ continue
+ if len(row) >= 3:
+ entry = {
+ 'type': row[0],
+ 'raw_name': row[1],
+ 'translated_name': row[2]
+ }
+ if row[0] == 'character' and len(row) > 3 and row[3]:
+ entry['gender'] = row[3]
+ entries.append(entry)
+ self.current_glossary_data = entries
+ self.current_glossary_format = 'list'
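+ # Example: ["character", "김상현", "Kim Sang-hyun", "Male"] becomes
+ # {'type': 'character', 'raw_name': '김상현',
+ # 'translated_name': 'Kim Sang-hyun', 'gender': 'Male'}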
+ else:
+ # JSON format
+ with open(path, 'r', encoding='utf-8') as f:
+ data = json.load(f)
+
+ entries = []
+ all_fields = set()
+
+ if isinstance(data, dict):
+ if 'entries' in data:
+ self.current_glossary_data = data
+ self.current_glossary_format = 'dict'
+ for original, translated in data['entries'].items():
+ entry = {'original': original, 'translated': translated}
+ entries.append(entry)
+ all_fields.update(entry.keys())
+ else:
+ self.current_glossary_data = {'entries': data}
+ self.current_glossary_format = 'dict'
+ for original, translated in data.items():
+ entry = {'original': original, 'translated': translated}
+ entries.append(entry)
+ all_fields.update(entry.keys())
+
+ elif isinstance(data, list):
+ self.current_glossary_data = data
+ self.current_glossary_format = 'list'
+ for item in data:
+ all_fields.update(item.keys())
+ entries.append(item)
+
+ # Set up columns based on new format
+ if self.current_glossary_format == 'list' and entries and 'type' in entries[0]:
+ # New simple format
+ column_fields = ['type', 'raw_name', 'translated_name', 'gender']
+
+ # Check for any custom fields
+ for entry in entries:
+ for field in entry.keys():
+ if field not in column_fields:
+ column_fields.append(field)
+ else:
+ # Old format compatibility
+ standard_fields = ['original_name', 'name', 'original', 'translated', 'gender',
+ 'title', 'group_affiliation', 'traits', 'how_they_refer_to_others',
+ 'locations']
+
+ column_fields = []
+ for field in standard_fields:
+ if field in all_fields:
+ column_fields.append(field)
+
+ custom_fields = sorted(all_fields - set(standard_fields))
+ column_fields.extend(custom_fields)
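+ # Columns: known standard fields first (in fixed order), then any
+ # custom fields sorted alphabetically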
+
+ self.glossary_tree.delete(*self.glossary_tree.get_children())
+ self.glossary_tree['columns'] = column_fields
+
+ self.glossary_tree.heading('#0', text='#')
+ self.glossary_tree.column('#0', width=40, stretch=False)
+
+ for field in column_fields:
+ display_name = field.replace('_', ' ').title()
+ self.glossary_tree.heading(field, text=display_name)
+
+ if field in ['raw_name', 'translated_name', 'original_name', 'name', 'original', 'translated']:
+ width = 150
+ elif field in ['traits', 'locations', 'how_they_refer_to_others']:
+ width = 200
+ else:
+ width = 100
+
+ self.glossary_tree.column(field, width=width)
+
+ for idx, entry in enumerate(entries):
+ values = []
+ for field in column_fields:
+ value = entry.get(field, '')
+ if isinstance(value, list):
+ value = ', '.join(str(v) for v in value)
+ elif isinstance(value, dict):
+ value = ', '.join(f"{k}: {v}" for k, v in value.items())
+ elif value is None:
+ value = ''
+ values.append(value)
+
+ self.glossary_tree.insert('', 'end', text=str(idx + 1), values=values)
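+ # The 1-based row number stored in column #0 is how the delete and
+ # edit handlers map a tree row back to its index in the data list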
+
+ # Update stats
+ stats = []
+ stats.append(f"Total entries: {len(entries)}")
+
+ if self.current_glossary_format == 'list' and entries and 'type' in entries[0]:
+ # New format stats
+ characters = sum(1 for e in entries if e.get('type') == 'character')
+ terms = sum(1 for e in entries if e.get('type') == 'term')
+ stats.append(f"Characters: {characters}, Terms: {terms}")
+ elif self.current_glossary_format == 'list':
+ # Old format stats
+ chars = sum(1 for e in entries if 'original_name' in e or 'name' in e)
+ locs = sum(1 for e in entries if 'locations' in e and e['locations'])
+ stats.append(f"Characters: {chars}, Locations: {locs}")
+
+ self.stats_label.config(text=" | ".join(stats))
+ self.append_log(f"✅ Loaded {len(entries)} entries from glossary")
+
+ except Exception as e:
+ messagebox.showerror("Error", f"Failed to load glossary: {e}")
+ self.append_log(f"❌ Failed to load glossary: {e}")
+
+ def browse_glossary():
+ path = filedialog.askopenfilename(
+ title="Select glossary file",
+ filetypes=[("Glossary files", "*.json *.csv"), ("JSON files", "*.json"), ("CSV files", "*.csv")]
+ )
+ if path:
+ self.editor_file_var.set(path)
+ load_glossary_for_editing()
+
+ # Common save helper
+ def save_current_glossary():
+ path = self.editor_file_var.get()
+ if not path or not self.current_glossary_data:
+ return False
+ try:
+ if path.endswith('.csv'):
+ # Save as CSV
+ import csv
+ with open(path, 'w', encoding='utf-8', newline='') as f:
+ writer = csv.writer(f)
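+ # No header row is written here; the loader above tolerates CSV
+ # files with or without the "type,raw_name,..." header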
+ for entry in self.current_glossary_data:
+ if entry.get('type') == 'character':
+ writer.writerow([entry.get('type', ''), entry.get('raw_name', ''),
+ entry.get('translated_name', ''), entry.get('gender', '')])
+ else:
+ writer.writerow([entry.get('type', ''), entry.get('raw_name', ''),
+ entry.get('translated_name', ''), ''])
+ else:
+ # Save as JSON
+ with open(path, 'w', encoding='utf-8') as f:
+ json.dump(self.current_glossary_data, f, ensure_ascii=False, indent=2)
+ return True
+ except Exception as e:
+ messagebox.showerror("Error", f"Failed to save: {e}")
+ return False
+
+ def clean_empty_fields():
+ if not self.current_glossary_data:
+ messagebox.showerror("Error", "No glossary loaded")
+ return
+
+ if self.current_glossary_format == 'list':
+ # Check if there are any empty fields
+ empty_fields_found = False
+ fields_cleaned = {}
+
+ # Count empty fields first
+ for entry in self.current_glossary_data:
+ for field in list(entry.keys()):
+ value = entry[field]
+ if value is None or value == "" or (isinstance(value, list) and len(value) == 0) or (isinstance(value, dict) and len(value) == 0):
+ empty_fields_found = True
+ fields_cleaned[field] = fields_cleaned.get(field, 0) + 1
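+ # Only None, "", empty lists and empty dicts count as empty;
+ # falsy-but-meaningful values like 0 or False are preserved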
+
+ # If no empty fields found, show message and return
+ if not empty_fields_found:
+ messagebox.showinfo("Info", "No empty fields found in glossary")
+ return
+
+ # Only create backup if there are fields to clean
+ if not self.create_glossary_backup("before_clean"):
+ return
+
+ # Now actually clean the fields
+ total_cleaned = 0
+ for entry in self.current_glossary_data:
+ for field in list(entry.keys()):
+ value = entry[field]
+ if value is None or value == "" or (isinstance(value, list) and len(value) == 0) or (isinstance(value, dict) and len(value) == 0):
+ entry.pop(field)
+ total_cleaned += 1
+
+ if save_current_glossary():
+ load_glossary_for_editing()
+
+ # Provide detailed feedback
+ msg = f"Cleaned {total_cleaned} empty fields\n\n"
+ msg += "Fields cleaned:\n"
+ for field, count in sorted(fields_cleaned.items(), key=lambda x: x[1], reverse=True):
+ msg += f"• {field}: {count} entries\n"
+
+ messagebox.showinfo("Success", msg)
+
+ def delete_selected_entries():
+ selected = self.glossary_tree.selection()
+ if not selected:
+ messagebox.showwarning("No Selection", "Please select entries to delete")
+ return
+
+ count = len(selected)
+ if messagebox.askyesno("Confirm Delete", f"Delete {count} selected entries?"):
+ # automatic backup
+ if not self.create_glossary_backup(f"before_delete_{count}"):
+ return
+
+ indices_to_delete = []
+ for item in selected:
+ idx = int(self.glossary_tree.item(item)['text']) - 1
+ indices_to_delete.append(idx)
+
+ indices_to_delete.sort(reverse=True)
+
+ if self.current_glossary_format == 'list':
+ for idx in indices_to_delete:
+ if 0 <= idx < len(self.current_glossary_data):
+ del self.current_glossary_data[idx]
+
+ elif self.current_glossary_format == 'dict':
+ entries_list = list(self.current_glossary_data.get('entries', {}).items())
+ for idx in indices_to_delete:
+ if 0 <= idx < len(entries_list):
+ key = entries_list[idx][0]
+ self.current_glossary_data['entries'].pop(key, None)
+
+ if save_current_glossary():
+ load_glossary_for_editing()
+ messagebox.showinfo("Success", f"Deleted {len(indices_to_delete)} entries")
+
+ def remove_duplicates():
+ if not self.current_glossary_data:
+ messagebox.showerror("Error", "No glossary loaded")
+ return
+
+ if self.current_glossary_format == 'list':
+ # Import the skip function from the updated script
+ try:
+ from extract_glossary_from_epub import skip_duplicate_entries
+
+ # Set environment variable for honorifics toggle
+ os.environ['GLOSSARY_DISABLE_HONORIFICS_FILTER'] = '1' if self.config.get('glossary_disable_honorifics_filter', False) else '0'
+
+ original_count = len(self.current_glossary_data)
+ self.current_glossary_data = skip_duplicate_entries(self.current_glossary_data)
+ duplicates_removed = original_count - len(self.current_glossary_data)
+
+ if duplicates_removed > 0:
+ if self.config.get('glossary_auto_backup', False):
+ self.create_glossary_backup(f"before_remove_{duplicates_removed}_dupes")
+
+ if save_current_glossary():
+ load_glossary_for_editing()
+ messagebox.showinfo("Success", f"Removed {duplicates_removed} duplicate entries")
+ self.append_log(f"🗑️ Removed {duplicates_removed} duplicates based on raw_name")
+ else:
+ messagebox.showinfo("Info", "No duplicates found")
+
+ except ImportError:
+ # Fallback implementation
+ seen_raw_names = set()
+ unique_entries = []
+ duplicates = 0
+
+ for entry in self.current_glossary_data:
+ raw_name = entry.get('raw_name', '').lower().strip()
+ if not raw_name:
+ # Entries without a raw_name can't be duplicates; keep them
+ unique_entries.append(entry)
+ elif raw_name not in seen_raw_names:
+ seen_raw_names.add(raw_name)
+ unique_entries.append(entry)
+ else:
+ duplicates += 1
+
+ if duplicates > 0:
+ self.current_glossary_data = unique_entries
+ if save_current_glossary():
+ load_glossary_for_editing()
+ messagebox.showinfo("Success", f"Removed {duplicates} duplicate entries")
+ else:
+ messagebox.showinfo("Info", "No duplicates found")
+
+ # dialog function for configuring duplicate detection mode
+ def duplicate_detection_settings():
+ """Show info about duplicate detection (simplified for new format)"""
+ messagebox.showinfo(
+ "Duplicate Detection",
+ "Duplicate detection is based on the raw_name field.\n\n"
+ "• Entries with identical raw_name values are considered duplicates\n"
+ "• The first occurrence is kept, later ones are removed\n"
+ "• Honorifics filtering can be toggled in the Manual Glossary tab\n\n"
+ "When honorifics filtering is enabled, names are compared after removing honorifics."
+ )
+
+ def backup_settings_dialog():
+ """Show dialog for configuring automatic backup settings"""
+ # Use setup_scrollable with custom ratios
+ dialog, scrollable_frame, canvas = self.wm.setup_scrollable(
+ self.master,
+ "Automatic Backup Settings",
+ width=500,
+ height=None,
+ max_width_ratio=0.45,
+ max_height_ratio=0.51
+ )
+
+ # Main frame
+ main_frame = ttk.Frame(scrollable_frame, padding="20")
+ main_frame.pack(fill=tk.BOTH, expand=True)
+
+ # Title
+ ttk.Label(main_frame, text="Automatic Backup Settings",
+ font=('TkDefaultFont', 22, 'bold')).pack(pady=(0, 20))
+
+ # Backup toggle
+ backup_var = tk.BooleanVar(value=self.config.get('glossary_auto_backup', True))
+ backup_frame = ttk.Frame(main_frame)
+ backup_frame.pack(fill=tk.X, pady=5)
+
+ backup_check = ttk.Checkbutton(backup_frame,
+ text="Enable automatic backups before modifications",
+ variable=backup_var)
+ backup_check.pack(anchor=tk.W)
+
+ # Settings frame (indented)
+ settings_frame = ttk.Frame(main_frame)
+ settings_frame.pack(fill=tk.X, pady=(10, 0), padx=(20, 0))
+
+ # Max backups setting
+ max_backups_frame = ttk.Frame(settings_frame)
+ max_backups_frame.pack(fill=tk.X, pady=5)
+
+ ttk.Label(max_backups_frame, text="Maximum backups to keep:").pack(side=tk.LEFT, padx=(0, 10))
+ max_backups_var = tk.IntVar(value=self.config.get('glossary_max_backups', 50))
+ max_backups_spin = ttk.Spinbox(max_backups_frame, from_=0, to=999,
+ textvariable=max_backups_var, width=10)
+ max_backups_spin.pack(side=tk.LEFT)
+ ttk.Label(max_backups_frame, text="(0 = unlimited)",
+ font=('TkDefaultFont', 9),
+ foreground='gray').pack(side=tk.LEFT, padx=(10, 0))
+
+ # Backup naming pattern info
+ pattern_frame = ttk.Frame(settings_frame)
+ pattern_frame.pack(fill=tk.X, pady=(15, 5))
+
+ ttk.Label(pattern_frame, text="Backup naming pattern:",
+ font=('TkDefaultFont', 10, 'bold')).pack(anchor=tk.W)
+ ttk.Label(pattern_frame,
+ text="[original_name]_[operation]_[YYYYMMDD_HHMMSS].json",
+ font=('TkDefaultFont', 9, 'italic'),
+ foreground='#666').pack(anchor=tk.W, padx=(10, 0))
+
+ # Example
+ example_text = "Example: my_glossary_before_delete_5_20240115_143052.json"
+ ttk.Label(pattern_frame, text=example_text,
+ font=('TkDefaultFont', 8),
+ foreground='gray').pack(anchor=tk.W, padx=(10, 0), pady=(2, 0))
+
+ # Separator
+ ttk.Separator(main_frame, orient='horizontal').pack(fill=tk.X, pady=(20, 15))
+
+ # Backup location info
+ location_frame = ttk.Frame(main_frame)
+ location_frame.pack(fill=tk.X)
+
+ ttk.Label(location_frame, text="📁 Backup Location:",
+ font=('TkDefaultFont', 10, 'bold')).pack(anchor=tk.W)
+
+ if self.editor_file_var.get():
+ glossary_dir = os.path.dirname(self.editor_file_var.get())
+ full_path = os.path.join(glossary_dir, "Backups")
+
+ path_label = ttk.Label(location_frame,
+ text="Backups/",
+ font=('TkDefaultFont', 9),
+ foreground='#0066cc')
+ path_label.pack(anchor=tk.W, padx=(10, 0))
+
+ # Check if backup folder exists and show count
+ if os.path.exists(full_path):
+ backup_count = len([f for f in os.listdir(full_path) if f.endswith('.json')])
+ ttk.Label(location_frame,
+ text=f"Currently contains {backup_count} backup(s)",
+ font=('TkDefaultFont', 8),
+ foreground='gray').pack(anchor=tk.W, padx=(10, 0))
+ else:
+ ttk.Label(location_frame,
+ text="Backups",
+ font=('TkDefaultFont', 9),
+ foreground='gray').pack(anchor=tk.W, padx=(10, 0))
+
+ def toggle_settings_state(*args):
+ state = tk.NORMAL if backup_var.get() else tk.DISABLED
+ max_backups_spin.config(state=state)
+
+ backup_var.trace('w', toggle_settings_state)
+ toggle_settings_state() # Set initial state
+
+ # Buttons
+ button_frame = ttk.Frame(main_frame)
+ button_frame.pack(fill=tk.X, pady=(25, 0))
+
+ # Inner frame for centering buttons
+ button_inner_frame = ttk.Frame(button_frame)
+ button_inner_frame.pack(anchor=tk.CENTER)
+
+ def save_settings():
+ # Save backup settings
+ self.config['glossary_auto_backup'] = backup_var.get()
+ self.config['glossary_max_backups'] = max_backups_var.get()
+
+ # Save to config file
+ with open(CONFIG_FILE, 'w', encoding='utf-8') as f:
+ json.dump(self.config, f, ensure_ascii=False, indent=2)
+
+ status = "enabled" if backup_var.get() else "disabled"
+ if backup_var.get():
+ limit = max_backups_var.get()
+ limit_text = "unlimited" if limit == 0 else f"max {limit}"
+ msg = f"Automatic backups {status} ({limit_text})"
+ else:
+ msg = f"Automatic backups {status}"
+
+ messagebox.showinfo("Success", msg)
+ dialog.destroy()
+
+ def create_manual_backup():
+ """Create a manual backup right now"""
+ if not self.current_glossary_data:
+ messagebox.showerror("Error", "No glossary loaded")
+ return
+
+ if self.create_glossary_backup("manual"):
+ messagebox.showinfo("Success", "Manual backup created successfully!")
+
+ tb.Button(button_inner_frame, text="Save Settings", command=save_settings,
+ bootstyle="success", width=15).pack(side=tk.LEFT, padx=5)
+ tb.Button(button_inner_frame, text="Backup Now", command=create_manual_backup,
+ bootstyle="info", width=15).pack(side=tk.LEFT, padx=5)
+ tb.Button(button_inner_frame, text="Cancel", command=dialog.destroy,
+ bootstyle="secondary", width=15).pack(side=tk.LEFT, padx=5)
+
+ # Auto-resize and show
+ self.wm.auto_resize_dialog(dialog, canvas, max_width_ratio=0.45, max_height_ratio=0.41)
+
+ def smart_trim_dialog():
+ if not self.current_glossary_data:
+ messagebox.showerror("Error", "No glossary loaded")
+ return
+
+ # Use WindowManager's setup_scrollable for unified scrolling
+ dialog, scrollable_frame, canvas = self.wm.setup_scrollable(
+ self.master,
+ "Smart Trim Glossary",
+ width=600,
+ height=None,
+ max_width_ratio=0.9,
+ max_height_ratio=0.85
+ )
+
+ main_frame = scrollable_frame
+
+ # Title and description
+ tk.Label(main_frame, text="Smart Glossary Trimming",
+ font=('TkDefaultFont', 14, 'bold')).pack(pady=(20, 5))
+
+ tk.Label(main_frame, text="Limit the number of entries in your glossary",
+ font=('TkDefaultFont', 10), fg='gray', wraplength=550).pack(pady=(0, 15))
+
+ # Display current glossary stats
+ stats_frame = tk.LabelFrame(main_frame, text="Current Glossary Statistics", padx=15, pady=10)
+ stats_frame.pack(fill=tk.X, pady=(0, 15), padx=20)
+
+ entry_count = len(self.current_glossary_data) if self.current_glossary_format == 'list' else len(self.current_glossary_data.get('entries', {}))
+ tk.Label(stats_frame, text=f"Total entries: {entry_count}", font=('TkDefaultFont', 10)).pack(anchor=tk.W)
+
+ # For new format, show type breakdown
+ if self.current_glossary_format == 'list' and self.current_glossary_data and 'type' in self.current_glossary_data[0]:
+ characters = sum(1 for e in self.current_glossary_data if e.get('type') == 'character')
+ terms = sum(1 for e in self.current_glossary_data if e.get('type') == 'term')
+ tk.Label(stats_frame, text=f"Characters: {characters}, Terms: {terms}", font=('TkDefaultFont', 10)).pack(anchor=tk.W)
+
+ # Entry limit section
+ limit_frame = tk.LabelFrame(main_frame, text="Entry Limit", padx=15, pady=10)
+ limit_frame.pack(fill=tk.X, pady=(0, 15), padx=20)
+
+ tk.Label(limit_frame, text="Keep only the first N entries to reduce glossary size",
+ font=('TkDefaultFont', 9), fg='gray', wraplength=520).pack(anchor=tk.W, pady=(0, 10))
+
+ top_frame = tk.Frame(limit_frame)
+ top_frame.pack(fill=tk.X, pady=5)
+ tk.Label(top_frame, text="Keep first").pack(side=tk.LEFT)
+ top_var = tk.StringVar(value=str(min(100, entry_count)))
+ tb.Entry(top_frame, textvariable=top_var, width=10).pack(side=tk.LEFT, padx=5)
+ tk.Label(top_frame, text=f"entries (out of {entry_count})").pack(side=tk.LEFT)
+
+ # Preview section
+ preview_frame = tk.LabelFrame(main_frame, text="Preview", padx=15, pady=10)
+ preview_frame.pack(fill=tk.X, pady=(0, 15), padx=20)
+
+ preview_label = tk.Label(preview_frame, text="Click 'Preview Changes' to see the effect",
+ font=('TkDefaultFont', 10), fg='gray')
+ preview_label.pack(pady=5)
+
+ def preview_changes():
+ try:
+ top_n = int(top_var.get())
+ entries_to_remove = max(0, entry_count - top_n)
+
+ preview_text = f"Preview of changes:\n"
+ preview_text += f"• Entries: {entry_count} → {top_n} ({entries_to_remove} removed)\n"
+
+ preview_label.config(text=preview_text, fg='blue')
+
+ except ValueError:
+ preview_label.config(text="Please enter a valid number", fg='red')
+
+ tb.Button(preview_frame, text="Preview Changes", command=preview_changes,
+ bootstyle="info").pack()
+
+ # Action buttons
+ button_frame = tk.Frame(main_frame)
+ button_frame.pack(fill=tk.X, pady=(10, 20), padx=20)
+
+ def apply_smart_trim():
+ try:
+ top_n = int(top_var.get())
+
+ # Calculate how many entries will be removed; use entry_count,
+ # which is computed correctly for both list and dict formats
+ entries_to_remove = max(0, entry_count - top_n)
+ if entries_to_remove > 0:
+ if not self.create_glossary_backup(f"before_trim_{entries_to_remove}"):
+ return
+
+ if self.current_glossary_format == 'list':
+ # Keep only top N entries
+ if top_n < len(self.current_glossary_data):
+ self.current_glossary_data = self.current_glossary_data[:top_n]
+
+ elif self.current_glossary_format == 'dict':
+ # For dict format, only support entry limit
+ entries = list(self.current_glossary_data['entries'].items())
+ if top_n < len(entries):
+ self.current_glossary_data['entries'] = dict(entries[:top_n])
+
+ if save_current_glossary():
+ load_glossary_for_editing()
+
+ messagebox.showinfo("Success", f"Trimmed glossary to {top_n} entries")
+ dialog.destroy()
+
+ except ValueError:
+ messagebox.showerror("Error", "Please enter a valid number")
+
+ # Create inner frame for buttons
+ button_inner_frame = tk.Frame(button_frame)
+ button_inner_frame.pack()
+
+ tb.Button(button_inner_frame, text="Apply Trim", command=apply_smart_trim,
+ bootstyle="success", width=15).pack(side=tk.LEFT, padx=5)
+ tb.Button(button_inner_frame, text="Cancel", command=dialog.destroy,
+ bootstyle="secondary", width=15).pack(side=tk.LEFT, padx=5)
+
+ # Info section at bottom
+ info_frame = tk.Frame(main_frame)
+ info_frame.pack(fill=tk.X, pady=(0, 20), padx=20)
+
+ tk.Label(info_frame, text="💡 Tip: Entries are kept in their original order",
+ font=('TkDefaultFont', 9, 'italic'), fg='#666').pack()
+
+ # Auto-resize the dialog to fit content
+ self.wm.auto_resize_dialog(dialog, canvas, max_width_ratio=0.9, max_height_ratio=1.2)
+
+ def filter_entries_dialog():
+ if not self.current_glossary_data:
+ messagebox.showerror("Error", "No glossary loaded")
+ return
+
+ # Use WindowManager's setup_scrollable for unified scrolling
+ dialog, scrollable_frame, canvas = self.wm.setup_scrollable(
+ self.master,
+ "Filter Entries",
+ width=600,
+ height=None,
+ max_width_ratio=0.9,
+ max_height_ratio=0.85
+ )
+
+ main_frame = scrollable_frame
+
+ # Title and description
+ tk.Label(main_frame, text="Filter Glossary Entries",
+ font=('TkDefaultFont', 14, 'bold')).pack(pady=(20, 5))
+
+ tk.Label(main_frame, text="Filter entries by type or content",
+ font=('TkDefaultFont', 10), fg='gray', wraplength=550).pack(pady=(0, 15))
+
+ # Current stats
+ entry_count = len(self.current_glossary_data) if self.current_glossary_format == 'list' else len(self.current_glossary_data.get('entries', {}))
+
+ stats_frame = tk.LabelFrame(main_frame, text="Current Status", padx=15, pady=10)
+ stats_frame.pack(fill=tk.X, pady=(0, 15), padx=20)
+ tk.Label(stats_frame, text=f"Total entries: {entry_count}", font=('TkDefaultFont', 10)).pack(anchor=tk.W)
+
+ # Check if new format
+ is_new_format = (self.current_glossary_format == 'list' and
+ self.current_glossary_data and
+ 'type' in self.current_glossary_data[0])
+
+ # Filter conditions
+ conditions_frame = tk.LabelFrame(main_frame, text="Filter Conditions", padx=15, pady=10)
+ conditions_frame.pack(fill=tk.BOTH, expand=True, pady=(0, 15), padx=20)
+
+ # Type filter for new format
+ type_vars = {}
+ if is_new_format:
+ type_frame = tk.LabelFrame(conditions_frame, text="Entry Type", padx=10, pady=10)
+ type_frame.pack(fill=tk.X, pady=(0, 10))
+
+ type_vars['character'] = tk.BooleanVar(value=True)
+ type_vars['term'] = tk.BooleanVar(value=True)
+
+ tb.Checkbutton(type_frame, text="Keep characters", variable=type_vars['character']).pack(anchor=tk.W)
+ tb.Checkbutton(type_frame, text="Keep terms/locations", variable=type_vars['term']).pack(anchor=tk.W)
+
+ # Text content filter
+ text_filter_frame = tk.LabelFrame(conditions_frame, text="Text Content Filter", padx=10, pady=10)
+ text_filter_frame.pack(fill=tk.X, pady=(0, 10))
+
+ tk.Label(text_filter_frame, text="Keep entries containing text (case-insensitive):",
+ font=('TkDefaultFont', 9), fg='gray').pack(anchor=tk.W, pady=(0, 5))
+
+ search_var = tk.StringVar()
+ tb.Entry(text_filter_frame, textvariable=search_var, width=40).pack(fill=tk.X, pady=5)
+
+ # Gender filter for new format
+ gender_var = tk.StringVar(value="all")
+ if is_new_format:
+ gender_frame = tk.LabelFrame(conditions_frame, text="Gender Filter (Characters Only)", padx=10, pady=10)
+ gender_frame.pack(fill=tk.X, pady=(0, 10))
+
+ tk.Radiobutton(gender_frame, text="All genders", variable=gender_var, value="all").pack(anchor=tk.W)
+ tk.Radiobutton(gender_frame, text="Male only", variable=gender_var, value="Male").pack(anchor=tk.W)
+ tk.Radiobutton(gender_frame, text="Female only", variable=gender_var, value="Female").pack(anchor=tk.W)
+ tk.Radiobutton(gender_frame, text="Unknown only", variable=gender_var, value="Unknown").pack(anchor=tk.W)
+
+ # Preview section
+ preview_frame = tk.LabelFrame(main_frame, text="Preview", padx=15, pady=10)
+ preview_frame.pack(fill=tk.X, pady=(0, 15), padx=20)
+
+ preview_label = tk.Label(preview_frame, text="Click 'Preview Filter' to see how many entries match",
+ font=('TkDefaultFont', 10), fg='gray')
+ preview_label.pack(pady=5)
+
+ def check_entry_matches(entry):
+ """Check if an entry matches the filter conditions"""
+ # Dict-format glossaries map original -> translated string; wrap
+ # plain values so the text filter below still works on them
+ if not isinstance(entry, dict):
+ entry = {'translated': str(entry)}
+
+ # Type filter (entries of unknown types are kept)
+ if is_new_format and entry.get('type'):
+ if not type_vars.get(entry['type'], tk.BooleanVar(value=True)).get():
+ return False
+
+ # Text filter
+ search_text = search_var.get().strip().lower()
+ if search_text:
+ # Search in all text fields
+ entry_text = ' '.join(str(v) for v in entry.values() if isinstance(v, str)).lower()
+ if search_text not in entry_text:
+ return False
+
+ # Gender filter
+ if is_new_format and gender_var.get() != "all":
+ if entry.get('type') == 'character' and entry.get('gender') != gender_var.get():
+ return False
+
+ return True
+
+ def preview_filter():
+ """Preview the filter results"""
+ matching = 0
+
+ if self.current_glossary_format == 'list':
+ for entry in self.current_glossary_data:
+ if check_entry_matches(entry):
+ matching += 1
+ else:
+ for key, entry in self.current_glossary_data.get('entries', {}).items():
+ if check_entry_matches(entry):
+ matching += 1
+
+ removed = entry_count - matching
+ preview_label.config(
+ text=f"Filter matches: {matching} entries ({removed} will be removed)",
+ fg='blue' if matching > 0 else 'red'
+ )
+
+ tb.Button(preview_frame, text="Preview Filter", command=preview_filter,
+ bootstyle="info").pack()
+
+ # Action buttons
+ button_frame = tk.Frame(main_frame)
+ button_frame.pack(fill=tk.X, pady=(10, 20), padx=20)
+
+ def apply_filter():
+ if self.current_glossary_format == 'list':
+ filtered = [entry for entry in self.current_glossary_data if check_entry_matches(entry)]
+ removed = len(self.current_glossary_data) - len(filtered)
+
+ if removed > 0:
+ if not self.create_glossary_backup(f"before_filter_remove_{removed}"):
+ return
+
+ self.current_glossary_data[:] = filtered
+ kept_count = len(filtered)
+ else:
+ # Dict format: filter the entries mapping, mirroring the preview above
+ entries = self.current_glossary_data.get('entries', {})
+ kept = {k: v for k, v in entries.items() if check_entry_matches(v)}
+ removed = len(entries) - len(kept)
+
+ if removed > 0:
+ if not self.create_glossary_backup(f"before_filter_remove_{removed}"):
+ return
+
+ self.current_glossary_data['entries'] = kept
+ kept_count = len(kept)
+
+ if save_current_glossary():
+ load_glossary_for_editing()
+ messagebox.showinfo("Success",
+ f"Filter applied!\n\nKept: {kept_count} entries\nRemoved: {removed} entries")
+ dialog.destroy()
+
+ # Create inner frame for buttons
+ button_inner_frame = tk.Frame(button_frame)
+ button_inner_frame.pack()
+
+ tb.Button(button_inner_frame, text="Apply Filter", command=apply_filter,
+ bootstyle="success", width=15).pack(side=tk.LEFT, padx=5)
+ tb.Button(button_inner_frame, text="Cancel", command=dialog.destroy,
+ bootstyle="secondary", width=15).pack(side=tk.LEFT, padx=5)
+
+ # Auto-resize the dialog to fit content
+ self.wm.auto_resize_dialog(dialog, canvas, max_width_ratio=0.9, max_height_ratio=1.49)
+
+ def export_selection():
+ selected = self.glossary_tree.selection()
+ if not selected:
+ messagebox.showwarning("Warning", "No entries selected")
+ return
+
+ path = filedialog.asksaveasfilename(
+ title="Export Selected Entries",
+ defaultextension=".json",
+ filetypes=[("JSON files", "*.json"), ("CSV files", "*.csv")]
+ )
+
+ if not path:
+ return
+
+ try:
+ if self.current_glossary_format == 'list':
+ exported = []
+ for item in selected:
+ idx = int(self.glossary_tree.item(item)['text']) - 1
+ if 0 <= idx < len(self.current_glossary_data):
+ exported.append(self.current_glossary_data[idx])
+
+ if path.endswith('.csv'):
+ # Export as CSV
+ import csv
+ with open(path, 'w', encoding='utf-8', newline='') as f:
+ writer = csv.writer(f)
+ for entry in exported:
+ if entry.get('type') == 'character':
+ writer.writerow([entry.get('type', ''), entry.get('raw_name', ''),
+ entry.get('translated_name', ''), entry.get('gender', '')])
+ else:
+ writer.writerow([entry.get('type', ''), entry.get('raw_name', ''),
+ entry.get('translated_name', ''), ''])
+ else:
+ # Export as JSON
+ with open(path, 'w', encoding='utf-8') as f:
+ json.dump(exported, f, ensure_ascii=False, indent=2)
+
+ else:
+ exported = {}
+ entries_list = list(self.current_glossary_data.get('entries', {}).items())
+ for item in selected:
+ idx = int(self.glossary_tree.item(item)['text']) - 1
+ if 0 <= idx < len(entries_list):
+ key, value = entries_list[idx]
+ exported[key] = value
+
+ with open(path, 'w', encoding='utf-8') as f:
+ json.dump(exported, f, ensure_ascii=False, indent=2)
+
+ messagebox.showinfo("Success", f"Exported {len(selected)} entries to {os.path.basename(path)}")
+
+ except Exception as e:
+ messagebox.showerror("Error", f"Failed to export: {e}")
+
+ def save_edited_glossary():
+ if save_current_glossary():
+ messagebox.showinfo("Success", "Glossary saved successfully")
+ self.append_log(f"✅ Saved glossary to: {self.editor_file_var.get()}")
+
+ def save_as_glossary():
+ if not self.current_glossary_data:
+ messagebox.showerror("Error", "No glossary loaded")
+ return
+
+ path = filedialog.asksaveasfilename(
+ title="Save Glossary As",
+ defaultextension=".json",
+ filetypes=[("JSON files", "*.json"), ("CSV files", "*.csv")]
+ )
+
+ if not path:
+ return
+
+ try:
+ if path.endswith('.csv'):
+ # Save as CSV
+ import csv
+ with open(path, 'w', encoding='utf-8', newline='') as f:
+ writer = csv.writer(f)
+ if self.current_glossary_format == 'list':
+ for entry in self.current_glossary_data:
+ if entry.get('type') == 'character':
+ writer.writerow([entry.get('type', ''), entry.get('raw_name', ''),
+ entry.get('translated_name', ''), entry.get('gender', '')])
+ else:
+ writer.writerow([entry.get('type', ''), entry.get('raw_name', ''),
+ entry.get('translated_name', ''), ''])
+ else:
+ # Dict format has no type/gender info; export pairs as generic terms
+ for original, translated in self.current_glossary_data.get('entries', {}).items():
+ writer.writerow(['term', original, translated, ''])
+ else:
+ # Save as JSON
+ with open(path, 'w', encoding='utf-8') as f:
+ json.dump(self.current_glossary_data, f, ensure_ascii=False, indent=2)
+
+ self.editor_file_var.set(path)
+ messagebox.showinfo("Success", f"Glossary saved to {os.path.basename(path)}")
+ self.append_log(f"✅ Saved glossary as: {path}")
+
+ except Exception as e:
+ messagebox.showerror("Error", f"Failed to save: {e}")
+
+ # Buttons
+ tb.Button(file_frame, text="Browse", command=browse_glossary, width=15).pack(side=tk.LEFT)
+
+
+ editor_controls = tk.Frame(container)
+ editor_controls.pack(fill=tk.X, pady=(10, 0))
+
+ # Row 1
+ row1 = tk.Frame(editor_controls)
+ row1.pack(fill=tk.X, pady=2)
+
+ buttons_row1 = [
+ ("Reload", load_glossary_for_editing, "info"),
+ ("Delete Selected", delete_selected_entries, "danger"),
+ ("Clean Empty Fields", clean_empty_fields, "warning"),
+ ("Remove Duplicates", remove_duplicates, "warning"),
+ ("Backup Settings", backup_settings_dialog, "success")
+ ]
+
+ for text, cmd, style in buttons_row1:
+ tb.Button(row1, text=text, command=cmd, bootstyle=style, width=15).pack(side=tk.LEFT, padx=2)
+
+ # Row 2
+ row2 = tk.Frame(editor_controls)
+ row2.pack(fill=tk.X, pady=2)
+
+ buttons_row2 = [
+ ("Trim Entries", smart_trim_dialog, "primary"),
+ ("Filter Entries", filter_entries_dialog, "primary"),
+ ("Export to CSV", lambda: self.convert_glossary_format(load_glossary_for_editing), "info"),
+ ("Export Selection", export_selection, "secondary"),
+ ("Duplicate Info", duplicate_detection_settings, "info")
+ ]
+
+ for text, cmd, style in buttons_row2:
+ tb.Button(row2, text=text, command=cmd, bootstyle=style, width=15).pack(side=tk.LEFT, padx=2)
+
+ # Row 3
+ row3 = tk.Frame(editor_controls)
+ row3.pack(fill=tk.X, pady=2)
+
+ tb.Button(row3, text="Save Changes", command=save_edited_glossary,
+ bootstyle="success", width=20).pack(side=tk.LEFT, padx=2)
+ tb.Button(row3, text="Save As...", command=save_as_glossary,
+ bootstyle="success-outline", width=20).pack(side=tk.LEFT, padx=2)
+
+ def _on_tree_double_click(self, event):
+ """Handle double-click on treeview item for inline editing"""
+ region = self.glossary_tree.identify_region(event.x, event.y)
+ if region != 'cell':
+ return
+
+ item = self.glossary_tree.identify_row(event.y)
+ column = self.glossary_tree.identify_column(event.x)
+
+ if not item or column == '#0':
+ return
+
+ col_idx = int(column.replace('#', '')) - 1
+ columns = self.glossary_tree['columns']
+ if col_idx >= len(columns):
+ return
+
+ col_name = columns[col_idx]
+ values = self.glossary_tree.item(item)['values']
+ current_value = values[col_idx] if col_idx < len(values) else ''
+
+ dialog = self.wm.create_simple_dialog(
+ self.master,
+ f"Edit {col_name.replace('_', ' ').title()}",
+ width=400,
+ height=150
+ )
+
+ frame = tk.Frame(dialog, padx=20, pady=20)
+ frame.pack(fill=tk.BOTH, expand=True)
+
+ tk.Label(frame, text=f"Edit {col_name.replace('_', ' ').title()}:").pack(anchor=tk.W)
+
+ # Simple entry for new format fields
+ var = tk.StringVar(value=current_value)
+ entry = tb.Entry(frame, textvariable=var, width=50)
+ entry.pack(fill=tk.X, pady=5)
+ entry.focus()
+ entry.select_range(0, tk.END)
+
+ def save_edit():
+ new_value = var.get()
+
+ new_values = list(values)
+ new_values[col_idx] = new_value
+ self.glossary_tree.item(item, values=new_values)
+
+ row_idx = int(self.glossary_tree.item(item)['text']) - 1
+
+ if self.current_glossary_format == 'list':
+ if 0 <= row_idx < len(self.current_glossary_data):
+ entry = self.current_glossary_data[row_idx]
+
+ if new_value:
+ entry[col_name] = new_value
+ else:
+ entry.pop(col_name, None)
+
+ dialog.destroy()
+
+ button_frame = tk.Frame(frame)
+ button_frame.pack(fill=tk.X, pady=(10, 0))
+
+ tb.Button(button_frame, text="Save", command=save_edit,
+ bootstyle="success", width=10).pack(side=tk.LEFT, padx=5)
+ tb.Button(button_frame, text="Cancel", command=dialog.destroy,
+ bootstyle="secondary", width=10).pack(side=tk.LEFT, padx=5)
+
+ dialog.bind('<Return>', lambda e: save_edit())
+ dialog.bind('<Escape>', lambda e: dialog.destroy())
+
+ dialog.deiconify()
+
+ def convert_glossary_format(self, reload_callback):
+ """Export glossary to CSV format"""
+ if not self.current_glossary_data:
+ messagebox.showerror("Error", "No glossary loaded")
+ return
+
+ # Create backup before conversion
+ if not self.create_glossary_backup("before_export"):
+ return
+
+ # Get current file path
+ current_path = self.editor_file_var.get()
+ default_csv_path = os.path.splitext(current_path)[0] + '.csv'
+
+ # Ask user for CSV save location (filedialog is already imported at module level)
+ csv_path = filedialog.asksaveasfilename(
+ title="Export Glossary to CSV",
+ defaultextension=".csv",
+ initialfile=os.path.basename(default_csv_path),
+ filetypes=[("CSV files", "*.csv"), ("All files", "*.*")]
+ )
+
+ if not csv_path:
+ return
+
+ try:
+ import csv
+
+ # Get custom types for gender info
+ custom_types = self.config.get('custom_entry_types', {
+ 'character': {'enabled': True, 'has_gender': True},
+ 'term': {'enabled': True, 'has_gender': False}
+ })
+
+ # Get custom fields
+ custom_fields = self.config.get('custom_glossary_fields', [])
+
+ with open(csv_path, 'w', encoding='utf-8', newline='') as f:
+ writer = csv.writer(f)
+
+ # Build header row
+ header = ['type', 'raw_name', 'translated_name', 'gender']
+ if custom_fields:
+ header.extend(custom_fields)
+
+ # Write header row
+ writer.writerow(header)
+
+ # Process based on format
+ if isinstance(self.current_glossary_data, list) and self.current_glossary_data:
+ if 'type' in self.current_glossary_data[0]:
+ # New format - direct export
+ for entry in self.current_glossary_data:
+ entry_type = entry.get('type', 'term')
+ type_config = custom_types.get(entry_type, {})
+
+ row = [
+ entry_type,
+ entry.get('raw_name', ''),
+ entry.get('translated_name', '')
+ ]
+
+ # Add gender
+ if type_config.get('has_gender', False):
+ row.append(entry.get('gender', ''))
+ else:
+ row.append('')
+
+ # Add custom field values
+ for field in custom_fields:
+ row.append(entry.get(field, ''))
+
+ writer.writerow(row)
+ else:
+ # Old format - convert then export
+ for entry in self.current_glossary_data:
+ # Determine type
+ is_location = False
+ if 'locations' in entry and entry['locations']:
+ is_location = True
+ elif 'title' in entry and any(term in str(entry.get('title', '')).lower()
+ for term in ['location', 'place', 'city', 'region']):
+ is_location = True
+
+ entry_type = 'term' if is_location else 'character'
+ type_config = custom_types.get(entry_type, {})
+
+ row = [
+ entry_type,
+ entry.get('original_name', entry.get('original', '')),
+ entry.get('name', entry.get('translated', ''))
+ ]
+
+ # Add gender
+ if type_config.get('has_gender', False):
+ row.append(entry.get('gender', 'Unknown'))
+ else:
+ row.append('')
+
+ # Add empty custom fields
+ for field in custom_fields:
+ row.append('')
+
+ writer.writerow(row)
+
+ messagebox.showinfo("Success", f"Glossary exported to CSV:\n{csv_path}")
+ self.append_log(f"✅ Exported glossary to: {csv_path}")
+
+ except Exception as e:
+ messagebox.showerror("Export Error", f"Failed to export CSV: {e}")
+ self.append_log(f"❌ CSV export failed: {e}")
+
+ def _make_bottom_toolbar(self):
+ """Create the bottom toolbar with all action buttons"""
+ btn_frame = tb.Frame(self.frame)
+ btn_frame.grid(row=11, column=0, columnspan=5, sticky=tk.EW, pady=5)
+
+ self.qa_button = tb.Button(btn_frame, text="QA Scan", command=self.run_qa_scan, bootstyle="warning")
+ self.qa_button.grid(row=0, column=99, sticky=tk.EW, padx=5)
+
+ toolbar_items = [
+ ("EPUB Converter", self.epub_converter, "info"),
+ ("Extract Glossary", self.run_glossary_extraction_thread, "warning"),
+ ("Glossary Manager", self.glossary_manager, "secondary"),
+ ]
+
+ # Add Manga Translator if available
+ if MANGA_SUPPORT:
+ toolbar_items.append(("Manga Translator", self.open_manga_translator, "primary"))
+
+ # Async Processing
+ toolbar_items.append(("Async Translation", self.open_async_processing, "success"))
+
+ toolbar_items.extend([
+ ("Retranslate", self.force_retranslation, "warning"),
+ ("Save Config", self.save_config, "secondary"),
+ ("Load Glossary", self.load_glossary, "secondary"),
+ ("Import Profiles", self.import_profiles, "secondary"),
+ ("Export Profiles", self.export_profiles, "secondary"),
+ ("📐 1080p: OFF", self.toggle_safe_ratios, "secondary"),
+ ])
+
+ for idx, (lbl, cmd, style) in enumerate(toolbar_items):
+ btn_frame.columnconfigure(idx, weight=1)
+ btn = tb.Button(btn_frame, text=lbl, command=cmd, bootstyle=style)
+ btn.grid(row=0, column=idx, sticky=tk.EW, padx=2)
+ if lbl == "Extract Glossary":
+ self.glossary_button = btn
+ elif lbl == "EPUB Converter":
+ self.epub_button = btn
+ elif "1080p" in lbl:
+ self.safe_ratios_btn = btn
+ elif lbl == "Async Processing (50% Off)":
+ self.async_button = btn
+
+ self.frame.grid_rowconfigure(12, weight=0)
+
+ def toggle_safe_ratios(self):
+ """Toggle 1080p Windows ratios mode"""
+ is_safe = self.wm.toggle_safe_ratios()
+
+ if is_safe:
+ self.safe_ratios_btn.config(
+ text="📐 1080p: ON",
+ bootstyle="success"
+ )
+ self.append_log("✅ 1080p Windows ratios enabled - all dialogs will fit on screen")
+ else:
+ self.safe_ratios_btn.config(
+ text="📐 1080p: OFF",
+ bootstyle="secondary"
+ )
+ self.append_log("❌ 1080p Windows ratios disabled - using default sizes")
+
+ # Save preference
+ self.config['force_safe_ratios'] = is_safe
+ self.save_config()
+
+ def _get_opf_file_order(self, file_list):
+ """
+ Sort files based on OPF spine order if available.
+ Uses STRICT OPF ordering - includes ALL files from spine without filtering.
+ This ensures notice files, copyright pages, etc. are processed in the correct order.
+ Returns sorted file list based on OPF, or original list if no OPF found.
+ """
+ try:
+ import xml.etree.ElementTree as ET
+ import zipfile
+ import re
+
+ # First, check if we have content.opf in the current directory
+ opf_file = None
+ if file_list:
+ # We are inside "if file_list:", so file_list[0] is safe here
+ current_dir = os.path.dirname(file_list[0])
+ possible_opf = os.path.join(current_dir, 'content.opf')
+ if os.path.exists(possible_opf):
+ opf_file = possible_opf
+ self.append_log("📋 Found content.opf in directory")
+
+ # If no OPF, check if any of the files is an OPF
+ if not opf_file:
+ for file_path in file_list:
+ if file_path.lower().endswith('.opf'):
+ opf_file = file_path
+ self.append_log(f"📋 Found OPF file: {os.path.basename(opf_file)}")
+ break
+
+ # If no OPF, try to extract from EPUB
+ if not opf_file:
+ epub_files = [f for f in file_list if f.lower().endswith('.epub')]
+ if epub_files:
+ epub_path = epub_files[0]
+ try:
+ with zipfile.ZipFile(epub_path, 'r') as zf:
+ for name in zf.namelist():
+ if name.endswith('.opf'):
+ opf_content = zf.read(name)
+ temp_opf = os.path.join(os.path.dirname(epub_path), 'temp_content.opf')
+ with open(temp_opf, 'wb') as f:
+ f.write(opf_content)
+ opf_file = temp_opf
+ self.append_log(f"📋 Extracted OPF from EPUB: {os.path.basename(epub_path)}")
+ break
+ except Exception as e:
+ self.append_log(f"⚠️ Could not extract OPF from EPUB: {e}")
+
+ if not opf_file:
+ self.append_log(f"ℹ️ No OPF file found, using default file order")
+ return file_list
+
+ # Parse the OPF file
+ try:
+ tree = ET.parse(opf_file)
+ root = tree.getroot()
+
+ # Handle namespaces
+ ns = {'opf': 'http://www.idpf.org/2007/opf'}
+ if root.tag.startswith('{'):
+ default_ns = root.tag[1:root.tag.index('}')]
+ ns = {'opf': default_ns}
+
+ # Get manifest to map IDs to files
+ manifest = {}
+ for item in root.findall('.//opf:manifest/opf:item', ns):
+ item_id = item.get('id')
+ href = item.get('href')
+
+ if item_id and href:
+ filename = os.path.basename(href)
+ manifest[item_id] = filename
+ # Store multiple variations for matching
+ name_without_ext = os.path.splitext(filename)[0]
+ manifest[item_id + '_noext'] = name_without_ext
+ # Also store with response_ prefix for matching
+ manifest[item_id + '_response'] = f"response_{filename}"
+ manifest[item_id + '_response_noext'] = f"response_{name_without_ext}"
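+ # e.g. item id="ch1" href="Text/chapter1.xhtml" yields keys:
+ # 'ch1' -> 'chapter1.xhtml', 'ch1_noext' -> 'chapter1',
+ # 'ch1_response' -> 'response_chapter1.xhtml',
+ # 'ch1_response_noext' -> 'response_chapter1'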
+
+ # Get spine order - include ALL files first for correct indexing
+ spine_order_full = []
+ spine = root.find('.//opf:spine', ns)
+ if spine is not None:
+ for itemref in spine.findall('opf:itemref', ns):
+ idref = itemref.get('idref')
+ if idref and idref in manifest:
+ spine_order_full.append(manifest[idref])
+
+ # Now filter out cover and nav/toc files for processing
+ spine_order = []
+ for item in spine_order_full:
+ # Skip navigation and cover files
+ if not any(skip in item.lower() for skip in ['nav.', 'toc.', 'cover.']):
+ spine_order.append(item)
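+ # Substring test: any spine filename containing 'nav.', 'toc.' or
+ # 'cover.' (e.g. 'nav.xhtml', 'cover.xhtml') is excluded here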
+
+ self.append_log(f"📋 Found {len(spine_order_full)} items in OPF spine ({len(spine_order)} after filtering)")
+
+ # Count file types
+ notice_count = sum(1 for f in spine_order if 'notice' in f.lower())
+ chapter_count = sum(1 for f in spine_order if 'chapter' in f.lower() and 'notice' not in f.lower())
+ skipped_count = len(spine_order_full) - len(spine_order)
+
+ if skipped_count > 0:
+ self.append_log(f" • Skipped files (cover/nav/toc): {skipped_count}")
+ if notice_count > 0:
+ self.append_log(f" • Notice/Copyright files: {notice_count}")
+ if chapter_count > 0:
+ self.append_log(f" • Chapter files: {chapter_count}")
+
+ # Show first few spine entries
+ if spine_order:
+ self.append_log(f" 📖 Spine order preview:")
+ for i, entry in enumerate(spine_order[:5]):
+ self.append_log(f" [{i}]: {entry}")
+ if len(spine_order) > 5:
+ self.append_log(f" ... and {len(spine_order) - 5} more")
+
+ # Map input files to spine positions
+ ordered_files = []
+ unordered_files = []
+
+ for file_path in file_list:
+ basename = os.path.basename(file_path)
+ basename_noext = os.path.splitext(basename)[0]
+
+ # Try to find this file in the spine
+ found_position = None
+ matched_spine_file = None
+
+ # Direct exact match
+ if basename in spine_order:
+ found_position = spine_order.index(basename)
+ matched_spine_file = basename
+ # Match without extension
+ elif basename_noext in spine_order:
+ found_position = spine_order.index(basename_noext)
+ matched_spine_file = basename_noext
+ else:
+ # Try pattern matching for response_ files
+ for idx, spine_item in enumerate(spine_order):
+ spine_noext = os.path.splitext(spine_item)[0]
+
+ # Check if this is a response_ file matching spine item
+ if basename.startswith('response_'):
+ # Remove response_ prefix and try to match
+ clean_name = basename[9:] # Remove 'response_'
+ clean_noext = os.path.splitext(clean_name)[0]
+
+ if clean_name == spine_item or clean_noext == spine_noext:
+ found_position = idx
+ matched_spine_file = spine_item
+ break
+
+ # Try matching by chapter number
+ spine_num = re.search(r'(\d+)', spine_item)
+ file_num = re.search(r'(\d+)', clean_name)
+ if spine_num and file_num and spine_num.group(1) == file_num.group(1):
+ # Check if both are notice or both are chapter files
+ both_notice = 'notice' in spine_item.lower() and 'notice' in clean_name.lower()
+ both_chapter = 'chapter' in spine_item.lower() and 'chapter' in clean_name.lower()
+ if both_notice or both_chapter:
+ found_position = idx
+ matched_spine_file = spine_item
+ break
+ else:
+ # For non-response files, check if spine item is contained
+ if spine_noext in basename_noext:
+ found_position = idx
+ matched_spine_file = spine_item
+ break
+
+ # Number-based matching
+ spine_num = re.search(r'(\d+)', spine_item)
+ file_num = re.search(r'(\d+)', basename)
+ if spine_num and file_num and spine_num.group(1) == file_num.group(1):
+ # Check file type match
+ both_notice = 'notice' in spine_item.lower() and 'notice' in basename.lower()
+ both_chapter = 'chapter' in spine_item.lower() and 'chapter' in basename.lower()
+ if both_notice or both_chapter:
+ found_position = idx
+ matched_spine_file = spine_item
+ break
+
+ if found_position is not None:
+ ordered_files.append((found_position, file_path))
+ self.append_log(f" ✓ Matched: {basename} → spine[{found_position}]: {matched_spine_file}")
+ else:
+ unordered_files.append(file_path)
+ self.append_log(f" ⚠️ Not in spine: {basename}")
+
+ # Sort by spine position
+ ordered_files.sort(key=lambda x: x[0])
+ final_order = [f for _, f in ordered_files]
+
+ # Add unmapped files at the end
+ if unordered_files:
+ self.append_log(f"📋 Adding {len(unordered_files)} unmapped files at the end")
+ final_order.extend(sorted(unordered_files))
+
+ # Clean up temp OPF if created
+ if opf_file and 'temp_content.opf' in opf_file and os.path.exists(opf_file):
+ try:
+ os.remove(opf_file)
+ except OSError:
+ pass
+
+ self.append_log(f"✅ Files sorted using STRICT OPF spine order")
+ self.append_log(f" • Total files: {len(final_order)}")
+ self.append_log(f" • Following exact spine sequence from OPF")
+
+ return final_order if final_order else file_list
+
+ except Exception as e:
+ self.append_log(f"⚠️ Error parsing OPF file: {e}")
+ if opf_file and 'temp_content.opf' in opf_file and os.path.exists(opf_file):
+ try:
+ os.remove(opf_file)
+ except OSError:
+ pass
+ return file_list
+
+ except Exception as e:
+ self.append_log(f"⚠️ Error in OPF sorting: {e}")
+ return file_list
+
+ def run_translation_thread(self):
+ """Start translation in a background worker (ThreadPoolExecutor)"""
+ # Prevent overlap with glossary extraction
+ if (hasattr(self, 'glossary_thread') and self.glossary_thread and self.glossary_thread.is_alive()) or \
+ (hasattr(self, 'glossary_future') and self.glossary_future and not self.glossary_future.done()):
+ self.append_log("⚠️ Cannot run translation while glossary extraction is in progress.")
+ messagebox.showwarning("Process Running", "Please wait for glossary extraction to complete before starting translation.")
+ return
+
+ if self.translation_thread and self.translation_thread.is_alive():
+ self.stop_translation()
+ return
+
+ # Check if files are selected
+ if not hasattr(self, 'selected_files') or not self.selected_files:
+ file_path = self.entry_epub.get().strip()
+ if not file_path or file_path.startswith("No file selected") or "files selected" in file_path:
+ messagebox.showerror("Error", "Please select file(s) to translate.")
+ return
+ self.selected_files = [file_path]
+
+ # Reset stop flags
+ self.stop_requested = False
+ if translation_stop_flag:
+ translation_stop_flag(False)
+
+ # Also reset the module's internal stop flag
+ try:
+ if hasattr(self, '_main_module') and self._main_module:
+ if hasattr(self._main_module, 'set_stop_flag'):
+ self._main_module.set_stop_flag(False)
+ except Exception:
+ pass
+
+ # Update button immediately to show translation is starting
+ if hasattr(self, 'button_run'):
+ self.button_run.config(text="⏹ Stop", state="normal")
+
+ # Show immediate feedback that translation is starting
+ self.append_log("🚀 Initializing translation process...")
+
+ # Start worker immediately - no heavy operations here
+ # IMPORTANT: Do NOT call _ensure_executor() here as it may be slow
+ # Just start the thread directly
+ thread_name = f"TranslationThread_{int(time.time())}"
+ self.translation_thread = threading.Thread(
+ target=self.run_translation_wrapper,
+ name=thread_name,
+ daemon=True
+ )
+ self.translation_thread.start()
+
+ # Schedule button update check
+ self.master.after(100, self.update_run_button)
+
+ def run_translation_wrapper(self):
+ """Wrapper that handles ALL initialization in background thread"""
+ try:
+ # Ensure executor is available (do this in background thread)
+ if not hasattr(self, 'executor') or self.executor is None:
+ try:
+ self._ensure_executor()
+ except Exception as e:
+ self.append_log(f"⚠️ Could not initialize executor: {e}")
+
+ # Load modules in background thread (not main thread!)
+ if not self._modules_loaded:
+ self.append_log("📦 Loading translation modules (this may take a moment)...")
+
+ # Create a progress callback that uses append_log
+ def module_progress(msg):
+ self.append_log(f" {msg}")
+
+ # Load modules with progress feedback
+ if not self._lazy_load_modules(splash_callback=module_progress):
+ self.append_log("❌ Failed to load required modules")
+ return
+
+ self.append_log("✅ Translation modules loaded successfully")
+
+ # Check for large EPUBs and set optimization parameters
+ epub_files = [f for f in self.selected_files if f.lower().endswith('.epub')]
+
+ for epub_path in epub_files:
+ try:
+ import zipfile
+ with zipfile.ZipFile(epub_path, 'r') as zf:
+ # Quick count without reading content
+ html_files = [f for f in zf.namelist() if f.lower().endswith(('.html', '.xhtml', '.htm'))]
+ file_count = len(html_files)
+
+ if file_count > 50:
+ self.append_log(f"📚 Large EPUB detected: {file_count} chapters")
+
+ # Get user-configured worker count
+ if hasattr(self, 'config') and 'extraction_workers' in self.config:
+ max_workers = self.config.get('extraction_workers', 2)
+ else:
+ # Fallback to environment variable or default
+ max_workers = int(os.environ.get('EXTRACTION_WORKERS', '2'))
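+ # Assumes EXTRACTION_WORKERS holds a numeric string; int() raises ValueError otherwise.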
+
+ # Set extraction parameters
+ os.environ['EXTRACTION_WORKERS'] = str(max_workers)
+ os.environ['EXTRACTION_PROGRESS_CALLBACK'] = 'enabled'
+
+ # Set progress interval based on file count
+ if file_count > 500:
+ progress_interval = 50
+ os.environ['EXTRACTION_BATCH_SIZE'] = '100'
+ self.append_log(f"⚡ Using {max_workers} workers with batch size 100")
+ elif file_count > 200:
+ progress_interval = 25
+ os.environ['EXTRACTION_BATCH_SIZE'] = '50'
+ self.append_log(f"⚡ Using {max_workers} workers with batch size 50")
+ elif file_count > 100:
+ progress_interval = 20
+ os.environ['EXTRACTION_BATCH_SIZE'] = '25'
+ self.append_log(f"⚡ Using {max_workers} workers with batch size 25")
+ else:
+ progress_interval = 10
+ os.environ['EXTRACTION_BATCH_SIZE'] = '20'
+ self.append_log(f"⚡ Using {max_workers} workers with batch size 20")
+
+ os.environ['EXTRACTION_PROGRESS_INTERVAL'] = str(progress_interval)
+
+ # Enable performance flags for large files
+ os.environ['FAST_EXTRACTION'] = '1'
+ os.environ['PARALLEL_PARSE'] = '1'
+
+ # For very large files, enable aggressive optimization
+ #if file_count > 300:
+ # os.environ['SKIP_VALIDATION'] = '1'
+ # os.environ['LAZY_LOAD_CONTENT'] = '1'
+ # self.append_log("🚀 Enabled aggressive optimization for very large file")
+
+ except Exception:
+ # If we can't check, just continue
+ pass
+
+ # Set essential environment variables from current config before translation
+ os.environ['BATCH_TRANSLATE_HEADERS'] = '1' if self.config.get('batch_translate_headers', False) else '0'
+ os.environ['IGNORE_HEADER'] = '1' if self.config.get('ignore_header', False) else '0'
+ os.environ['IGNORE_TITLE'] = '1' if self.config.get('ignore_title', False) else '0'
+
+ # Now run the actual translation
+ translation_completed = self.run_translation_direct()
+
+ # If scanning phase toggle is enabled, launch scanner after translation
+ # BUT only if translation completed successfully (not stopped by user)
+ try:
+ if (getattr(self, 'scan_phase_enabled_var', None) and self.scan_phase_enabled_var.get() and
+ translation_completed and not self.stop_requested):
+ mode = self.scan_phase_mode_var.get() if hasattr(self, 'scan_phase_mode_var') else 'quick-scan'
+ self.append_log(f"🧪 Scanning phase enabled — launching QA Scanner in {mode} mode (post-translation)...")
+ # Non-interactive: skip dialogs and use auto-search
+ self.master.after(0, lambda: self.run_qa_scan(mode_override=mode, non_interactive=True))
+ except Exception:
+ pass
+
+ except Exception as e:
+ self.append_log(f"❌ Translation error: {e}")
+ import traceback
+ self.append_log(f"❌ Full error: {traceback.format_exc()}")
+ finally:
+ # Clean up environment variables
+ env_vars = [
+ 'EXTRACTION_WORKERS', 'EXTRACTION_BATCH_SIZE',
+ 'EXTRACTION_PROGRESS_CALLBACK', 'EXTRACTION_PROGRESS_INTERVAL',
+ 'FAST_EXTRACTION', 'PARALLEL_PARSE', 'SKIP_VALIDATION',
+ 'LAZY_LOAD_CONTENT'
+ ]
+ for var in env_vars:
+ if var in os.environ:
+ del os.environ[var]
+
+ # Update button state on main thread
+ self.master.after(0, self.update_run_button)
+
+ def run_translation_direct(self):
+ """Run translation directly - handles multiple files and different file types"""
+ try:
+ # Check stop at the very beginning
+ if self.stop_requested:
+ return False
+
+ # DON'T CALL _lazy_load_modules HERE!
+ # Modules are already loaded in the wrapper
+ # Just verify they're loaded
+ if not self._modules_loaded:
+ self.append_log("❌ Translation modules not loaded")
+ return False
+
+ # Check stop after verification
+ if self.stop_requested:
+ return False
+
+ # SET GLOSSARY IN ENVIRONMENT
+ if hasattr(self, 'manual_glossary_path') and self.manual_glossary_path:
+ os.environ['MANUAL_GLOSSARY'] = self.manual_glossary_path
+ self.append_log(f"📑 Set glossary in environment: {os.path.basename(self.manual_glossary_path)}")
+ else:
+ # Clear any previous glossary from environment
+ if 'MANUAL_GLOSSARY' in os.environ:
+ del os.environ['MANUAL_GLOSSARY']
+ self.append_log(f"ℹ️ No glossary loaded")
+
+ # ========== NEW: APPLY OPF-BASED SORTING ==========
+ # Sort files based on OPF order if available
+ original_file_count = len(self.selected_files)
+ self.selected_files = self._get_opf_file_order(self.selected_files)
+ self.append_log(f"📚 Processing {original_file_count} files in reading order")
+ # ====================================================
+
+ # Process each file
+ total_files = len(self.selected_files)
+ successful = 0
+ failed = 0
+
+ # Check if we're processing multiple images - if so, create a combined output folder
+ image_extensions = {'.png', '.jpg', '.jpeg', '.gif', '.bmp', '.webp'}
+ image_files = [f for f in self.selected_files if os.path.splitext(f)[1].lower() in image_extensions]
+
+ combined_image_output_dir = None
+ if len(image_files) > 1:
+ # Check stop before creating directories
+ if self.stop_requested:
+ return False
+
+ # Get the common parent directory name or use timestamp
+ parent_dir = os.path.dirname(self.selected_files[0])
+ folder_name = os.path.basename(parent_dir) if parent_dir else f"translated_images_{int(time.time())}"
+ combined_image_output_dir = folder_name
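+ # Note: folder_name is a bare directory name, so the combined output lands relative to the current working directory.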
+ os.makedirs(combined_image_output_dir, exist_ok=True)
+
+ # Create images subdirectory for originals
+ images_dir = os.path.join(combined_image_output_dir, "images")
+ os.makedirs(images_dir, exist_ok=True)
+
+ self.append_log(f"📁 Created combined output directory: {combined_image_output_dir}")
+
+ for i, file_path in enumerate(self.selected_files):
+ if self.stop_requested:
+ self.append_log(f"⏹️ Translation stopped by user at file {i+1}/{total_files}")
+ break
+
+ self.current_file_index = i
+
+ # Log progress for multiple files
+ if total_files > 1:
+ self.append_log(f"\n{'='*60}")
+ self.append_log(f"📄 Processing file {i+1}/{total_files}: {os.path.basename(file_path)}")
+ progress_percent = ((i + 1) / total_files) * 100
+ self.append_log(f"📊 Overall progress: {progress_percent:.1f}%")
+ self.append_log(f"{'='*60}")
+
+ if not os.path.exists(file_path):
+ self.append_log(f"❌ File not found: {file_path}")
+ failed += 1
+ continue
+
+ # Determine file type and process accordingly
+ ext = os.path.splitext(file_path)[1].lower()
+
+ try:
+ if ext in image_extensions:
+ # Process as image with combined output directory if applicable
+ if self._process_image_file(file_path, combined_image_output_dir):
+ successful += 1
+ else:
+ failed += 1
+ elif ext in {'.epub', '.txt'}:
+ # Process as EPUB/TXT
+ if self._process_text_file(file_path):
+ successful += 1
+ else:
+ failed += 1
+ else:
+ self.append_log(f"⚠️ Unsupported file type: {ext}")
+ failed += 1
+
+ except Exception as e:
+ self.append_log(f"❌ Error processing {os.path.basename(file_path)}: {str(e)}")
+ import traceback
+ self.append_log(f"❌ Full error: {traceback.format_exc()}")
+ failed += 1
+
+ # Check stop before final summary
+ if self.stop_requested:
+ self.append_log(f"\n⏹️ Translation stopped - processed {successful} of {total_files} files")
+ return False
+
+ # Final summary
+ if total_files > 1:
+ self.append_log(f"\n{'='*60}")
+ self.append_log(f"📊 Translation Summary:")
+ self.append_log(f" ✅ Successful: {successful} files")
+ if failed > 0:
+ self.append_log(f" ❌ Failed: {failed} files")
+ self.append_log(f" 📁 Total: {total_files} files")
+
+ if combined_image_output_dir and successful > 0:
+ self.append_log(f"\n💡 Tip: You can now compile the HTML files in '{combined_image_output_dir}' into an EPUB")
+
+ # Check for cover image
+ cover_found = False
+ for img_name in ['cover.png', 'cover.jpg', 'cover.jpeg', 'cover.webp']:
+ if os.path.exists(os.path.join(combined_image_output_dir, "images", img_name)):
+ self.append_log(f" 📖 Found cover image: {img_name}")
+ cover_found = True
+ break
+
+ if not cover_found:
+ # Use first image as cover
+ images_in_dir = os.listdir(os.path.join(combined_image_output_dir, "images"))
+ if images_in_dir:
+ self.append_log(f" 📖 First image will be used as cover: {images_in_dir[0]}")
+
+ self.append_log(f"{'='*60}")
+
+ return True # Translation completed successfully
+
+ except Exception as e:
+ self.append_log(f"❌ Translation setup error: {e}")
+ import traceback
+ self.append_log(f"❌ Full error: {traceback.format_exc()}")
+ return False
+
+ finally:
+ self.stop_requested = False
+ if translation_stop_flag:
+ translation_stop_flag(False)
+
+ # Also reset the module's internal stop flag
+ try:
+ if hasattr(self, '_main_module') and self._main_module:
+ if hasattr(self._main_module, 'set_stop_flag'):
+ self._main_module.set_stop_flag(False)
+ except:
+ pass
+
+ self.translation_thread = None
+ self.current_file_index = 0
+ self.master.after(0, self.update_run_button)
+
+ def _process_image_file(self, image_path, combined_output_dir=None):
+ """Process a single image file using the direct image translation API with progress tracking"""
+ try:
+ import time
+ import shutil
+ import hashlib
+ import os
+ import json
+
+ # Determine output directory early for progress tracking
+ image_name = os.path.basename(image_path)
+ base_name = os.path.splitext(image_name)[0]
+
+ if combined_output_dir:
+ output_dir = combined_output_dir
+ else:
+ output_dir = base_name
+
+ # Initialize progress manager if not already done
+ if not hasattr(self, 'image_progress_manager'):
+ # Use the determined output directory
+ os.makedirs(output_dir, exist_ok=True)
+
+ # Import or define a simplified ImageProgressManager
+ class ImageProgressManager:
+ def __init__(self, output_dir=None):
+ self.output_dir = output_dir
+ if output_dir:
+ self.PROGRESS_FILE = os.path.join(output_dir, "translation_progress.json")
+ self.prog = self._init_or_load()
+ else:
+ self.PROGRESS_FILE = None
+ self.prog = {"images": {}, "content_hashes": {}, "version": "1.0"}
+
+ def set_output_dir(self, output_dir):
+ """Set or update the output directory and load progress"""
+ self.output_dir = output_dir
+ self.PROGRESS_FILE = os.path.join(output_dir, "translation_progress.json")
+ self.prog = self._init_or_load()
+
+ def _init_or_load(self):
+ """Initialize or load progress tracking"""
+ if os.path.exists(self.PROGRESS_FILE):
+ try:
+ with open(self.PROGRESS_FILE, "r", encoding="utf-8") as pf:
+ return json.load(pf)
+ except Exception as e:
+ if hasattr(self, 'append_log'):
+ self.append_log(f"⚠️ Creating new progress file due to error: {e}")
+ return {"images": {}, "content_hashes": {}, "version": "1.0"}
+ else:
+ return {"images": {}, "content_hashes": {}, "version": "1.0"}
+
+ def save(self):
+ """Save progress to file atomically"""
+ if not self.PROGRESS_FILE:
+ return
+ try:
+ # Ensure directory exists
+ os.makedirs(os.path.dirname(self.PROGRESS_FILE), exist_ok=True)
+
+ temp_file = self.PROGRESS_FILE + '.tmp'
+ with open(temp_file, "w", encoding="utf-8") as pf:
+ json.dump(self.prog, pf, ensure_ascii=False, indent=2)
+
+ if os.path.exists(self.PROGRESS_FILE):
+ os.remove(self.PROGRESS_FILE)
+ os.rename(temp_file, self.PROGRESS_FILE)
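+ # os.rename raises FileExistsError on Windows if the target exists, hence the remove above;
+ # the sequence is not fully atomic, whereas os.replace() would overwrite in one step.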
+ except Exception as e:
+ if hasattr(self, 'append_log'):
+ self.append_log(f"⚠️ Failed to save progress: {e}")
+ else:
+ print(f"⚠️ Failed to save progress: {e}")
+
+ def get_content_hash(self, file_path):
+ """Generate content hash for a file"""
+ hasher = hashlib.sha256()
+ with open(file_path, 'rb') as f:
+ # Read in chunks to handle large files
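+ # (4 KiB chunks keep memory usage flat even for multi-megabyte images)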
+ for chunk in iter(lambda: f.read(4096), b""):
+ hasher.update(chunk)
+ return hasher.hexdigest()
+
+ def check_image_status(self, image_path, content_hash):
+ """Check if an image needs translation"""
+ image_name = os.path.basename(image_path)
+
+ # NEW: Check for skip markers created by "Mark as Skipped" button
+ skip_key = f"skip_{image_name}"
+ if skip_key in self.prog:
+ skip_info = self.prog[skip_key]
+ if skip_info.get('status') == 'skipped':
+ return False, f"Image marked as skipped", None
+
+ # NEW: Check if image already exists in images folder (marked as skipped)
+ if self.output_dir:
+ images_dir = os.path.join(self.output_dir, "images")
+ dest_image_path = os.path.join(images_dir, image_name)
+
+ if os.path.exists(dest_image_path):
+ return False, f"Image in skipped folder", None
+
+ # Check if image has already been processed
+ if content_hash in self.prog["images"]:
+ image_info = self.prog["images"][content_hash]
+ status = image_info.get("status")
+ output_file = image_info.get("output_file")
+
+ if status == "completed" and output_file:
+ # Check if output file exists
+ if output_file and os.path.exists(output_file):
+ return False, f"Image already translated: {output_file}", output_file
+ else:
+ # Output file missing, mark for retranslation
+ image_info["status"] = "file_deleted"
+ image_info["deletion_detected"] = time.time()
+ self.save()
+ return True, None, None
+
+ elif status == "skipped_cover":
+ return False, "Cover image - skipped", None
+
+ elif status == "error":
+ # Previous error, retry
+ return True, None, None
+
+ return True, None, None
+
+ def update(self, image_path, content_hash, output_file=None, status="in_progress", error=None):
+ """Update progress for an image"""
+ image_name = os.path.basename(image_path)
+
+ image_info = {
+ "name": image_name,
+ "path": image_path,
+ "content_hash": content_hash,
+ "status": status,
+ "last_updated": time.time()
+ }
+
+ if output_file:
+ image_info["output_file"] = output_file
+
+ if error:
+ image_info["error"] = str(error)
+
+ self.prog["images"][content_hash] = image_info
+
+ # Update content hash index for duplicates
+ if status == "completed" and output_file:
+ self.prog["content_hashes"][content_hash] = {
+ "original_name": image_name,
+ "output_file": output_file
+ }
+
+ self.save()
+
+ # Initialize the progress manager
+ self.image_progress_manager = ImageProgressManager(output_dir)
+ # Add append_log reference for the progress manager
+ self.image_progress_manager.append_log = self.append_log
+ self.append_log(f"📊 Progress tracking in: {os.path.join(output_dir, 'translation_progress.json')}")
+
+ # Check for stop request early
+ if self.stop_requested:
+ self.append_log("⏹️ Image translation cancelled by user")
+ return False
+
+ # Get content hash for the image
+ try:
+ content_hash = self.image_progress_manager.get_content_hash(image_path)
+ except Exception as e:
+ self.append_log(f"⚠️ Could not generate content hash: {e}")
+ # Fallback to using file path as identifier
+ content_hash = hashlib.sha256(image_path.encode()).hexdigest()
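+ # Path-based fallback: the same image at a different path hashes differently, so dedup only applies per location.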
+
+ # Check if image needs translation
+ needs_translation, skip_reason, existing_output = self.image_progress_manager.check_image_status(
+ image_path, content_hash
+ )
+
+ if not needs_translation:
+ self.append_log(f"⏭️ {skip_reason}")
+
+ # NEW: If image is marked as skipped but not in images folder yet, copy it there
+ if "marked as skipped" in skip_reason and combined_output_dir:
+ images_dir = os.path.join(combined_output_dir, "images")
+ os.makedirs(images_dir, exist_ok=True)
+ dest_image = os.path.join(images_dir, image_name)
+ if not os.path.exists(dest_image):
+ shutil.copy2(image_path, dest_image)
+ self.append_log(f"📁 Copied skipped image to: {dest_image}")
+
+ return True
+
+ # Update progress to "in_progress"
+ self.image_progress_manager.update(image_path, content_hash, status="in_progress")
+
+ # Check if image translation is enabled
+ if not hasattr(self, 'enable_image_translation_var') or not self.enable_image_translation_var.get():
+ self.append_log(f"⚠️ Image translation not enabled. Enable it in settings to translate images.")
+ return False
+
+ # Check for cover images
+ if 'cover' in image_name.lower():
+ self.append_log(f"⏭️ Skipping cover image: {image_name}")
+
+ # Update progress for cover
+ self.image_progress_manager.update(image_path, content_hash, status="skipped_cover")
+
+ # Copy cover image to images folder if using combined output
+ if combined_output_dir:
+ images_dir = os.path.join(combined_output_dir, "images")
+ os.makedirs(images_dir, exist_ok=True)
+ dest_image = os.path.join(images_dir, image_name)
+ if not os.path.exists(dest_image):
+ shutil.copy2(image_path, dest_image)
+ self.append_log(f"📁 Copied cover to: {dest_image}")
+
+ return True # Return True to indicate successful skip (not an error)
+
+ # Check for stop before processing
+ if self.stop_requested:
+ self.append_log("⏹️ Image translation cancelled before processing")
+ self.image_progress_manager.update(image_path, content_hash, status="cancelled")
+ return False
+
+ # Get the file index for numbering
+ file_index = getattr(self, 'current_file_index', 0) + 1
+
+ # Get API key and model
+ api_key = self.api_key_entry.get().strip()
+ model = self.model_var.get().strip()
+
+ if not api_key:
+ self.append_log("❌ Error: Please enter your API key.")
+ self.image_progress_manager.update(image_path, content_hash, status="error", error="No API key")
+ return False
+
+ if not model:
+ self.append_log("❌ Error: Please select a model.")
+ self.image_progress_manager.update(image_path, content_hash, status="error", error="No model selected")
+ return False
+
+ self.append_log(f"🖼️ Processing image: {os.path.basename(image_path)}")
+ self.append_log(f"🤖 Using model: {model}")
+
+ # Check if it's a vision-capable model
+ vision_models = [
+ 'claude-opus-4-20250514', 'claude-sonnet-4-20250514',
+ 'gpt-4-turbo', 'gpt-4o', 'gpt-4o-mini', 'gpt-4.1', 'gpt-4.1-mini', 'gpt-5-mini','gpt-5','gpt-5-nano',
+ 'gpt-4-vision-preview',
+ 'gemini-1.5-pro', 'gemini-1.5-flash', 'gemini-2.0-flash', 'gemini-2.0-flash-exp',
+ 'gemini-2.5-pro', 'gemini-2.5-flash',
+ 'llama-3.2-11b-vision', 'llama-3.2-90b-vision',
+ 'eh/gemini-2.5-flash', 'eh/gemini-1.5-flash', 'eh/gpt-4o' # ElectronHub variants
+ ]
+
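+ # The check below is substring-based, so dated variants (e.g. 'gpt-4o-2024-08-06') also pass.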
+ model_lower = model.lower()
+ if not any(vm in model_lower for vm in [m.lower() for m in vision_models]):
+ self.append_log(f"⚠️ Model '{model}' may not support vision. Trying anyway...")
+
+ # Check for stop before API initialization
+ if self.stop_requested:
+ self.append_log("⏹️ Image translation cancelled before API initialization")
+ self.image_progress_manager.update(image_path, content_hash, status="cancelled")
+ return False
+
+ # Initialize API client
+ try:
+ from unified_api_client import UnifiedClient
+ client = UnifiedClient(model=model, api_key=api_key)
+
+ # Set stop flag if the client supports it
+ if hasattr(client, 'set_stop_flag'):
+ client.set_stop_flag(self.stop_requested)
+ elif hasattr(client, 'stop_flag'):
+ client.stop_flag = self.stop_requested
+
+ except Exception as e:
+ self.append_log(f"❌ Failed to initialize API client: {str(e)}")
+ self.image_progress_manager.update(image_path, content_hash, status="error", error=f"API client init failed: {e}")
+ return False
+
+ # Read the image
+ try:
+ # Get image name for payload naming
+ base_name = os.path.splitext(image_name)[0]
+
+ with open(image_path, 'rb') as img_file:
+ image_data = img_file.read()
+
+ # Convert to base64
+ import base64
+ image_base64 = base64.b64encode(image_data).decode('utf-8')
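+ # Base64 inflates the payload by roughly 33%; the size logged below is the raw file size.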
+
+ # Check image size
+ size_mb = len(image_data) / (1024 * 1024)
+ self.append_log(f"📊 Image size: {size_mb:.2f} MB")
+
+ except Exception as e:
+ self.append_log(f"❌ Failed to read image: {str(e)}")
+ self.image_progress_manager.update(image_path, content_hash, status="error", error=f"Failed to read image: {e}")
+ return False
+
+ # Get system prompt from configuration
+ profile_name = self.config.get('active_profile', 'korean')
+ prompt_profiles = self.config.get('prompt_profiles', {})
+
+ # Get the main translation prompt
+ system_prompt = ""
+ if isinstance(prompt_profiles, dict) and profile_name in prompt_profiles:
+ profile_data = prompt_profiles[profile_name]
+ if isinstance(profile_data, str):
+ # Old format: prompt_profiles[profile_name] = "prompt text"
+ system_prompt = profile_data
+ elif isinstance(profile_data, dict):
+ # New format: prompt_profiles[profile_name] = {"prompt": "...", "book_title_prompt": "..."}
+ system_prompt = profile_data.get('prompt', '')
+ else:
+ # Fallback to check if prompt is stored directly in config
+ system_prompt = self.config.get(profile_name, '')
+
+ if not system_prompt:
+ # Last fallback - empty string
+ system_prompt = ""
+
+ # Check if we should append glossary to the prompt
+ append_glossary = self.config.get('append_glossary', True) # Default to True
+ if hasattr(self, 'append_glossary_var'):
+ append_glossary = self.append_glossary_var.get()
+
+ # Check if automatic glossary is enabled
+ enable_auto_glossary = self.config.get('enable_auto_glossary', False)
+ if hasattr(self, 'enable_auto_glossary_var'):
+ enable_auto_glossary = self.enable_auto_glossary_var.get()
+
+ if append_glossary:
+ # Check for manual glossary
+ manual_glossary_path = os.getenv('MANUAL_GLOSSARY')
+ if not manual_glossary_path and hasattr(self, 'manual_glossary_path'):
+ manual_glossary_path = self.manual_glossary_path
+
+ # If automatic glossary is enabled and no manual glossary exists, defer appending
+ if enable_auto_glossary and (not manual_glossary_path or not os.path.exists(manual_glossary_path)):
+ self.append_log(f"📑 Automatic glossary enabled - glossary will be appended after generation")
+ # Set a flag to indicate deferred glossary appending
+ os.environ['DEFER_GLOSSARY_APPEND'] = '1'
+ # Store the append prompt for later use
+ glossary_prompt = self.config.get('append_glossary_prompt',
+ "- Follow this reference glossary for consistent translation (Do not output any raw entries):\n")
+ os.environ['GLOSSARY_APPEND_PROMPT'] = glossary_prompt
+ else:
+ # Original behavior - append manual glossary immediately
+ if manual_glossary_path and os.path.exists(manual_glossary_path):
+ try:
+ self.append_log(f"📑 Loading glossary for system prompt: {os.path.basename(manual_glossary_path)}")
+
+ # Copy to output as the same extension, and prefer CSV naming
+ ext = os.path.splitext(manual_glossary_path)[1].lower()
+ out_name = "glossary.csv" if ext == ".csv" else "glossary.json"
+ output_glossary_path = os.path.join(output_dir, out_name)
+ try:
+ import shutil as _shutil
+ _shutil.copy(manual_glossary_path, output_glossary_path)
+ self.append_log(f"💾 Saved glossary to output folder for auto-loading: {out_name}")
+ except Exception as copy_err:
+ self.append_log(f"⚠️ Could not copy glossary into output: {copy_err}")
+
+ # Append to prompt
+ if ext == ".csv":
+ with open(manual_glossary_path, 'r', encoding='utf-8') as f:
+ csv_text = f.read()
+ if system_prompt:
+ system_prompt += "\n\n"
+ glossary_prompt = self.config.get('append_glossary_prompt',
+ "- Follow this reference glossary for consistent translation (Do not output any raw entries):\n")
+ system_prompt += f"{glossary_prompt}\n{csv_text}"
+ self.append_log(f"✅ Appended CSV glossary to system prompt")
+ else:
+ with open(manual_glossary_path, 'r', encoding='utf-8') as f:
+ glossary_data = json.load(f)
+
+ formatted_entries = {}
+ if isinstance(glossary_data, list):
+ for char in glossary_data:
+ if not isinstance(char, dict):
+ continue
+ original = char.get('original_name', '')
+ translated = char.get('name', original)
+ if original and translated:
+ formatted_entries[original] = translated
+ title = char.get('title')
+ if title and original:
+ formatted_entries[f"{original} ({title})"] = f"{translated} ({title})"
+ refer_map = char.get('how_they_refer_to_others', {})
+ if isinstance(refer_map, dict):
+ for other_name, reference in refer_map.items():
+ if other_name and reference:
+ formatted_entries[f"{original} → {other_name}"] = f"{translated} → {reference}"
+ elif isinstance(glossary_data, dict):
+ if "entries" in glossary_data and isinstance(glossary_data["entries"], dict):
+ formatted_entries = glossary_data["entries"]
+ else:
+ formatted_entries = {k: v for k, v in glossary_data.items() if k != "metadata"}
+ if formatted_entries:
+ glossary_block = json.dumps(formatted_entries, ensure_ascii=False, indent=2)
+ if system_prompt:
+ system_prompt += "\n\n"
+ glossary_prompt = self.config.get('append_glossary_prompt',
+ "- Follow this reference glossary for consistent translation (Do not output any raw entries):\n")
+ system_prompt += f"{glossary_prompt}\n{glossary_block}"
+ self.append_log(f"✅ Added {len(formatted_entries)} glossary entries to system prompt")
+ else:
+ self.append_log(f"⚠️ Glossary file has no valid entries")
+
+ except Exception as e:
+ self.append_log(f"⚠️ Failed to append glossary to prompt: {str(e)}")
+ else:
+ self.append_log(f"ℹ️ No glossary file found to append to prompt")
+ else:
+ self.append_log(f"ℹ️ Glossary appending disabled in settings")
+ # Clear any deferred append flag
+ if 'DEFER_GLOSSARY_APPEND' in os.environ:
+ del os.environ['DEFER_GLOSSARY_APPEND']
+
+ # Get temperature and max tokens from GUI
+ temperature = float(self.temperature_entry.get()) if hasattr(self, 'temperature_entry') else 0.3
+ max_tokens = int(self.max_output_tokens_var.get()) if hasattr(self, 'max_output_tokens_var') else 8192
+
+ # Build messages for vision API
+ messages = [
+ {"role": "system", "content": system_prompt}
+ ]
+
+ self.append_log(f"🌐 Sending image to vision API...")
+ self.append_log(f" System prompt length: {len(system_prompt)} chars")
+ self.append_log(f" Temperature: {temperature}")
+ self.append_log(f" Max tokens: {max_tokens}")
+
+ # Debug: Show first 200 chars of system prompt
+ if system_prompt:
+ preview = system_prompt[:200] + "..." if len(system_prompt) > 200 else system_prompt
+ self.append_log(f" System prompt preview: {preview}")
+
+ # Check stop before making API call
+ if self.stop_requested:
+ self.append_log("⏹️ Image translation cancelled before API call")
+ self.image_progress_manager.update(image_path, content_hash, status="cancelled")
+ return False
+
+ # Make the API call
+ try:
+ # Create Payloads directory for API response tracking
+ payloads_dir = "Payloads"
+ os.makedirs(payloads_dir, exist_ok=True)
+
+ # Create timestamp for unique filename
+ timestamp = time.strftime("%Y%m%d_%H%M%S")
+ payload_file = os.path.join(payloads_dir, f"image_api_{timestamp}_{base_name}.json")
+
+ # Save the request payload
+ request_payload = {
+ "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
+ "model": model,
+ "image_file": image_name,
+ "image_size_mb": size_mb,
+ "temperature": temperature,
+ "max_tokens": max_tokens,
+ "messages": messages,
+ "image_base64": image_base64 # Full payload without truncation
+ }
+
+ with open(payload_file, 'w', encoding='utf-8') as f:
+ json.dump(request_payload, f, ensure_ascii=False, indent=2)
+
+ self.append_log(f"📝 Saved request payload: {payload_file}")
+
+ # Call the vision API with interrupt support
+ # Check if the client supports a stop_callback parameter
+ # Import the send_with_interrupt function from TransateKRtoEN
+ try:
+ from TransateKRtoEN import send_with_interrupt
+ except ImportError:
+ self.append_log("⚠️ send_with_interrupt not available, using direct call")
+ send_with_interrupt = None
+
+ # Call the vision API with interrupt support
+ if send_with_interrupt:
+ # For image calls, we need a wrapper since send_with_interrupt expects client.send()
+ # Create a temporary wrapper client that handles image calls
+ class ImageClientWrapper:
+ def __init__(self, real_client, image_data):
+ self.real_client = real_client
+ self.image_data = image_data
+
+ def send(self, messages, temperature, max_tokens):
+ return self.real_client.send_image(messages, self.image_data, temperature=temperature, max_tokens=max_tokens)
+
+ def __getattr__(self, name):
+ return getattr(self.real_client, name)
+
+ # Create wrapped client
+ wrapped_client = ImageClientWrapper(client, image_base64)
+
+ # Use send_with_interrupt
+ response = send_with_interrupt(
+ messages,
+ wrapped_client,
+ temperature,
+ max_tokens,
+ lambda: self.stop_requested,
+ chunk_timeout=self.config.get('chunk_timeout', 300) # 5 min default
+ )
+ else:
+ # Fallback to direct call
+ response = client.send_image(
+ messages,
+ image_base64,
+ temperature=temperature,
+ max_tokens=max_tokens
+ )
+
+ # Check if stopped after API call
+ if self.stop_requested:
+ self.append_log("⏹️ Image translation stopped after API call")
+ self.image_progress_manager.update(image_path, content_hash, status="cancelled")
+ return False
+
+ # Extract content and finish reason from response
+ response_content = None
+ finish_reason = None
+
+ if hasattr(response, 'content'):
+ response_content = response.content
+ finish_reason = getattr(response, 'finish_reason', 'unknown')
+ elif isinstance(response, tuple) and len(response) >= 2:
+ # Handle tuple response (content, finish_reason, ...) without over-unpacking
+ response_content, finish_reason = response[0], response[1]
+ elif isinstance(response, str):
+ # Handle direct string response
+ response_content = response
+ finish_reason = 'complete'
+ else:
+ self.append_log(f"❌ Unexpected response type: {type(response)}")
+ self.append_log(f" Response: {response}")
+
+ # Save the response payload
+ response_payload = {
+ "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
+ "response_content": response_content,
+ "finish_reason": finish_reason,
+ "content_length": len(response_content) if response_content else 0
+ }
+
+ response_file = os.path.join(payloads_dir, f"image_api_response_{timestamp}_{base_name}.json")
+ with open(response_file, 'w', encoding='utf-8') as f:
+ json.dump(response_payload, f, ensure_ascii=False, indent=2)
+
+ self.append_log(f"📝 Saved response payload: {response_file}")
+
+ # Check if we got valid content
+ if not response_content or response_content.strip() == "[IMAGE TRANSLATION FAILED]":
+ self.append_log(f"❌ Image translation failed - no text extracted from image")
+ self.append_log(f" This may mean:")
+ self.append_log(f" - The image doesn't contain readable text")
+ self.append_log(f" - The model couldn't process the image")
+ self.append_log(f" - The image format is not supported")
+
+ # Try to get more info about the failure
+ if hasattr(response, 'error_details'):
+ self.append_log(f" Error details: {response.error_details}")
+
+ self.image_progress_manager.update(image_path, content_hash, status="error", error="No text extracted")
+ return False
+
+ if response_content:
+ self.append_log(f"✅ Received translation from API")
+
+ # We already have output_dir defined at the top
+ # Copy original image to the output directory if not using combined output
+ if not combined_output_dir and not os.path.exists(os.path.join(output_dir, image_name)):
+ shutil.copy2(image_path, os.path.join(output_dir, image_name))
+
+ # Get book title prompt for translating the filename
+ book_title_prompt = self.config.get('book_title_prompt', '')
+ book_title_system_prompt = self.config.get('book_title_system_prompt', '')
+
+ # If no book title prompt in main config, check in profile
+ if not book_title_prompt and isinstance(prompt_profiles, dict) and profile_name in prompt_profiles:
+ profile_data = prompt_profiles[profile_name]
+ if isinstance(profile_data, dict):
+ book_title_prompt = profile_data.get('book_title_prompt', '')
+ # Also check for system prompt in profile
+ if 'book_title_system_prompt' in profile_data:
+ book_title_system_prompt = profile_data['book_title_system_prompt']
+
+ # If still no book title prompt, use the main system prompt
+ if not book_title_prompt:
+ book_title_prompt = system_prompt
+
+ # If no book title system prompt configured, use the main system prompt
+ if not book_title_system_prompt:
+ book_title_system_prompt = system_prompt
+
+ # Translate the image filename/title
+ self.append_log(f"📝 Translating image title...")
+ title_messages = [
+ {"role": "system", "content": book_title_system_prompt},
+ {"role": "user", "content": f"{book_title_prompt}\n\n{base_name}" if book_title_prompt != system_prompt else base_name}
+ ]
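+ # When book_title_prompt fell back to the main system prompt, only the bare filename is sent to avoid duplicating the prompt text.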
+
+ try:
+ # Check for stop before title translation
+ if self.stop_requested:
+ self.append_log("⏹️ Image translation cancelled before title translation")
+ self.image_progress_manager.update(image_path, content_hash, status="cancelled")
+ return False
+
+ title_response = client.send(
+ title_messages,
+ temperature=temperature,
+ max_tokens=max_tokens
+ )
+
+ # Extract title translation
+ if hasattr(title_response, 'content'):
+ translated_title = title_response.content.strip() if title_response.content else base_name
+ else:
+ # Handle tuple response
+ title_content, *_ = title_response
+ translated_title = title_content.strip() if title_content else base_name
+ except Exception as e:
+ self.append_log(f"⚠️ Title translation failed: {str(e)}")
+ translated_title = base_name # Fallback to original if translation fails
+
+ # Create clean HTML content with just the translated title and content
+ html_content = f'''<!DOCTYPE html>
+ <html>
+ <head>
+ <meta charset="utf-8"/>
+ <title>{translated_title}</title>
+ </head>
+ <body>
+ <h1>{translated_title}</h1>
+ {response_content}
+ </body>
+ </html>'''
+
+ # Save HTML file with proper numbering
+ html_file = os.path.join(output_dir, f"response_{file_index:03d}_{base_name}.html")
+ with open(html_file, 'w', encoding='utf-8') as f:
+ f.write(html_content)
+
+ # Update progress to completed
+ self.image_progress_manager.update(image_path, content_hash, output_file=html_file, status="completed")
+
+ # Show preview
+ if response_content and response_content.strip():
+ preview = response_content[:200] + "..." if len(response_content) > 200 else response_content
+ self.append_log(f"📝 Translation preview:")
+ self.append_log(f"{preview}")
+ else:
+ self.append_log(f"⚠️ Translation appears to be empty")
+
+ self.append_log(f"✅ Translation saved to: {html_file}")
+ self.append_log(f"📁 Output directory: {output_dir}")
+
+ return True
+ else:
+ self.append_log(f"❌ No translation received from API")
+ if finish_reason:
+ self.append_log(f" Finish reason: {finish_reason}")
+ self.image_progress_manager.update(image_path, content_hash, status="error", error="No response from API")
+ return False
+
+ except Exception as e:
+ # Check if this was a stop/interrupt exception
+ if "stop" in str(e).lower() or "interrupt" in str(e).lower() or self.stop_requested:
+ self.append_log("⏹️ Image translation interrupted")
+ self.image_progress_manager.update(image_path, content_hash, status="cancelled")
+ return False
+ else:
+ self.append_log(f"❌ API call failed: {str(e)}")
+ import traceback
+ self.append_log(f"❌ Full error: {traceback.format_exc()}")
+ self.image_progress_manager.update(image_path, content_hash, status="error", error=f"API call failed: {e}")
+ return False
+
+ except Exception as e:
+ self.append_log(f"❌ Error processing image: {str(e)}")
+ import traceback
+ self.append_log(f"❌ Full error: {traceback.format_exc()}")
+ return False
+
+ def _process_text_file(self, file_path):
+ """Process EPUB or TXT file (existing translation logic)"""
+ try:
+ if translation_main is None:
+ self.append_log("❌ Translation module is not available")
+ return False
+
+ api_key = self.api_key_entry.get()
+ model = self.model_var.get()
+
+ # Validate API key and model (same as original)
+ if '@' in model or model.startswith('vertex/'):
+ google_creds = self.config.get('google_cloud_credentials')
+ if not google_creds or not os.path.exists(google_creds):
+ self.append_log("❌ Error: Google Cloud credentials required for Vertex AI models.")
+ return False
+
+ os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = google_creds
+ self.append_log(f"🔑 Using Google Cloud credentials: {os.path.basename(google_creds)}")
+
+ if not api_key:
+ try:
+ with open(google_creds, 'r') as f:
+ creds_data = json.load(f)
+ api_key = creds_data.get('project_id', 'vertex-ai-project')
+ self.append_log(f"🔑 Using project ID as API key: {api_key}")
+ except Exception:
+ api_key = 'vertex-ai-project'
+ elif not api_key:
+ self.append_log("❌ Error: Please enter your API key.")
+ return False
+
+ # Determine output directory and save source EPUB path
+ if file_path.lower().endswith('.epub'):
+ base_name = os.path.splitext(os.path.basename(file_path))[0]
+ output_dir = base_name # the output directory mirrors the EPUB base name
+
+ # Save source EPUB path for EPUB converter
+ source_epub_file = os.path.join(output_dir, 'source_epub.txt')
+ try:
+ os.makedirs(output_dir, exist_ok=True) # Ensure output dir exists
+ with open(source_epub_file, 'w', encoding='utf-8') as f:
+ f.write(file_path)
+ self.append_log(f"📚 Saved source EPUB reference for chapter ordering")
+ except Exception as e:
+ self.append_log(f"⚠️ Could not save source EPUB reference: {e}")
+
+ # Set EPUB_PATH in environment for immediate use
+ os.environ['EPUB_PATH'] = file_path
+
+ old_argv = sys.argv
+ old_env = dict(os.environ)
+
+ try:
+ # Set up environment (same as original)
+ self.append_log(f"🔧 Setting up environment variables...")
+ self.append_log(f"📖 File: {os.path.basename(file_path)}")
+ self.append_log(f"🤖 Model: {self.model_var.get()}")
+
+ # Get the system prompt and log a short preview
+ system_prompt = self.prompt_text.get("1.0", "end").strip()
+ prompt_preview = system_prompt[:200] + "..." if len(system_prompt) > 200 else system_prompt
+ self.append_log(f"📝 System prompt preview: {prompt_preview}")
+ self.append_log(f"📏 System prompt length: {len(system_prompt)} characters")
+
+ # Check if glossary info is in the system prompt
+ if "glossary" in system_prompt.lower() or "character entry" in system_prompt.lower():
+ self.append_log(f"📚 ✅ Glossary appears to be included in system prompt")
+ else:
+ self.append_log(f"📚 ⚠️ No glossary detected in system prompt")
+
+ # Log glossary status
+ if hasattr(self, 'manual_glossary_path') and self.manual_glossary_path:
+ self.append_log(f"📑 Manual glossary loaded: {os.path.basename(self.manual_glossary_path)}")
+ else:
+ self.append_log(f"📑 No manual glossary loaded")
+
+ # IMPORTANT: Set IS_TEXT_FILE_TRANSLATION flag for text files
+ if file_path.lower().endswith('.txt'):
+ os.environ['IS_TEXT_FILE_TRANSLATION'] = '1'
+ self.append_log("📄 Processing as text file")
+
+ # Set environment variables
+ env_vars = self._get_environment_variables(file_path, api_key)
+
+ # Enable async chapter extraction for EPUBs to prevent GUI freezing
+ if file_path.lower().endswith('.epub'):
+ env_vars['USE_ASYNC_CHAPTER_EXTRACTION'] = '1'
+ self.append_log("🚀 Using async chapter extraction (subprocess mode)")
+
+ os.environ.update(env_vars)
+
+ # Handle chapter range
+ chap_range = self.chapter_range_entry.get().strip()
+ if chap_range:
+ os.environ['CHAPTER_RANGE'] = chap_range
+ self.append_log(f"📊 Chapter Range: {chap_range}")
+
+ # Set other environment variables (token limits, etc.)
+ if hasattr(self, 'token_limit_disabled') and self.token_limit_disabled:
+ os.environ['MAX_INPUT_TOKENS'] = ''
+ else:
+ token_val = self.token_limit_entry.get().strip()
+ if token_val and token_val.isdigit():
+ os.environ['MAX_INPUT_TOKENS'] = token_val
+ else:
+ os.environ['MAX_INPUT_TOKENS'] = '1000000'
+
+ # Validate glossary path
+ if hasattr(self, 'manual_glossary_path') and self.manual_glossary_path:
+ if (hasattr(self, 'auto_loaded_glossary_path') and
+ self.manual_glossary_path == self.auto_loaded_glossary_path):
+ if (hasattr(self, 'auto_loaded_glossary_for_file') and
+ hasattr(self, 'file_path') and
+ self.file_path == self.auto_loaded_glossary_for_file):
+ os.environ['MANUAL_GLOSSARY'] = self.manual_glossary_path
+ self.append_log(f"📑 Using auto-loaded glossary: {os.path.basename(self.manual_glossary_path)}")
+ else:
+ os.environ['MANUAL_GLOSSARY'] = self.manual_glossary_path
+ self.append_log(f"📑 Using manual glossary: {os.path.basename(self.manual_glossary_path)}")
+
+ # Set sys.argv to match what TransateKRtoEN.py expects
+ sys.argv = ['TransateKRtoEN.py', file_path]
+
+ self.append_log("🚀 Starting translation...")
+
+ # Ensure Payloads directory exists
+ os.makedirs("Payloads", exist_ok=True)
+
+ # Run translation
+ translation_main(
+ log_callback=self.append_log,
+ stop_callback=lambda: self.stop_requested
+ )
+
+ if not self.stop_requested:
+ self.append_log("✅ Translation completed successfully!")
+ return True
+ else:
+ return False
+
+ except Exception as e:
+ self.append_log(f"❌ Translation error: {e}")
+ if hasattr(self, 'append_log_with_api_error_detection'):
+ self.append_log_with_api_error_detection(str(e))
+ import traceback
+ self.append_log(f"❌ Full error: {traceback.format_exc()}")
+ return False
+
+ finally:
+ sys.argv = old_argv
+ os.environ.clear()
+ os.environ.update(old_env)
+
+ except Exception as e:
+ self.append_log(f"❌ Error in text file processing: {str(e)}")
+ return False
+
+ def _get_environment_variables(self, epub_path, api_key):
+ """Get all environment variables for translation/glossary"""
+
+ # Get Google Cloud project ID if using Vertex AI
+ google_cloud_project = ''
+ model = self.model_var.get()
+ if '@' in model or model.startswith('vertex/'):
+ google_creds = self.config.get('google_cloud_credentials')
+ if google_creds and os.path.exists(google_creds):
+ try:
+ with open(google_creds, 'r') as f:
+ creds_data = json.load(f)
+ google_cloud_project = creds_data.get('project_id', '')
+ except Exception:
+ pass
+
+ # Handle extraction mode - check which variables exist
+ if hasattr(self, 'text_extraction_method_var'):
+ # New cleaner UI variables
+ extraction_method = self.text_extraction_method_var.get()
+ filtering_level = self.file_filtering_level_var.get()
+
+ if extraction_method == 'enhanced':
+ extraction_mode = 'enhanced'
+ enhanced_filtering = filtering_level
+ else:
+ extraction_mode = filtering_level
+ enhanced_filtering = 'smart' # default
+ else:
+ # Old UI variables
+ extraction_mode = self.extraction_mode_var.get()
+ if extraction_mode == 'enhanced':
+ enhanced_filtering = getattr(self, 'enhanced_filtering_var', tk.StringVar(value='smart')).get()
+ else:
+ enhanced_filtering = 'smart'
+
+ # Ensure multi-key env toggles are set early for the main translation path as well
+ try:
+ if self.config.get('use_multi_api_keys', False):
+ os.environ['USE_MULTI_KEYS'] = '1'
+ else:
+ os.environ['USE_MULTI_KEYS'] = '0'
+ if self.config.get('use_fallback_keys', False):
+ os.environ['USE_FALLBACK_KEYS'] = '1'
+ else:
+ os.environ['USE_FALLBACK_KEYS'] = '0'
+ except Exception:
+ pass
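+ # Note: these early toggles use USE_MULTI_KEYS, while the returned dict below sets USE_MULTI_API_KEYS; USE_FALLBACK_KEYS is set in both places.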
+
+ env_vars = {
+ 'EPUB_PATH': epub_path,
+ 'MODEL': self.model_var.get(),
+ 'CONTEXTUAL': '1' if self.contextual_var.get() else '0',
+ 'SEND_INTERVAL_SECONDS': str(self.delay_entry.get()),
+ 'THREAD_SUBMISSION_DELAY_SECONDS': self.thread_delay_var.get().strip() or '0.5',
+ 'MAX_OUTPUT_TOKENS': str(self.max_output_tokens),
+ 'API_KEY': api_key,
+ 'OPENAI_API_KEY': api_key,
+ 'OPENAI_OR_Gemini_API_KEY': api_key,
+ 'GEMINI_API_KEY': api_key,
+ 'SYSTEM_PROMPT': self.prompt_text.get("1.0", "end").strip(),
+ 'TRANSLATE_BOOK_TITLE': "1" if self.translate_book_title_var.get() else "0",
+ 'BOOK_TITLE_PROMPT': self.book_title_prompt,
+ 'BOOK_TITLE_SYSTEM_PROMPT': self.config.get('book_title_system_prompt',
+ "You are a translator. Respond with only the translated text, nothing else. Do not add any explanation or additional content."),
+ 'REMOVE_AI_ARTIFACTS': "1" if self.REMOVE_AI_ARTIFACTS_var.get() else "0",
+ 'USE_ROLLING_SUMMARY': "1" if (hasattr(self, 'rolling_summary_var') and self.rolling_summary_var.get()) else ("1" if self.config.get('use_rolling_summary') else "0"),
+ 'SUMMARY_ROLE': self.config.get('summary_role', 'user'),
+ 'ROLLING_SUMMARY_EXCHANGES': self.rolling_summary_exchanges_var.get(),
+ 'ROLLING_SUMMARY_MODE': self.rolling_summary_mode_var.get(),
+ 'ROLLING_SUMMARY_SYSTEM_PROMPT': self.rolling_summary_system_prompt,
+ 'ROLLING_SUMMARY_USER_PROMPT': self.rolling_summary_user_prompt,
+ 'ROLLING_SUMMARY_MAX_ENTRIES': self.rolling_summary_max_entries_var.get(),
+ 'PROFILE_NAME': self.lang_var.get().lower(),
+ 'TRANSLATION_TEMPERATURE': str(self.trans_temp.get()),
+ 'TRANSLATION_HISTORY_LIMIT': str(self.trans_history.get()),
+ 'EPUB_OUTPUT_DIR': os.getcwd(),
+ 'APPEND_GLOSSARY': "1" if self.append_glossary_var.get() else "0",
+ 'EMERGENCY_PARAGRAPH_RESTORE': "1" if self.emergency_restore_var.get() else "0",
+ 'REINFORCEMENT_FREQUENCY': self.reinforcement_freq_var.get(),
+ 'RETRY_TRUNCATED': "1" if self.retry_truncated_var.get() else "0",
+ 'MAX_RETRY_TOKENS': self.max_retry_tokens_var.get(),
+ 'RETRY_DUPLICATE_BODIES': "1" if self.retry_duplicate_var.get() else "0",
+ 'DUPLICATE_LOOKBACK_CHAPTERS': self.duplicate_lookback_var.get(),
+ 'GLOSSARY_MIN_FREQUENCY': self.glossary_min_frequency_var.get(),
+ 'GLOSSARY_MAX_NAMES': self.glossary_max_names_var.get(),
+ 'GLOSSARY_MAX_TITLES': self.glossary_max_titles_var.get(),
+ 'GLOSSARY_BATCH_SIZE': self.glossary_batch_size_var.get(),
+ 'GLOSSARY_STRIP_HONORIFICS': "1" if self.strip_honorifics_var.get() else "0",
+ 'GLOSSARY_CHAPTER_SPLIT_THRESHOLD': self.glossary_chapter_split_threshold_var.get(),
+ 'GLOSSARY_FILTER_MODE': self.glossary_filter_mode_var.get(),
+ 'ENABLE_AUTO_GLOSSARY': "1" if self.enable_auto_glossary_var.get() else "0",
+ 'AUTO_GLOSSARY_PROMPT': self.auto_glossary_prompt if hasattr(self, 'auto_glossary_prompt') else '',
+ 'APPEND_GLOSSARY_PROMPT': self.append_glossary_prompt if hasattr(self, 'append_glossary_prompt') else '',
+ 'GLOSSARY_TRANSLATION_PROMPT': self.glossary_translation_prompt if hasattr(self, 'glossary_translation_prompt') else '',
+ 'GLOSSARY_FORMAT_INSTRUCTIONS': self.glossary_format_instructions if hasattr(self, 'glossary_format_instructions') else '',
+ 'GLOSSARY_USE_LEGACY_CSV': '1' if self.use_legacy_csv_var.get() else '0',
+ 'ENABLE_IMAGE_TRANSLATION': "1" if self.enable_image_translation_var.get() else "0",
+ 'PROCESS_WEBNOVEL_IMAGES': "1" if self.process_webnovel_images_var.get() else "0",
+ 'WEBNOVEL_MIN_HEIGHT': self.webnovel_min_height_var.get(),
+ 'MAX_IMAGES_PER_CHAPTER': self.max_images_per_chapter_var.get(),
+ 'IMAGE_API_DELAY': '1.0',
+ 'SAVE_IMAGE_TRANSLATIONS': '1',
+ 'IMAGE_CHUNK_HEIGHT': self.image_chunk_height_var.get(),
+ 'HIDE_IMAGE_TRANSLATION_LABEL': "1" if self.hide_image_translation_label_var.get() else "0",
+ 'RETRY_TIMEOUT': "1" if self.retry_timeout_var.get() else "0",
+ 'CHUNK_TIMEOUT': self.chunk_timeout_var.get(),
+ # New network/HTTP controls
+ 'CONNECT_TIMEOUT': str(self.config.get('connect_timeout', os.environ.get('CONNECT_TIMEOUT', '10'))),
+ 'READ_TIMEOUT': str(self.config.get('read_timeout', os.environ.get('READ_TIMEOUT', os.environ.get('CHUNK_TIMEOUT', '180')))),
+ 'HTTP_POOL_CONNECTIONS': str(self.config.get('http_pool_connections', os.environ.get('HTTP_POOL_CONNECTIONS', '20'))),
+ 'HTTP_POOL_MAXSIZE': str(self.config.get('http_pool_maxsize', os.environ.get('HTTP_POOL_MAXSIZE', '50'))),
+ 'IGNORE_RETRY_AFTER': '1' if (hasattr(self, 'ignore_retry_after_var') and self.ignore_retry_after_var.get()) else '0',
+ 'MAX_RETRIES': str(self.config.get('max_retries', os.environ.get('MAX_RETRIES', '7'))),
+ 'BATCH_TRANSLATION': "1" if self.batch_translation_var.get() else "0",
+ 'BATCH_SIZE': self.batch_size_var.get(),
+ 'CONSERVATIVE_BATCHING': "1" if self.conservative_batching_var.get() else "0",
+ 'DISABLE_ZERO_DETECTION': "1" if self.disable_zero_detection_var.get() else "0",
+ 'TRANSLATION_HISTORY_ROLLING': "1" if self.translation_history_rolling_var.get() else "0",
+ 'USE_GEMINI_OPENAI_ENDPOINT': '1' if self.use_gemini_openai_endpoint_var.get() else '0',
+ 'GEMINI_OPENAI_ENDPOINT': self.gemini_openai_endpoint_var.get() if self.gemini_openai_endpoint_var.get() else '',
+ "ATTACH_CSS_TO_CHAPTERS": "1" if self.attach_css_to_chapters_var.get() else "0",
+ 'GLOSSARY_FUZZY_THRESHOLD': str(self.config.get('glossary_fuzzy_threshold', 0.90)),
+ 'GLOSSARY_MAX_TEXT_SIZE': self.glossary_max_text_size_var.get(),
+ 'GLOSSARY_MAX_SENTENCES': self.glossary_max_sentences_var.get(),
+ 'USE_FALLBACK_KEYS': '1' if self.config.get('use_fallback_keys', False) else '0',
+ 'FALLBACK_KEYS': json.dumps(self.config.get('fallback_keys', [])),
+
+ # Extraction settings
+ "EXTRACTION_MODE": extraction_mode,
+ "ENHANCED_FILTERING": enhanced_filtering,
+ "ENHANCED_PRESERVE_STRUCTURE": "1" if getattr(self, 'enhanced_preserve_structure_var', tk.BooleanVar(value=True)).get() else "0",
+ 'FORCE_BS_FOR_TRADITIONAL': '1' if getattr(self, 'force_bs_for_traditional_var', tk.BooleanVar(value=False)).get() else '0',
+
+ # For new UI
+ "TEXT_EXTRACTION_METHOD": extraction_method if hasattr(self, 'text_extraction_method_var') else ('enhanced' if extraction_mode == 'enhanced' else 'standard'),
+ "FILE_FILTERING_LEVEL": filtering_level if hasattr(self, 'file_filtering_level_var') else extraction_mode,
+ 'DISABLE_CHAPTER_MERGING': '1' if self.disable_chapter_merging_var.get() else '0',
+ 'DISABLE_EPUB_GALLERY': "1" if self.disable_epub_gallery_var.get() else "0",
+ 'DISABLE_AUTOMATIC_COVER_CREATION': "1" if getattr(self, 'disable_automatic_cover_creation_var', tk.BooleanVar(value=False)).get() else "0",
+ 'TRANSLATE_COVER_HTML': "1" if getattr(self, 'translate_cover_html_var', tk.BooleanVar(value=False)).get() else "0",
+ 'DUPLICATE_DETECTION_MODE': self.duplicate_detection_mode_var.get(),
+ 'CHAPTER_NUMBER_OFFSET': str(self.chapter_number_offset_var.get()),
+ 'USE_HEADER_AS_OUTPUT': "1" if self.use_header_as_output_var.get() else "0",
+ 'ENABLE_DECIMAL_CHAPTERS': "1" if self.enable_decimal_chapters_var.get() else "0",
+ 'ENABLE_WATERMARK_REMOVAL': "1" if self.enable_watermark_removal_var.get() else "0",
+ 'ADVANCED_WATERMARK_REMOVAL': "1" if self.advanced_watermark_removal_var.get() else "0",
+ 'SAVE_CLEANED_IMAGES': "1" if self.save_cleaned_images_var.get() else "0",
+ 'COMPRESSION_FACTOR': self.compression_factor_var.get(),
+ 'DISABLE_GEMINI_SAFETY': str(self.config.get('disable_gemini_safety', False)).lower(),
+ 'GLOSSARY_DUPLICATE_KEY_MODE': self.config.get('glossary_duplicate_key_mode', 'auto'),
+ 'GLOSSARY_DUPLICATE_CUSTOM_FIELD': self.config.get('glossary_duplicate_custom_field', ''),
+ 'MANUAL_GLOSSARY': self.manual_glossary_path if hasattr(self, 'manual_glossary_path') and self.manual_glossary_path else '',
+ 'FORCE_NCX_ONLY': '1' if self.force_ncx_only_var.get() else '0',
+ 'SINGLE_API_IMAGE_CHUNKS': "1" if self.single_api_image_chunks_var.get() else "0",
+ 'ENABLE_GEMINI_THINKING': "1" if self.enable_gemini_thinking_var.get() else "0",
+ 'THINKING_BUDGET': self.thinking_budget_var.get() if self.enable_gemini_thinking_var.get() else '0',
+ # GPT/OpenRouter reasoning
+ 'ENABLE_GPT_THINKING': "1" if self.enable_gpt_thinking_var.get() else "0",
+ 'GPT_REASONING_TOKENS': self.gpt_reasoning_tokens_var.get() if self.enable_gpt_thinking_var.get() else '',
+ 'GPT_EFFORT': self.gpt_effort_var.get(),
+ 'OPENROUTER_EXCLUDE': '1',
+ # Custom API endpoints
+ 'OPENAI_CUSTOM_BASE_URL': self.openai_base_url_var.get() if self.openai_base_url_var.get() else '',
+ 'GROQ_API_URL': self.groq_base_url_var.get() if self.groq_base_url_var.get() else '',
+ 'FIREWORKS_API_URL': self.fireworks_base_url_var.get() if hasattr(self, 'fireworks_base_url_var') and self.fireworks_base_url_var.get() else '',
+ 'USE_CUSTOM_OPENAI_ENDPOINT': '1' if self.use_custom_openai_endpoint_var.get() else '0',
+
+ # Image compression settings
+ 'ENABLE_IMAGE_COMPRESSION': "1" if self.config.get('enable_image_compression', False) else "0",
+ 'AUTO_COMPRESS_ENABLED': "1" if self.config.get('auto_compress_enabled', True) else "0",
+ 'TARGET_IMAGE_TOKENS': str(self.config.get('target_image_tokens', 1000)),
+ 'IMAGE_COMPRESSION_FORMAT': self.config.get('image_compression_format', 'auto'),
+ 'WEBP_QUALITY': str(self.config.get('webp_quality', 85)),
+ 'JPEG_QUALITY': str(self.config.get('jpeg_quality', 85)),
+ 'PNG_COMPRESSION': str(self.config.get('png_compression', 6)),
+ 'MAX_IMAGE_DIMENSION': str(self.config.get('max_image_dimension', 2048)),
+ 'MAX_IMAGE_SIZE_MB': str(self.config.get('max_image_size_mb', 10)),
+ 'PRESERVE_TRANSPARENCY': "1" if self.config.get('preserve_transparency', False) else "0",
+ 'PRESERVE_ORIGINAL_FORMAT': "1" if self.config.get('preserve_original_format', False) else "0",
+ 'OPTIMIZE_FOR_OCR': "1" if self.config.get('optimize_for_ocr', True) else "0",
+ 'PROGRESSIVE_ENCODING': "1" if self.config.get('progressive_encoding', True) else "0",
+ 'SAVE_COMPRESSED_IMAGES': "1" if self.config.get('save_compressed_images', False) else "0",
+ 'IMAGE_CHUNK_OVERLAP_PERCENT': self.image_chunk_overlap_var.get(),
+
+
+ # Metadata and batch header translation settings
+ 'TRANSLATE_METADATA_FIELDS': json.dumps(self.translate_metadata_fields),
+ 'METADATA_TRANSLATION_MODE': self.config.get('metadata_translation_mode', 'together'),
+ 'BATCH_TRANSLATE_HEADERS': "1" if self.batch_translate_headers_var.get() else "0",
+ 'HEADERS_PER_BATCH': self.headers_per_batch_var.get(),
+ 'UPDATE_HTML_HEADERS': "1" if self.update_html_headers_var.get() else "0",
+ 'SAVE_HEADER_TRANSLATIONS': "1" if self.save_header_translations_var.get() else "0",
+ 'METADATA_FIELD_PROMPTS': json.dumps(self.config.get('metadata_field_prompts', {})),
+ 'LANG_PROMPT_BEHAVIOR': self.config.get('lang_prompt_behavior', 'auto'),
+ 'FORCED_SOURCE_LANG': self.config.get('forced_source_lang', 'Korean'),
+ 'OUTPUT_LANGUAGE': self.config.get('output_language', 'English'),
+ 'METADATA_BATCH_PROMPT': self.config.get('metadata_batch_prompt', ''),
+
+ # AI Hunter configuration
+ 'AI_HUNTER_CONFIG': json.dumps(self.config.get('ai_hunter_config', {})),
+
+ # Anti-duplicate parameters
+ 'ENABLE_ANTI_DUPLICATE': '1' if hasattr(self, 'enable_anti_duplicate_var') and self.enable_anti_duplicate_var.get() else '0',
+ 'TOP_P': str(self.top_p_var.get()) if hasattr(self, 'top_p_var') else '1.0',
+ 'TOP_K': str(self.top_k_var.get()) if hasattr(self, 'top_k_var') else '0',
+ 'FREQUENCY_PENALTY': str(self.frequency_penalty_var.get()) if hasattr(self, 'frequency_penalty_var') else '0.0',
+ 'PRESENCE_PENALTY': str(self.presence_penalty_var.get()) if hasattr(self, 'presence_penalty_var') else '0.0',
+ 'REPETITION_PENALTY': str(self.repetition_penalty_var.get()) if hasattr(self, 'repetition_penalty_var') else '1.0',
+ 'CANDIDATE_COUNT': str(self.candidate_count_var.get()) if hasattr(self, 'candidate_count_var') else '1',
+ 'CUSTOM_STOP_SEQUENCES': self.custom_stop_sequences_var.get() if hasattr(self, 'custom_stop_sequences_var') else '',
+ 'LOGIT_BIAS_ENABLED': '1' if hasattr(self, 'logit_bias_enabled_var') and self.logit_bias_enabled_var.get() else '0',
+ 'LOGIT_BIAS_STRENGTH': str(self.logit_bias_strength_var.get()) if hasattr(self, 'logit_bias_strength_var') else '-0.5',
+ 'BIAS_COMMON_WORDS': '1' if hasattr(self, 'bias_common_words_var') and self.bias_common_words_var.get() else '0',
+ 'BIAS_REPETITIVE_PHRASES': '1' if hasattr(self, 'bias_repetitive_phrases_var') and self.bias_repetitive_phrases_var.get() else '0',
+ 'GOOGLE_APPLICATION_CREDENTIALS': os.environ.get('GOOGLE_APPLICATION_CREDENTIALS', ''),
+ 'GOOGLE_CLOUD_PROJECT': google_cloud_project, # Now properly set from credentials
+ 'VERTEX_AI_LOCATION': self.vertex_location_var.get() if hasattr(self, 'vertex_location_var') else 'us-east5',
+ 'IS_AZURE_ENDPOINT': '1' if (self.use_custom_openai_endpoint_var.get() and
+ ('.azure.com' in self.openai_base_url_var.get() or
+ '.cognitiveservices' in self.openai_base_url_var.get())) else '0',
+ 'AZURE_API_VERSION': str(self.config.get('azure_api_version', '2024-08-01-preview')),
+
+ # Multi API Key support
+ 'USE_MULTI_API_KEYS': "1" if self.config.get('use_multi_api_keys', False) else "0",
+ 'MULTI_API_KEYS': json.dumps(self.config.get('multi_api_keys', [])) if self.config.get('use_multi_api_keys', False) else '[]',
+ 'FORCE_KEY_ROTATION': '1' if self.config.get('force_key_rotation', True) else '0',
+ 'ROTATION_FREQUENCY': str(self.config.get('rotation_frequency', 1)),
+
+ }
+ print(f"[DEBUG] DISABLE_CHAPTER_MERGING = '{os.getenv('DISABLE_CHAPTER_MERGING', '0')}'")
+
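+ # Illustrative sketch (an assumption, not project API): downstream workers read
+ # the "0"/"1" strings set above; a tiny helper keeps the decoding in one place.
+ # The name `env_flag` is illustrative only:
+ #
+ #     def env_flag(name: str, default: bool = False) -> bool:
+ #         return os.getenv(name, "1" if default else "0") == "1"
+ #
+ #     if env_flag('ENABLE_IMAGE_COMPRESSION'):
+ #         ...  # compress before upload
+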
+ def run_glossary_extraction_thread(self):
+ """Start glossary extraction in a background worker (ThreadPoolExecutor)"""
+ if ((hasattr(self, 'translation_thread') and self.translation_thread and self.translation_thread.is_alive()) or
+ (hasattr(self, 'translation_future') and self.translation_future and not self.translation_future.done())):
+ self.append_log("⚠️ Cannot run glossary extraction while translation is in progress.")
+ messagebox.showwarning("Process Running", "Please wait for translation to complete before extracting glossary.")
+ return
+
+ if self.glossary_thread and self.glossary_thread.is_alive():
+ self.stop_glossary_extraction()
+ return
+
+ # Check if files are selected
+ if not hasattr(self, 'selected_files') or not self.selected_files:
+ # Try to get file from entry field (backward compatibility)
+ file_path = self.entry_epub.get().strip()
+ if not file_path or file_path.startswith("No file selected") or "files selected" in file_path:
+ messagebox.showerror("Error", "Please select file(s) to extract glossary from.")
+ return
+ self.selected_files = [file_path]
+
+ # Reset stop flags
+ self.stop_requested = False
+ if glossary_stop_flag:
+ glossary_stop_flag(False)
+
+ # IMPORTANT: Also reset the module's internal stop flag
+ try:
+ import extract_glossary_from_epub
+ extract_glossary_from_epub.set_stop_flag(False)
+ except Exception:
+ pass
+
+ # Use shared executor
+ self._ensure_executor()
+ if self.executor:
+ self.glossary_future = self.executor.submit(self.run_glossary_extraction_direct)
+ else:
+ thread_name = f"GlossaryThread_{int(time.time())}"
+ self.glossary_thread = threading.Thread(target=self.run_glossary_extraction_direct, name=thread_name, daemon=True)
+ self.glossary_thread.start()
+ self.master.after(100, self.update_run_button)
+
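+ # Illustrative sketch (names are assumptions, not part of the codebase): the
+ # executor-with-thread-fallback submission above generalizes to any background job:
+ #
+ #     def _submit_background(self, fn, label):
+ #         self._ensure_executor()
+ #         if self.executor:
+ #             setattr(self, f"{label}_future", self.executor.submit(fn))
+ #         else:
+ #             t = threading.Thread(target=fn, name=label, daemon=True)
+ #             setattr(self, f"{label}_thread", t)
+ #             t.start()
+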
+ def run_glossary_extraction_direct(self):
+ """Run glossary extraction directly - handles multiple files and different file types"""
+ try:
+ self.append_log("🔄 Loading glossary modules...")
+ if not self._lazy_load_modules():
+ self.append_log("❌ Failed to load glossary modules")
+ return
+
+ if glossary_main is None:
+ self.append_log("❌ Glossary extraction module is not available")
+ return
+
+ # Create Glossary folder
+ os.makedirs("Glossary", exist_ok=True)
+
+ # ========== NEW: APPLY OPF-BASED SORTING ==========
+ # Sort files based on OPF order if available
+ original_file_count = len(self.selected_files)
+ self.selected_files = self._get_opf_file_order(self.selected_files)
+ self.append_log(f"📚 Processing {original_file_count} files in reading order for glossary extraction")
+ # ====================================================
+
+ # Group files by type and folder
+ image_extensions = {'.png', '.jpg', '.jpeg', '.gif', '.bmp', '.webp'}
+
+ # Separate images and text files
+ image_files = []
+ text_files = []
+
+ for file_path in self.selected_files:
+ ext = os.path.splitext(file_path)[1].lower()
+ if ext in image_extensions:
+ image_files.append(file_path)
+ elif ext in {'.epub', '.txt'}:
+ text_files.append(file_path)
+ else:
+ self.append_log(f"⚠️ Skipping unsupported file type: {ext}")
+
+ # Group images by folder
+ image_groups = {}
+ for img_path in image_files:
+ folder = os.path.dirname(img_path)
+ if folder not in image_groups:
+ image_groups[folder] = []
+ image_groups[folder].append(img_path)
+
+ total_groups = len(image_groups) + len(text_files)
+ current_group = 0
+ successful = 0
+ failed = 0
+
+ # Process image groups (each folder gets one combined glossary)
+ for folder, images in image_groups.items():
+ if self.stop_requested:
+ break
+
+ current_group += 1
+ folder_name = os.path.basename(folder) if folder else "images"
+
+ self.append_log(f"\n{'='*60}")
+ self.append_log(f"📁 Processing image folder ({current_group}/{total_groups}): {folder_name}")
+ self.append_log(f" Found {len(images)} images")
+ self.append_log(f"{'='*60}")
+
+ # Process all images in this folder and extract glossary
+ if self._process_image_folder_for_glossary(folder_name, images):
+ successful += 1
+ else:
+ failed += 1
+
+ # Process text files individually
+ for text_file in text_files:
+ if self.stop_requested:
+ break
+
+ current_group += 1
+
+ self.append_log(f"\n{'='*60}")
+ self.append_log(f"📄 Processing file ({current_group}/{total_groups}): {os.path.basename(text_file)}")
+ self.append_log(f"{'='*60}")
+
+ if self._extract_glossary_from_text_file(text_file):
+ successful += 1
+ else:
+ failed += 1
+
+ # Final summary
+ self.append_log(f"\n{'='*60}")
+ self.append_log(f"📊 Glossary Extraction Summary:")
+ self.append_log(f" ✅ Successful: {successful} glossaries")
+ if failed > 0:
+ self.append_log(f" ❌ Failed: {failed} glossaries")
+ self.append_log(f" 📁 Total: {total_groups} glossaries")
+ self.append_log(f" 📂 All glossaries saved in: Glossary/")
+ self.append_log(f"{'='*60}")
+
+ except Exception as e:
+ self.append_log(f"❌ Glossary extraction setup error: {e}")
+ import traceback
+ self.append_log(f"❌ Full error: {traceback.format_exc()}")
+
+ finally:
+ self.stop_requested = False
+ if glossary_stop_flag:
+ glossary_stop_flag(False)
+
+ # IMPORTANT: Also reset the module's internal stop flag
+ try:
+ import extract_glossary_from_epub
+ extract_glossary_from_epub.set_stop_flag(False)
+ except Exception:
+ pass
+
+ self.glossary_thread = None
+ self.current_file_index = 0
+ self.master.after(0, self.update_run_button)
+
+ def _process_image_folder_for_glossary(self, folder_name, image_files):
+ """Process all images from a folder and create a combined glossary with new format"""
+ try:
+ import hashlib
+ from unified_api_client import UnifiedClient, UnifiedClientError
+
+ # Initialize folder-specific progress manager for images
+ self.glossary_progress_manager = self._init_image_glossary_progress_manager(folder_name)
+
+ all_glossary_entries = []
+ processed = 0
+ skipped = 0
+
+ # Get API key and model
+ api_key = self.api_key_entry.get().strip()
+ model = self.model_var.get().strip()
+
+ if not api_key or not model:
+ self.append_log("❌ Error: API key and model required")
+ return False
+
+ if not self.manual_glossary_prompt:
+ self.append_log("❌ Error: No glossary prompt configured")
+ return False
+
+ # Initialize API client
+ try:
+ client = UnifiedClient(model=model, api_key=api_key)
+ except Exception as e:
+ self.append_log(f"❌ Failed to initialize API client: {str(e)}")
+ return False
+
+ # Get temperature and other settings from glossary config
+ temperature = float(self.config.get('manual_glossary_temperature', 0.1))
+ max_tokens = int(self.max_output_tokens_var.get()) if hasattr(self, 'max_output_tokens_var') else 8192
+ api_delay = float(self.delay_entry.get()) if hasattr(self, 'delay_entry') else 2.0
+
+ self.append_log(f"🔧 Glossary extraction settings:")
+ self.append_log(f" Temperature: {temperature}")
+ self.append_log(f" Max tokens: {max_tokens}")
+ self.append_log(f" API delay: {api_delay}s")
+ format_parts = ["type", "raw_name", "translated_name", "gender"]
+ custom_fields_json = self.config.get('manual_custom_fields', '[]')
+ try:
+ custom_fields = json.loads(custom_fields_json) if isinstance(custom_fields_json, str) else custom_fields_json
+ if custom_fields:
+ format_parts.extend(custom_fields)
+ except Exception:
+ custom_fields = []
+ self.append_log(f" Format: Simple ({', '.join(format_parts)})")
+
+ # Check honorifics filter toggle
+ honorifics_disabled = self.config.get('glossary_disable_honorifics_filter', False)
+ if honorifics_disabled:
+ self.append_log(f" Honorifics Filter: ❌ DISABLED")
+ else:
+ self.append_log(f" Honorifics Filter: ✅ ENABLED")
+
+ # Track timing for ETA calculation
+ start_time = time.time()
+ total_entries_extracted = 0
+
+ # Set up thread-safe payload directory
+ thread_name = threading.current_thread().name
+ thread_id = threading.current_thread().ident
+ thread_dir = os.path.join("Payloads", "glossary", f"{thread_name}_{thread_id}")
+ os.makedirs(thread_dir, exist_ok=True)
+
+ # Process each image
+ for i, image_path in enumerate(image_files):
+ if self.stop_requested:
+ self.append_log("⏹️ Glossary extraction stopped by user")
+ break
+
+ image_name = os.path.basename(image_path)
+ self.append_log(f"\n 🖼️ Processing image {i+1}/{len(image_files)}: {image_name}")
+
+ # Check progress tracking for this image
+ try:
+ content_hash = self.glossary_progress_manager.get_content_hash(image_path)
+ except Exception:
+ content_hash = hashlib.sha256(image_path.encode()).hexdigest()
+
+ # Check if already processed
+ needs_extraction, skip_reason, _ = self.glossary_progress_manager.check_image_status(image_path, content_hash)
+
+ if not needs_extraction:
+ self.append_log(f" ⏭️ {skip_reason}")
+ # Try to load previous results if available
+ existing_data = self.glossary_progress_manager.get_cached_result(content_hash)
+ if existing_data:
+ all_glossary_entries.extend(existing_data)
+ continue
+
+ # Skip cover images
+ if 'cover' in image_name.lower():
+ self.append_log(f" ⏭️ Skipping cover image")
+ self.glossary_progress_manager.update(image_path, content_hash, status="skipped_cover")
+ skipped += 1
+ continue
+
+ # Update progress to in-progress
+ self.glossary_progress_manager.update(image_path, content_hash, status="in_progress")
+
+ try:
+ # Read image
+ with open(image_path, 'rb') as img_file:
+ image_data = img_file.read()
+
+ import base64
+ image_base64 = base64.b64encode(image_data).decode('utf-8')
+ size_mb = len(image_data) / (1024 * 1024)
+ base_name = os.path.splitext(image_name)[0]
+ self.append_log(f" 📊 Image size: {size_mb:.2f} MB")
+
+ # Build prompt for new format
+ custom_fields_json = self.config.get('manual_custom_fields', '[]')
+ try:
+ custom_fields = json.loads(custom_fields_json) if isinstance(custom_fields_json, str) else custom_fields_json
+ except Exception:
+ custom_fields = []
+
+ # Build honorifics instruction based on toggle
+ honorifics_instruction = ""
+ if not honorifics_disabled:
+ honorifics_instruction = "- Do NOT include honorifics (님, 씨, さん, 様, etc.) in raw_name\n"
+
+ if self.manual_glossary_prompt:
+ prompt = self.manual_glossary_prompt
+
+ # Build fields description
+ fields_str = """- type: "character" for people/beings or "term" for locations/objects/concepts
+- raw_name: name in the original language/script
+- translated_name: English/romanized translation
+- gender: (for characters only) Male/Female/Unknown"""
+
+ if custom_fields:
+ for field in custom_fields:
+ fields_str += f"\n- {field}: custom field"
+
+ # Replace placeholders
+ prompt = prompt.replace('{fields}', fields_str)
+ prompt = prompt.replace('{chapter_text}', '')
+ prompt = prompt.replace('{{fields}}', fields_str)
+ prompt = prompt.replace('{{chapter_text}}', '')
+ prompt = prompt.replace('{text}', '')
+ prompt = prompt.replace('{{text}}', '')
+ else:
+ # Default prompt
+ fields_str = """For each entity, provide JSON with these fields:
+- type: "character" for people/beings or "term" for locations/objects/concepts
+- raw_name: name in the original language/script
+- translated_name: English/romanized translation
+- gender: (for characters only) Male/Female/Unknown"""
+
+ if custom_fields:
+ fields_str += "\nAdditional custom fields:"
+ for field in custom_fields:
+ fields_str += f"\n- {field}"
+
+ prompt = f"""Extract all characters and important terms from this image.
+
+{fields_str}
+
+Important rules:
+{honorifics_instruction}- Romanize names appropriately
+- Output ONLY a JSON array"""
+
+ messages = [{"role": "user", "content": prompt}]
+
+ # Save request payload in thread-safe location
+ timestamp = time.strftime("%Y%m%d_%H%M%S")
+ payload_file = os.path.join(thread_dir, f"image_{timestamp}_{base_name}_request.json")
+
+ request_payload = {
+ "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
+ "model": model,
+ "image_file": image_name,
+ "image_size_mb": size_mb,
+ "temperature": temperature,
+ "max_tokens": max_tokens,
+ "messages": messages,
+ "processed_prompt": prompt,
+ "honorifics_filter_enabled": not honorifics_disabled
+ }
+
+ with open(payload_file, 'w', encoding='utf-8') as f:
+ json.dump(request_payload, f, ensure_ascii=False, indent=2)
+
+ self.append_log(f" 📝 Saved request: {os.path.basename(payload_file)}")
+ self.append_log(f" 🌐 Extracting glossary from image...")
+
+ # API call with interrupt support
+ response = self._call_api_with_interrupt(
+ client, messages, image_base64, temperature, max_tokens
+ )
+
+ # Check if stopped after API call
+ if self.stop_requested:
+ self.append_log("⏹️ Glossary extraction stopped after API call")
+ self.glossary_progress_manager.update(image_path, content_hash, status="cancelled")
+ return False
+
+ # Get response content
+ glossary_json = None
+ if isinstance(response, (list, tuple)) and len(response) >= 2:
+ glossary_json = response[0]
+ elif hasattr(response, 'content'):
+ glossary_json = response.content
+ elif isinstance(response, str):
+ glossary_json = response
+ else:
+ glossary_json = str(response)
+
+ if glossary_json and glossary_json.strip():
+ # Save response in thread-safe location
+ response_file = os.path.join(thread_dir, f"image_{timestamp}_{base_name}_response.json")
+ response_payload = {
+ "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
+ "response_content": glossary_json,
+ "content_length": len(glossary_json)
+ }
+ with open(response_file, 'w', encoding='utf-8') as f:
+ json.dump(response_payload, f, ensure_ascii=False, indent=2)
+
+ self.append_log(f" 📝 Saved response: {os.path.basename(response_file)}")
+
+ # Parse the JSON response
+ try:
+ # Clean up the response
+ glossary_json = glossary_json.strip()
+ if glossary_json.startswith('```'):
+ glossary_json = glossary_json.split('```')[1]
+ if glossary_json.startswith('json'):
+ glossary_json = glossary_json[4:]
+ glossary_json = glossary_json.strip()
+ if glossary_json.endswith('```'):
+ glossary_json = glossary_json[:-3].strip()
+
+ # Parse JSON
+ glossary_data = json.loads(glossary_json)
+
+ # Process entries
+ entries_for_this_image = []
+ if isinstance(glossary_data, list):
+ for entry in glossary_data:
+ # Validate entry format
+ if isinstance(entry, dict) and 'type' in entry and 'raw_name' in entry:
+ # Clean raw_name
+ entry['raw_name'] = entry['raw_name'].strip()
+
+ # Ensure required fields
+ if 'translated_name' not in entry:
+ entry['translated_name'] = entry.get('name', entry['raw_name'])
+
+ # Add gender for characters if missing
+ if entry['type'] == 'character' and 'gender' not in entry:
+ entry['gender'] = 'Unknown'
+
+ entries_for_this_image.append(entry)
+ all_glossary_entries.append(entry)
+
+ # Show progress
+ elapsed = time.time() - start_time
+ valid_count = len(entries_for_this_image)
+
+ for j, entry in enumerate(entries_for_this_image):
+ total_entries_extracted += 1
+
+ # Calculate ETA
+ if total_entries_extracted == 1:
+ eta = 0.0
+ else:
+ avg_time = elapsed / total_entries_extracted
+ remaining_images = len(image_files) - (i + 1)
+ estimated_remaining_entries = remaining_images * 3
+ eta = avg_time * estimated_remaining_entries
+
+ # Get entry name
+ entry_name = f"{entry['raw_name']} ({entry['translated_name']})"
+
+ # Print progress
+ progress_msg = f'[Image {i+1}/{len(image_files)}] [{j+1}/{valid_count}] ({elapsed:.1f}s elapsed, ETA {eta:.1f}s) → {entry["type"]}: {entry_name}'
+ print(progress_msg)
+ self.append_log(progress_msg)
+
+ self.append_log(f" ✅ Extracted {valid_count} entries")
+
+ # Update progress with extracted data
+ self.glossary_progress_manager.update(
+ image_path,
+ content_hash,
+ status="completed",
+ extracted_data=entries_for_this_image
+ )
+
+ processed += 1
+
+ # Save intermediate progress with skip logic
+ if all_glossary_entries:
+ self._save_intermediate_glossary_with_skip(folder_name, all_glossary_entries)
+
+ except json.JSONDecodeError as e:
+ self.append_log(f" ❌ Failed to parse JSON: {e}")
+ self.append_log(f" Response preview: {glossary_json[:200]}...")
+ self.glossary_progress_manager.update(image_path, content_hash, status="error", error=str(e))
+ skipped += 1
+ else:
+ self.append_log(f" ⚠️ No glossary data in response")
+ self.glossary_progress_manager.update(image_path, content_hash, status="error", error="No data")
+ skipped += 1
+
+ # Add delay between API calls
+ if i < len(image_files) - 1 and not self.stop_requested:
+ self.append_log(f" ⏱️ Waiting {api_delay}s before next image...")
+ elapsed = 0
+ while elapsed < api_delay and not self.stop_requested:
+ time.sleep(0.1)
+ elapsed += 0.1
+
+ except Exception as e:
+ self.append_log(f" ❌ Failed to process: {str(e)}")
+ self.glossary_progress_manager.update(image_path, content_hash, status="error", error=str(e))
+ skipped += 1
+
+ if not all_glossary_entries:
+ self.append_log(f"❌ No glossary entries extracted from any images")
+ return False
+
+ self.append_log(f"\n📝 Extracted {len(all_glossary_entries)} total entries from {processed} images")
+
+ # Save the final glossary with skip logic
+ output_file = os.path.join("Glossary", f"{folder_name}_glossary.json")
+
+ try:
+ # Apply skip logic for duplicates
+ self.append_log(f"📊 Applying skip logic for duplicate raw names...")
+
+ # Import or define the skip function
+ try:
+ from extract_glossary_from_epub import skip_duplicate_entries
+ # Set environment variable for honorifics toggle
+ os.environ['GLOSSARY_DISABLE_HONORIFICS_FILTER'] = '1' if honorifics_disabled else '0'
+ final_entries = skip_duplicate_entries(all_glossary_entries)
+ except Exception:
+ # Fallback implementation
+ def remove_honorifics_local(name):
+ if not name or honorifics_disabled:
+ return name.strip()
+
+ # Modern honorifics
+ korean_honorifics = ['님', '씨', '군', '양', '선생님', '사장님', '과장님', '대리님', '주임님', '이사님']
+ japanese_honorifics = ['さん', 'さま', '様', 'くん', '君', 'ちゃん', 'せんせい', '先生']
+ chinese_honorifics = ['先生', '女士', '小姐', '老师', '师傅', '大人']
+
+ # Archaic honorifics
+ korean_archaic = ['공', '옹', '어른', '나리', '나으리', '대감', '영감', '마님', '마마']
+ japanese_archaic = ['どの', '殿', 'みこと', '命', '尊', 'ひめ', '姫']
+ chinese_archaic = ['公', '侯', '伯', '子', '男', '王', '君', '卿', '大夫']
+
+ all_honorifics = (korean_honorifics + japanese_honorifics + chinese_honorifics +
+ korean_archaic + japanese_archaic + chinese_archaic)
+
+ name_cleaned = name.strip()
+ sorted_honorifics = sorted(all_honorifics, key=len, reverse=True)
+
+ for honorific in sorted_honorifics:
+ if name_cleaned.endswith(honorific):
+ name_cleaned = name_cleaned[:-len(honorific)].strip()
+ break
+
+ return name_cleaned
+
+ # Local dedupe pass; uses its own bookkeeping instead of
+ # reusing the outer 'skipped' image counter
+ seen_raw_names = set()
+ final_entries = []
+
+ for entry in all_glossary_entries:
+ raw_name = entry.get('raw_name', '')
+ if not raw_name:
+ continue
+
+ cleaned_name = remove_honorifics_local(raw_name)
+
+ if cleaned_name.lower() in seen_raw_names:
+ self.append_log(f" ⏭️ Skipping duplicate: {raw_name}")
+ continue
+
+ seen_raw_names.add(cleaned_name.lower())
+ final_entries.append(entry)
+
+ self.append_log(f"✅ Kept {len(final_entries)} unique entries (skipped {len(all_glossary_entries) - len(final_entries)} duplicates)")
+
+ # Save final glossary
+ os.makedirs("Glossary", exist_ok=True)
+
+ self.append_log(f"💾 Writing glossary to: {output_file}")
+ with open(output_file, 'w', encoding='utf-8') as f:
+ json.dump(final_entries, f, ensure_ascii=False, indent=2)
+
+ # Also save as CSV for compatibility
+ import csv
+ csv_file = output_file.replace('.json', '.csv')
+ with open(csv_file, 'w', encoding='utf-8', newline='') as f:
+ writer = csv.writer(f)
+ # Write header
+ header = ['type', 'raw_name', 'translated_name', 'gender']
+ if custom_fields:
+ header.extend(custom_fields)
+ writer.writerow(header)
+
+ for entry in final_entries:
+ row = [
+ entry.get('type', ''),
+ entry.get('raw_name', ''),
+ entry.get('translated_name', ''),
+ entry.get('gender', '') if entry.get('type') == 'character' else ''
+ ]
+ # Add custom field values
+ if custom_fields:
+ for field in custom_fields:
+ row.append(entry.get(field, ''))
+ writer.writerow(row)
+
+ self.append_log(f"💾 Also saved as CSV: {os.path.basename(csv_file)}")
+
+ # Verify files were created
+ if os.path.exists(output_file):
+ file_size = os.path.getsize(output_file)
+ self.append_log(f"✅ Glossary saved successfully ({file_size} bytes)")
+
+ # Show sample of what was saved
+ if final_entries:
+ self.append_log(f"\n📋 Sample entries:")
+ for entry in final_entries[:5]:
+ self.append_log(f" - [{entry['type']}] {entry['raw_name']} → {entry['translated_name']}")
+ else:
+ self.append_log(f"❌ File was not created!")
+ return False
+
+ return True
+
+ except Exception as e:
+ self.append_log(f"❌ Failed to save glossary: {e}")
+ import traceback
+ self.append_log(f"Full error: {traceback.format_exc()}")
+ return False
+
+ except Exception as e:
+ self.append_log(f"❌ Error processing image folder: {str(e)}")
+ import traceback
+ self.append_log(f"❌ Full error: {traceback.format_exc()}")
+ return False
+
+ def _init_image_glossary_progress_manager(self, folder_name):
+ """Initialize a folder-specific progress manager for image glossary extraction"""
+ import hashlib
+
+ class ImageGlossaryProgressManager:
+ def __init__(self, folder_name):
+ self.PROGRESS_FILE = os.path.join("Glossary", f"{folder_name}_glossary_progress.json")
+ self.prog = self._init_or_load()
+
+ def _init_or_load(self):
+ """Initialize or load progress tracking"""
+ if os.path.exists(self.PROGRESS_FILE):
+ try:
+ with open(self.PROGRESS_FILE, "r", encoding="utf-8") as pf:
+ return json.load(pf)
+ except Exception:
+ pass
+ return {"images": {}, "content_hashes": {}, "extracted_data": {}, "version": "1.0"}
+
+ def save(self):
+ """Save progress to file atomically"""
+ try:
+ os.makedirs(os.path.dirname(self.PROGRESS_FILE), exist_ok=True)
+ temp_file = self.PROGRESS_FILE + '.tmp'
+ with open(temp_file, "w", encoding="utf-8") as pf:
+ json.dump(self.prog, pf, ensure_ascii=False, indent=2)
+
+ # os.replace overwrites the target atomically on both POSIX and Windows
+ os.replace(temp_file, self.PROGRESS_FILE)
+ except Exception:
+ pass
+
+ def get_content_hash(self, file_path):
+ """Generate content hash for a file"""
+ hasher = hashlib.sha256()
+ with open(file_path, 'rb') as f:
+ for chunk in iter(lambda: f.read(4096), b""):
+ hasher.update(chunk)
+ return hasher.hexdigest()
+
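+ # Note: the chunked loop above keeps memory bounded on large files.
+ # On Python 3.11+ the same streaming hash can be written as:
+ #
+ #     with open(file_path, 'rb') as f:
+ #         return hashlib.file_digest(f, 'sha256').hexdigest()
+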
+ def check_image_status(self, image_path, content_hash):
+ """Check if an image needs glossary extraction"""
+ image_name = os.path.basename(image_path)
+
+ # Check for skip markers
+ skip_key = f"skip_{image_name}"
+ if skip_key in self.prog:
+ skip_info = self.prog[skip_key]
+ if skip_info.get('status') == 'skipped':
+ return False, f"Image marked as skipped", None
+
+ # Check if image has already been processed
+ if content_hash in self.prog["images"]:
+ image_info = self.prog["images"][content_hash]
+ status = image_info.get("status")
+
+ if status == "completed":
+ return False, f"Already processed", None
+ elif status == "skipped_cover":
+ return False, "Cover image - skipped", None
+ elif status == "error":
+ # Previous error, retry
+ return True, None, None
+
+ return True, None, None
+
+ def get_cached_result(self, content_hash):
+ """Get cached extraction result for a content hash"""
+ if content_hash in self.prog.get("extracted_data", {}):
+ return self.prog["extracted_data"][content_hash]
+ return None
+
+ def update(self, image_path, content_hash, status="in_progress", error=None, extracted_data=None):
+ """Update progress for an image"""
+ image_name = os.path.basename(image_path)
+
+ image_info = {
+ "name": image_name,
+ "path": image_path,
+ "content_hash": content_hash,
+ "status": status,
+ "last_updated": time.time()
+ }
+
+ if error:
+ image_info["error"] = str(error)
+
+ self.prog["images"][content_hash] = image_info
+
+ # Store extracted data separately for reuse
+ if extracted_data and status == "completed":
+ if "extracted_data" not in self.prog:
+ self.prog["extracted_data"] = {}
+ self.prog["extracted_data"][content_hash] = extracted_data
+
+ self.save()
+
+ # Create and return the progress manager
+ progress_manager = ImageGlossaryProgressManager(folder_name)
+ self.append_log(f"📊 Progress tracking in: Glossary/{folder_name}_glossary_progress.json")
+ return progress_manager
+
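+ # Illustrative sketch of the manager's lifecycle, mirroring how
+ # _process_image_folder_for_glossary drives it (the extract() step is hypothetical):
+ #
+ #     pm = self._init_image_glossary_progress_manager("vol1_images")
+ #     h = pm.get_content_hash(image_path)
+ #     needs, reason, _ = pm.check_image_status(image_path, h)
+ #     if needs:
+ #         pm.update(image_path, h, status="in_progress")
+ #         entries = extract(image_path)  # hypothetical extraction step
+ #         pm.update(image_path, h, status="completed", extracted_data=entries)
+ #     else:
+ #         cached = pm.get_cached_result(h)
+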
+ def _save_intermediate_glossary_with_skip(self, folder_name, entries):
+ """Save intermediate glossary results with skip logic"""
+ try:
+ output_file = os.path.join("Glossary", f"{folder_name}_glossary.json")
+
+ # Apply skip logic
+ try:
+ from extract_glossary_from_epub import skip_duplicate_entries
+ unique_entries = skip_duplicate_entries(entries)
+ except Exception:
+ # Fallback
+ seen = set()
+ unique_entries = []
+ for entry in entries:
+ key = entry.get('raw_name', '').lower().strip()
+ if key and key not in seen:
+ seen.add(key)
+ unique_entries.append(entry)
+
+ # Write the file
+ with open(output_file, 'w', encoding='utf-8') as f:
+ json.dump(unique_entries, f, ensure_ascii=False, indent=2)
+
+ except Exception as e:
+ self.append_log(f" ⚠️ Could not save intermediate glossary: {e}")
+
+ def _call_api_with_interrupt(self, client, messages, image_base64, temperature, max_tokens):
+ """Make API call with interrupt support and thread safety"""
+ import threading
+ import queue
+ from unified_api_client import UnifiedClientError
+
+ result_queue = queue.Queue()
+
+ def api_call():
+ try:
+ result = client.send_image(messages, image_base64, temperature=temperature, max_tokens=max_tokens)
+ result_queue.put(('success', result))
+ except Exception as e:
+ result_queue.put(('error', e))
+
+ api_thread = threading.Thread(target=api_call)
+ api_thread.daemon = True
+ api_thread.start()
+
+ # Check for stop every 0.5 seconds
+ while api_thread.is_alive():
+ if self.stop_requested:
+ # Cancel the operation
+ if hasattr(client, 'cancel_current_operation'):
+ client.cancel_current_operation()
+ raise UnifiedClientError("Glossary extraction stopped by user")
+
+ try:
+ status, result = result_queue.get(timeout=0.5)
+ if status == 'error':
+ raise result
+ return result
+ except queue.Empty:
+ continue
+
+ # Thread finished, get final result
+ try:
+ status, result = result_queue.get(timeout=1.0)
+ if status == 'error':
+ raise result
+ return result
+ except queue.Empty:
+ raise UnifiedClientError("API call completed but no result received")
+
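+ # Illustrative sketch: callers only need try/except around the polling helper
+ # above; a UnifiedClientError signals either a stop request or an API failure:
+ #
+ #     try:
+ #         response = self._call_api_with_interrupt(
+ #             client, messages, image_base64, temperature, max_tokens)
+ #     except UnifiedClientError:
+ #         return False
+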
+ def _extract_glossary_from_text_file(self, file_path):
+ """Extract glossary from EPUB or TXT file using existing glossary extraction"""
+ # Skip glossary extraction for traditional APIs
+ try:
+ api_key = self.api_key_entry.get()
+ model = self.model_var.get()
+ if is_traditional_translation_api(model):
+ self.append_log("ℹ️ Skipping automatic glossary extraction (not supported by Google Translate / DeepL translation APIs)")
+ return False  # keep the bool contract used by every other exit path
+
+ # Validate Vertex AI credentials if needed
+ elif '@' in model or model.startswith('vertex/'):
+ google_creds = self.config.get('google_cloud_credentials')
+ if not google_creds or not os.path.exists(google_creds):
+ self.append_log("❌ Error: Google Cloud credentials required for Vertex AI models.")
+ return False
+
+ os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = google_creds
+ self.append_log(f"🔑 Using Google Cloud credentials: {os.path.basename(google_creds)}")
+
+ if not api_key:
+ try:
+ with open(google_creds, 'r') as f:
+ creds_data = json.load(f)
+ api_key = creds_data.get('project_id', 'vertex-ai-project')
+ self.append_log(f"🔑 Using project ID as API key: {api_key}")
+ except Exception:
+ api_key = 'vertex-ai-project'
+ elif not api_key:
+ self.append_log("❌ Error: Please enter your API key.")
+ return False
+
+ old_argv = sys.argv
+ old_env = dict(os.environ)
+
+ # Output file - do NOT prepend Glossary/ because extract_glossary_from_epub.py handles that
+ epub_base = os.path.splitext(os.path.basename(file_path))[0]
+ output_path = f"{epub_base}_glossary.json"
+
+ try:
+ # Set up environment variables
+ env_updates = {
+ 'GLOSSARY_TEMPERATURE': str(self.config.get('manual_glossary_temperature', 0.1)),
+ 'GLOSSARY_CONTEXT_LIMIT': str(self.config.get('manual_context_limit', 2)),
+ 'MODEL': self.model_var.get(),
+ 'OPENAI_API_KEY': api_key,
+ 'OPENAI_OR_Gemini_API_KEY': api_key,
+ 'API_KEY': api_key,
+ 'MAX_OUTPUT_TOKENS': str(self.max_output_tokens),
+ 'BATCH_TRANSLATION': "1" if self.batch_translation_var.get() else "0",
+ 'BATCH_SIZE': str(self.batch_size_var.get()),
+ 'GLOSSARY_SYSTEM_PROMPT': self.manual_glossary_prompt,
+ 'CHAPTER_RANGE': self.chapter_range_entry.get().strip(),
+ 'GLOSSARY_DISABLE_HONORIFICS_FILTER': '1' if self.config.get('glossary_disable_honorifics_filter', False) else '0',
+ 'GLOSSARY_HISTORY_ROLLING': "1" if self.glossary_history_rolling_var.get() else "0",
+ 'DISABLE_GEMINI_SAFETY': str(self.config.get('disable_gemini_safety', False)).lower(),
+ 'OPENROUTER_USE_HTTP_ONLY': '1' if self.openrouter_http_only_var.get() else '0',
+ 'GLOSSARY_DUPLICATE_KEY_MODE': 'skip', # Always use skip mode for new format
+ 'SEND_INTERVAL_SECONDS': str(self.delay_entry.get()),
+ 'THREAD_SUBMISSION_DELAY_SECONDS': self.thread_delay_var.get().strip() or '0.5',
+ 'CONTEXTUAL': '1' if self.contextual_var.get() else '0',
+ 'GOOGLE_APPLICATION_CREDENTIALS': os.environ.get('GOOGLE_APPLICATION_CREDENTIALS', ''),
+
+ # NEW GLOSSARY ADDITIONS
+ 'GLOSSARY_MIN_FREQUENCY': str(self.glossary_min_frequency_var.get()),
+ 'GLOSSARY_MAX_NAMES': str(self.glossary_max_names_var.get()),
+ 'GLOSSARY_MAX_TITLES': str(self.glossary_max_titles_var.get()),
+ 'GLOSSARY_BATCH_SIZE': str(self.glossary_batch_size_var.get()),
+ 'ENABLE_AUTO_GLOSSARY': "1" if self.enable_auto_glossary_var.get() else "0",
+ 'APPEND_GLOSSARY': "1" if self.append_glossary_var.get() else "0",
+ 'GLOSSARY_STRIP_HONORIFICS': '0' if hasattr(self, 'strip_honorifics_var') and not self.strip_honorifics_var.get() else '1',  # default on when the toggle is absent
+ 'AUTO_GLOSSARY_PROMPT': getattr(self, 'auto_glossary_prompt', ''),
+ 'APPEND_GLOSSARY_PROMPT': getattr(self, 'append_glossary_prompt', '- Follow this reference glossary for consistent translation (Do not output any raw entries):\n'),
+ 'GLOSSARY_TRANSLATION_PROMPT': getattr(self, 'glossary_translation_prompt', ''),
+ 'GLOSSARY_CUSTOM_ENTRY_TYPES': json.dumps(getattr(self, 'custom_entry_types', {})),
+ 'GLOSSARY_CUSTOM_FIELDS': json.dumps(getattr(self, 'custom_glossary_fields', [])),
+ 'GLOSSARY_FUZZY_THRESHOLD': str(self.config.get('glossary_fuzzy_threshold', 0.90)),
+ 'MANUAL_GLOSSARY': self.manual_glossary_path if hasattr(self, 'manual_glossary_path') and self.manual_glossary_path else '',
+ 'GLOSSARY_FORMAT_INSTRUCTIONS': self.glossary_format_instructions if hasattr(self, 'glossary_format_instructions') else '',
+
+ }
+
+ # Add project ID for Vertex AI
+ if '@' in model or model.startswith('vertex/'):
+ google_creds = self.config.get('google_cloud_credentials')
+ if google_creds and os.path.exists(google_creds):
+ try:
+ with open(google_creds, 'r') as f:
+ creds_data = json.load(f)
+ env_updates['GOOGLE_CLOUD_PROJECT'] = creds_data.get('project_id', '')
+ env_updates['VERTEX_AI_LOCATION'] = 'us-central1'
+ except Exception:
+ pass
+
+ if self.custom_glossary_fields:
+ env_updates['GLOSSARY_CUSTOM_FIELDS'] = json.dumps(self.custom_glossary_fields)
+
+ # Propagate multi-key toggles so retry logic can engage
+ # Both must be enabled for main-then-fallback retry
+ try:
+ os.environ['USE_MULTI_KEYS'] = '1' if self.config.get('use_multi_api_keys', False) else '0'
+ os.environ['USE_FALLBACK_KEYS'] = '1' if self.config.get('use_fallback_keys', False) else '0'
+ except Exception:
+ # Keep going even if we can't set env for some reason
+ pass
+
+ os.environ.update(env_updates)
+
+ chap_range = self.chapter_range_entry.get().strip()
+ if chap_range:
+ self.append_log(f"📊 Chapter Range: {chap_range} (glossary extraction will only process these chapters)")
+
+ if self.token_limit_disabled:
+ os.environ['MAX_INPUT_TOKENS'] = ''
+ self.append_log("🎯 Input Token Limit: Unlimited (disabled)")
+ else:
+ token_val = self.token_limit_entry.get().strip()
+ if token_val and token_val.isdigit():
+ os.environ['MAX_INPUT_TOKENS'] = token_val
+ self.append_log(f"🎯 Input Token Limit: {token_val}")
+ else:
+ os.environ['MAX_INPUT_TOKENS'] = '50000'
+ self.append_log(f"🎯 Input Token Limit: 50000 (default)")
+
+ sys.argv = [
+ 'extract_glossary_from_epub.py',
+ '--epub', file_path,
+ '--output', output_path,
+ '--config', CONFIG_FILE
+ ]
+
+ self.append_log(f"🚀 Extracting glossary from: {os.path.basename(file_path)}")
+ self.append_log(f"📤 Output Token Limit: {self.max_output_tokens}")
+ format_parts = ["type", "raw_name", "translated_name", "gender"]
+ custom_fields_json = self.config.get('manual_custom_fields', '[]')
+ try:
+ custom_fields = json.loads(custom_fields_json) if isinstance(custom_fields_json, str) else custom_fields_json
+ if custom_fields:
+ format_parts.extend(custom_fields)
+ except Exception:
+ custom_fields = []
+ self.append_log(f" Format: Simple ({', '.join(format_parts)})")
+
+ # Check honorifics filter
+ if self.config.get('glossary_disable_honorifics_filter', False):
+ self.append_log(f"📑 Honorifics Filter: ❌ DISABLED")
+ else:
+ self.append_log(f"📑 Honorifics Filter: ✅ ENABLED")
+
+ os.environ['MAX_OUTPUT_TOKENS'] = str(self.max_output_tokens)
+
+ # Enhanced stop callback that checks both flags
+ def enhanced_stop_callback():
+ # Check GUI stop flag
+ if self.stop_requested:
+ return True
+
+ # Also check if the glossary extraction module has its own stop flag
+ try:
+ import extract_glossary_from_epub
+ if hasattr(extract_glossary_from_epub, 'is_stop_requested') and extract_glossary_from_epub.is_stop_requested():
+ return True
+ except Exception:
+ pass
+
+ return False
+
+ try:
+ # Import traceback for better error info
+ import traceback
+
+ # Run glossary extraction with enhanced stop callback
+ glossary_main(
+ log_callback=self.append_log,
+ stop_callback=enhanced_stop_callback
+ )
+ except Exception as e:
+ # Get the full traceback
+ tb_lines = traceback.format_exc()
+ self.append_log(f"❌ FULL ERROR TRACEBACK:\n{tb_lines}")
+ self.append_log(f"❌ Error extracting glossary from {os.path.basename(file_path)}: {e}")
+ return False
+
+ # Check if stopped
+ if self.stop_requested:
+ self.append_log("⏹️ Glossary extraction was stopped")
+ return False
+
+ # Check if output file exists
+ if not self.stop_requested and os.path.exists(output_path):
+ self.append_log(f"✅ Glossary saved to: {output_path}")
+ return True
+ else:
+ # Check if it was saved in Glossary folder by the script
+ glossary_path = os.path.join("Glossary", output_path)
+ if os.path.exists(glossary_path):
+ self.append_log(f"✅ Glossary saved to: {glossary_path}")
+ return True
+ return False
+
+ finally:
+ sys.argv = old_argv
+ os.environ.clear()
+ os.environ.update(old_env)
+
+ except Exception as e:
+ self.append_log(f"❌ Error extracting glossary from {os.path.basename(file_path)}: {e}")
+ return False
+
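+ # Illustrative sketch (an assumption, not existing project API): the
+ # sys.argv/os.environ snapshot-and-restore in the finally block above could be
+ # factored into a context manager so every exit path restores state:
+ #
+ #     import contextlib
+ #
+ #     @contextlib.contextmanager
+ #     def scoped_env(updates):
+ #         old = dict(os.environ)
+ #         os.environ.update(updates)
+ #         try:
+ #             yield
+ #         finally:
+ #             os.environ.clear()
+ #             os.environ.update(old)
+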
+ def epub_converter(self):
+ """Start EPUB converter in a separate thread"""
+ if not self._lazy_load_modules():
+ self.append_log("❌ Failed to load EPUB converter modules")
+ return
+
+ if fallback_compile_epub is None:
+ self.append_log("❌ EPUB converter module is not available")
+ messagebox.showerror("Module Error", "EPUB converter module is not available.")
+ return
+
+ if hasattr(self, 'translation_thread') and self.translation_thread and self.translation_thread.is_alive():
+ self.append_log("⚠️ Cannot run EPUB converter while translation is in progress.")
+ messagebox.showwarning("Process Running", "Please wait for translation to complete before converting EPUB.")
+ return
+
+ if hasattr(self, 'glossary_thread') and self.glossary_thread and self.glossary_thread.is_alive():
+ self.append_log("⚠️ Cannot run EPUB converter while glossary extraction is in progress.")
+ messagebox.showwarning("Process Running", "Please wait for glossary extraction to complete before converting EPUB.")
+ return
+
+ if hasattr(self, 'epub_thread') and self.epub_thread and self.epub_thread.is_alive():
+ self.stop_epub_converter()
+ return
+
+ folder = filedialog.askdirectory(title="Select translation output folder")
+ if not folder:
+ return
+
+ self.epub_folder = folder
+ self.stop_requested = False
+ # Run via shared executor
+ self._ensure_executor()
+ if self.executor:
+ self.epub_future = self.executor.submit(self.run_epub_converter_direct)
+ # Ensure button state is refreshed when the future completes
+ def _epub_done_callback(f):
+ try:
+ self.master.after(0, lambda: (setattr(self, 'epub_future', None), self.update_run_button()))
+ except Exception:
+ pass
+ try:
+ self.epub_future.add_done_callback(_epub_done_callback)
+ except Exception:
+ pass
+ else:
+ self.epub_thread = threading.Thread(target=self.run_epub_converter_direct, daemon=True)
+ self.epub_thread.start()
+ self.master.after(100, self.update_run_button)
+
+ def run_epub_converter_direct(self):
+ """Run EPUB converter directly without blocking GUI"""
+ try:
+ folder = self.epub_folder
+ self.append_log("📦 Starting EPUB Converter...")
+
+ # Set environment variables for EPUB converter
+ os.environ['DISABLE_EPUB_GALLERY'] = "1" if self.disable_epub_gallery_var.get() else "0"
+ os.environ['DISABLE_AUTOMATIC_COVER_CREATION'] = "1" if getattr(self, 'disable_automatic_cover_creation_var', tk.BooleanVar(value=False)).get() else "0"
+ os.environ['TRANSLATE_COVER_HTML'] = "1" if getattr(self, 'translate_cover_html_var', tk.BooleanVar(value=False)).get() else "0"
+
+ source_epub_file = os.path.join(folder, 'source_epub.txt')
+ if os.path.exists(source_epub_file):
+ try:
+ with open(source_epub_file, 'r', encoding='utf-8') as f:
+ source_epub_path = f.read().strip()
+
+ if source_epub_path and os.path.exists(source_epub_path):
+ os.environ['EPUB_PATH'] = source_epub_path
+ self.append_log(f"✅ Using source EPUB for proper chapter ordering: {os.path.basename(source_epub_path)}")
+ else:
+ self.append_log(f"⚠️ Source EPUB file not found: {source_epub_path}")
+ except Exception as e:
+ self.append_log(f"⚠️ Could not read source EPUB reference: {e}")
+ else:
+ self.append_log("ℹ️ No source EPUB reference found - using filename-based ordering")
+
+ # Set API credentials and model
+ api_key = self.api_key_entry.get()
+ if api_key:
+ os.environ['API_KEY'] = api_key
+ os.environ['OPENAI_API_KEY'] = api_key
+ os.environ['OPENAI_OR_Gemini_API_KEY'] = api_key
+
+ model = self.model_var.get()
+ if model:
+ os.environ['MODEL'] = model
+
+ # Set translation parameters from GUI
+ os.environ['TRANSLATION_TEMPERATURE'] = str(self.trans_temp.get())
+ os.environ['MAX_OUTPUT_TOKENS'] = str(self.max_output_tokens)
+
+ # Set batch translation settings
+ os.environ['BATCH_TRANSLATE_HEADERS'] = "1" if self.batch_translate_headers_var.get() else "0"
+ os.environ['HEADERS_PER_BATCH'] = str(self.headers_per_batch_var.get())
+ os.environ['UPDATE_HTML_HEADERS'] = "1" if self.update_html_headers_var.get() else "0"
+ os.environ['SAVE_HEADER_TRANSLATIONS'] = "1" if self.save_header_translations_var.get() else "0"
+
+ # Set metadata translation settings
+ os.environ['TRANSLATE_METADATA_FIELDS'] = json.dumps(self.translate_metadata_fields)
+ os.environ['METADATA_TRANSLATION_MODE'] = self.config.get('metadata_translation_mode', 'together')
+ print(f"[DEBUG] METADATA_FIELD_PROMPTS from env: {os.getenv('METADATA_FIELD_PROMPTS', 'NOT SET')[:100]}...")
+
+ # Debug: Log what we're setting
+ self.append_log(f"[DEBUG] Setting TRANSLATE_METADATA_FIELDS: {self.translate_metadata_fields}")
+ self.append_log(f"[DEBUG] Enabled fields: {[k for k, v in self.translate_metadata_fields.items() if v]}")
+
+ # Set book title translation settings
+ os.environ['TRANSLATE_BOOK_TITLE'] = "1" if self.translate_book_title_var.get() else "0"
+ os.environ['BOOK_TITLE_PROMPT'] = self.book_title_prompt
+ os.environ['BOOK_TITLE_SYSTEM_PROMPT'] = self.config.get('book_title_system_prompt',
+ "You are a translator. Respond with only the translated text, nothing else.")
+
+ # Set prompts
+ os.environ['SYSTEM_PROMPT'] = self.prompt_text.get("1.0", "end").strip()
+
+ fallback_compile_epub(folder, log_callback=self.append_log)
+
+ if not self.stop_requested:
+ self.append_log("✅ EPUB Converter completed successfully!")
+
+ epub_files = [f for f in os.listdir(folder) if f.endswith('.epub')]
+ if epub_files:
+ epub_files.sort(key=lambda x: os.path.getmtime(os.path.join(folder, x)), reverse=True)
+ out_file = os.path.join(folder, epub_files[0])
+ self.master.after(0, lambda: messagebox.showinfo("EPUB Compilation Success", f"Created: {out_file}"))
+ else:
+ self.append_log("⚠️ EPUB file was not created. Check the logs for details.")
+
+ except Exception as e:
+ error_str = str(e)
+ self.append_log(f"❌ EPUB Converter error: {error_str}")
+
+ if "Document is empty" not in error_str:
+ self.master.after(0, lambda: messagebox.showerror("EPUB Converter Failed", f"Error: {error_str}"))
+ else:
+ self.append_log("📋 Check the log above for details about what went wrong.")
+
+ finally:
+ # Always reset the thread and update button state when done
+ self.epub_thread = None
+ # Clear any future handle so update_run_button won't consider it running
+ if hasattr(self, 'epub_future'):
+ try:
+ # Don't cancel; just drop the reference. Future is already done here.
+ self.epub_future = None
+ except Exception:
+ pass
+ self.stop_requested = False
+ # Schedule GUI update on main thread
+ self.master.after(0, self.update_run_button)
+
+
+ def run_qa_scan(self, mode_override=None, non_interactive=False, preselected_files=None):
+ """Run QA scan with mode selection and settings"""
+ # Create a small loading window with icon
+ loading_window = self.wm.create_simple_dialog(
+ self.master,
+ "Loading QA Scanner",
+ width=300,
+ height=120,
+ modal=True,
+ hide_initially=False
+ )
+
+ # Create content frame
+ content_frame = tk.Frame(loading_window, padx=20, pady=20)
+ content_frame.pack(fill=tk.BOTH, expand=True)
+
+ # Try to add icon image if available
+ status_label = None
+ try:
+ from PIL import Image, ImageTk
+ ico_path = os.path.join(self.base_dir, 'Halgakos.ico')
+ if os.path.isfile(ico_path):
+ # Load icon at small size
+ icon_image = Image.open(ico_path)
+ icon_image = icon_image.resize((32, 32), Image.Resampling.LANCZOS)
+ icon_photo = ImageTk.PhotoImage(icon_image)
+
+ # Create horizontal layout
+ icon_label = tk.Label(content_frame, image=icon_photo)
+ icon_label.image = icon_photo # Keep reference
+ icon_label.pack(side=tk.LEFT, padx=(0, 10))
+
+ # Text on the right
+ text_frame = tk.Frame(content_frame)
+ text_frame.pack(side=tk.LEFT, fill=tk.BOTH, expand=True)
+ tk.Label(text_frame, text="Initializing QA Scanner...",
+ font=('TkDefaultFont', 11)).pack(anchor=tk.W)
+ status_label = tk.Label(text_frame, text="Loading modules...",
+ font=('TkDefaultFont', 9), fg='gray')
+ status_label.pack(anchor=tk.W, pady=(5, 0))
+ else:
+ # Fallback without icon
+ tk.Label(content_frame, text="Initializing QA Scanner...",
+ font=('TkDefaultFont', 11)).pack()
+ status_label = tk.Label(content_frame, text="Loading modules...",
+ font=('TkDefaultFont', 9), fg='gray')
+ status_label.pack(pady=(10, 0))
+ except ImportError:
+ # No PIL, simple text only
+ tk.Label(content_frame, text="Initializing QA Scanner...",
+ font=('TkDefaultFont', 11)).pack()
+ status_label = tk.Label(content_frame, text="Loading modules...",
+ font=('TkDefaultFont', 9), fg='gray')
+ status_label.pack(pady=(10, 0))
+
+
+ self.master.update_idletasks()
+
+ try:
+ # Update status
+ if status_label:
+ status_label.config(text="Loading translation modules...")
+ loading_window.update_idletasks()
+
+ if not self._lazy_load_modules():
+ loading_window.destroy()
+ self.append_log("❌ Failed to load QA scanner modules")
+ return
+
+ if status_label:
+ status_label.config(text="Preparing scanner...")
+ loading_window.update_idletasks()
+
+ if scan_html_folder is None:
+ loading_window.destroy()
+ self.append_log("❌ QA scanner module is not available")
+ messagebox.showerror("Module Error", "QA scanner module is not available.")
+ return
+
+ if hasattr(self, 'qa_thread') and self.qa_thread and self.qa_thread.is_alive():
+ loading_window.destroy()
+ self.stop_requested = True
+ self.append_log("⛔ QA scan stop requested.")
+ return
+
+ # Close loading window
+ loading_window.destroy()
+ self.append_log("✅ QA scanner initialized successfully")
+
+ except Exception as e:
+ loading_window.destroy()
+ self.append_log(f"❌ Error initializing QA scanner: {e}")
+ return
+
+ # Load QA scanner settings from config
+ qa_settings = self.config.get('qa_scanner_settings', {
+ 'foreign_char_threshold': 10,
+ 'excluded_characters': '',
+ 'check_encoding_issues': False,
+ 'check_repetition': True,
+ 'check_translation_artifacts': False,
+ 'min_file_length': 0,
+ 'report_format': 'detailed',
+ 'auto_save_report': True,
+ 'check_missing_html_tag': True,
+ 'check_invalid_nesting': False,
+ 'check_word_count_ratio': False,
+ 'check_multiple_headers': True,
+ 'warn_name_mismatch': True,
+ 'cache_enabled': True,
+ 'cache_auto_size': False,
+ 'cache_show_stats': False,
+ 'cache_normalize_text': 10000,
+ 'cache_similarity_ratio': 20000,
+ 'cache_content_hashes': 5000,
+ 'cache_semantic_fingerprint': 2000,
+ 'cache_structural_signature': 2000,
+ 'cache_translation_artifacts': 1000
+ })
+ # Debug: Print current settings
+ print(f"[DEBUG] QA Settings: {qa_settings}")
+ print(f"[DEBUG] Word count check enabled: {qa_settings.get('check_word_count_ratio', False)}")
+
+ # Optionally skip mode dialog if a mode override was provided (e.g., scanning phase)
+ selected_mode_value = mode_override if mode_override else None
+ if selected_mode_value is None:
+ # Show mode selection dialog with settings
+ mode_dialog = self.wm.create_simple_dialog(
+ self.master,
+ "Select QA Scanner Mode",
+ width=1500, # Optimal width for 4 cards
+ height=650, # Compact height to ensure buttons are visible
+ hide_initially=True
+ )
+
+ if selected_mode_value is None:
+ # Set minimum size to prevent dialog from being too small
+ mode_dialog.minsize(1200, 600)
+
+ # Variables
+ # selected_mode_value already set above
+
+ # Main container with constrained expansion
+ main_container = tk.Frame(mode_dialog)
+ main_container.pack(fill=tk.BOTH, expand=True, padx=10, pady=10) # Add padding
+
+ # Content with padding
+ main_frame = tk.Frame(main_container, padx=30, pady=20) # Reduced padding
+ main_frame.pack(fill=tk.X) # Only fill horizontally, don't expand
+
+ # Title with subtitle
+ title_frame = tk.Frame(main_frame)
+ title_frame.pack(pady=(0, 15)) # Further reduced
+
+ tk.Label(title_frame, text="Select Detection Mode",
+ font=('Arial', 28, 'bold'), fg='#f0f0f0').pack() # Further reduced
+ tk.Label(title_frame, text="Choose how sensitive the duplicate detection should be",
+ font=('Arial', 16), fg='#d0d0d0').pack(pady=(3, 0)) # Further reduced
+
+ # Mode cards container - don't expand vertically to leave room for buttons
+ modes_container = tk.Frame(main_frame)
+ modes_container.pack(fill=tk.X, pady=(0, 10)) # Reduced bottom padding
+
+ mode_data = [
+ {
+ "value": "ai-hunter",
+ "emoji": "🤖",
+ "title": "AI HUNTER",
+ "subtitle": "30% threshold",
+ "features": [
+ "✓ Catches AI retranslations",
+ "✓ Different translation styles",
+ "⚠ MANY false positives",
+ "✓ Same chapter, different words",
+ "✓ Detects paraphrasing",
+ "✓ Ultimate duplicate finder"
+ ],
+ "bg_color": "#2a1a3e", # Dark purple
+ "hover_color": "#6a4c93", # Medium purple
+ "border_color": "#8b5cf6",
+ "accent_color": "#a78bfa",
+ "recommendation": "⚡ Best for finding ALL similar content"
+ },
+ {
+ "value": "aggressive",
+ "emoji": "🔥",
+ "title": "AGGRESSIVE",
+ "subtitle": "75% threshold",
+ "features": [
+ "✓ Catches most duplicates",
+ "✓ Good for similar chapters",
+ "⚠ Some false positives",
+ "✓ Finds edited duplicates",
+ "✓ Moderate detection",
+ "✓ Balanced approach"
+ ],
+ "bg_color": "#3a1f1f", # Dark red
+ "hover_color": "#8b3a3a", # Medium red
+ "border_color": "#dc2626",
+ "accent_color": "#ef4444",
+ "recommendation": None
+ },
+ {
+ "value": "quick-scan",
+ "emoji": "⚡",
+ "title": "QUICK SCAN",
+ "subtitle": "85% threshold, Speed optimized",
+ "features": [
+ "✓ 3-5x faster scanning",
+ "✓ Checks consecutive chapters only",
+ "✓ Simplified analysis",
+ "✓ Skips AI Hunter",
+ "✓ Good for large libraries",
+ "✓ Minimal resource usage"
+ ],
+ "bg_color": "#1f2937", # Dark gray
+ "hover_color": "#374151", # Medium gray
+ "border_color": "#059669",
+ "accent_color": "#10b981",
+ "recommendation": "✅ Recommended for quick checks & large folders"
+ },
+ {
+ "value": "custom",
+ "emoji": "⚙️",
+ "title": "CUSTOM",
+ "subtitle": "Configurable",
+ "features": [
+ "✓ Fully customizable",
+ "✓ Set your own thresholds",
+ "✓ Advanced controls",
+ "✓ Fine-tune detection",
+ "✓ Expert mode",
+ "✓ Maximum flexibility"
+ ],
+ "bg_color": "#1e3a5f", # Dark blue
+ "hover_color": "#2c5aa0", # Medium blue
+ "border_color": "#3b82f6",
+ "accent_color": "#60a5fa",
+ "recommendation": None
+ }
+ ]
+
+ # Restore original single-row layout (four cards across)
+ if selected_mode_value is None:
+ # Make each column share space evenly
+ for col in range(len(mode_data)):
+ modes_container.columnconfigure(col, weight=1)
+ # Keep row height stable
+ modes_container.rowconfigure(0, weight=0)
+
+ for idx, mi in enumerate(mode_data):
+ # Main card frame with initial background
+ card = tk.Frame(
+ modes_container,
+ bg=mi["bg_color"],
+ highlightbackground=mi["border_color"],
+ highlightthickness=2,
+ relief='flat'
+ )
+ card.grid(row=0, column=idx, padx=10, pady=5, sticky='nsew')
+
+ # Content frame
+ content_frame = tk.Frame(card, bg=mi["bg_color"], cursor='hand2')
+ content_frame.pack(fill=tk.BOTH, expand=True, padx=15, pady=15)
+
+ # Emoji
+ emoji_label = tk.Label(content_frame, text=mi["emoji"], font=('Arial', 48), bg=mi["bg_color"])
+ emoji_label.pack(pady=(0, 5))
+
+ # Title
+ title_label = tk.Label(content_frame, text=mi["title"], font=('Arial', 24, 'bold'), fg='white', bg=mi["bg_color"])
+ title_label.pack()
+
+ # Subtitle
+ tk.Label(content_frame, text=mi["subtitle"], font=('Arial', 14), fg=mi["accent_color"], bg=mi["bg_color"]).pack(pady=(3, 10))
+
+ # Features
+ features_frame = tk.Frame(content_frame, bg=mi["bg_color"])
+ features_frame.pack(fill=tk.X)
+ for feature in mi["features"]:
+ tk.Label(features_frame, text=feature, font=('Arial', 11), fg='#e0e0e0', bg=mi["bg_color"], justify=tk.LEFT).pack(anchor=tk.W, pady=1)
+
+ # Recommendation badge if present
+ rec_frame = None
+ rec_label = None
+ if mi["recommendation"]:
+ rec_frame = tk.Frame(content_frame, bg=mi["accent_color"])
+ rec_frame.pack(pady=(10, 0), fill=tk.X)
+ rec_label = tk.Label(rec_frame, text=mi["recommendation"], font=('Arial', 11, 'bold'), fg='white', bg=mi["accent_color"], padx=8, pady=4)
+ rec_label.pack()
+
+ # Click handler
+ def make_click_handler(mode_value):
+ def handler(event=None):
+ nonlocal selected_mode_value
+ selected_mode_value = mode_value
+ mode_dialog.destroy()
+ return handler
+ click_handler = make_click_handler(mi["value"])
+
+ # Hover effects for this card only
+ def create_hover_handlers(md, widgets):
+ def on_enter(event=None):
+ for w in widgets:
+ try:
+ w.config(bg=md["hover_color"])
+ except Exception:
+ pass
+ def on_leave(event=None):
+ for w in widgets:
+ try:
+ w.config(bg=md["bg_color"])
+ except Exception:
+ pass
+ return on_enter, on_leave
+
+ all_widgets = [content_frame, emoji_label, title_label, features_frame]
+ all_widgets += [child for child in features_frame.winfo_children() if isinstance(child, tk.Label)]
+ if rec_frame is not None:
+ all_widgets += [rec_frame, rec_label]
+ on_enter, on_leave = create_hover_handlers(mi, all_widgets)
+
+ for widget in [card, content_frame, emoji_label, title_label, features_frame] + list(features_frame.winfo_children()):
+ widget.bind("", on_enter)
+ widget.bind("", on_leave)
+ widget.bind("", click_handler)
+ try:
+ widget.config(cursor='hand2')
+ except Exception:
+ pass
+
+ if selected_mode_value is None:
+ # Add separator line before buttons
+ separator = tk.Frame(main_frame, height=1, bg='#cccccc') # Thinner separator
+ separator.pack(fill=tk.X, pady=(10, 0))
+
+ # Add settings button at the bottom
+ button_frame = tk.Frame(main_frame)
+ button_frame.pack(fill=tk.X, pady=(10, 5)) # Reduced padding
+
+ # Create inner frame for centering buttons
+ button_inner = tk.Frame(button_frame)
+ button_inner.pack()
+
+ def show_qa_settings():
+ """Show QA Scanner settings dialog"""
+ self.show_qa_scanner_settings(mode_dialog, qa_settings)
+
+ # Auto-search checkbox - moved to left side of Scanner Settings
+ if not hasattr(self, 'qa_auto_search_output_var'):
+ self.qa_auto_search_output_var = tk.BooleanVar(value=self.config.get('qa_auto_search_output', True))
+ tb.Checkbutton(
+ button_inner,
+ text="Auto-search output", # Renamed from "Auto-search output folder"
+ variable=self.qa_auto_search_output_var,
+ bootstyle="round-toggle"
+ ).pack(side=tk.LEFT, padx=10)
+
+ settings_btn = tb.Button(
+ button_inner,
+ text="⚙️ Scanner Settings", # Added extra space
+ command=show_qa_settings,
+ bootstyle="info-outline", # Changed to be more visible
+ width=18, # Slightly smaller
+ padding=(8, 10) # Reduced padding
+ )
+ settings_btn.pack(side=tk.LEFT, padx=10)
+
+ cancel_btn = tb.Button(
+ button_inner,
+ text="Cancel",
+ command=lambda: mode_dialog.destroy(),
+ bootstyle="danger", # Changed from outline to solid
+ width=12, # Smaller
+ padding=(8, 10) # Reduced padding
+ )
+ cancel_btn.pack(side=tk.LEFT, padx=10)
+
+ # Handle window close (X button)
+ def on_close():
+ nonlocal selected_mode_value
+ selected_mode_value = None
+ mode_dialog.destroy()
+
+ mode_dialog.protocol("WM_DELETE_WINDOW", on_close)
+
+ # Show dialog
+ mode_dialog.deiconify()
+ mode_dialog.update_idletasks() # Force geometry update
+ mode_dialog.wait_window()
+
+ # Check if user selected a mode
+ if selected_mode_value is None:
+ self.append_log("⚠️ QA scan canceled.")
+ return
+
+ # End of optional mode dialog
+
+ # Show custom settings dialog if custom mode is selected
+ if selected_mode_value == "custom":
+ # Use WindowManager's setup_scrollable for proper scrolling support
+ dialog, scrollable_frame, canvas = self.wm.setup_scrollable(
+ self.master,
+ "Custom Mode Settings",
+ width=800,
+ height=650,
+ max_width_ratio=0.9,
+ max_height_ratio=0.85
+ )
+
+ # Variables for custom settings
+ custom_settings = {
+ 'similarity': tk.IntVar(value=85),
+ 'semantic': tk.IntVar(value=80),
+ 'structural': tk.IntVar(value=90),
+ 'word_overlap': tk.IntVar(value=75),
+ 'minhash_threshold': tk.IntVar(value=80),
+ 'consecutive_chapters': tk.IntVar(value=2),
+ 'check_all_pairs': tk.BooleanVar(value=False),
+ 'sample_size': tk.IntVar(value=3000),
+ 'min_text_length': tk.IntVar(value=500)
+ }
+
+ # Title using consistent styling
+ title_label = tk.Label(scrollable_frame, text="Configure Custom Detection Settings",
+ font=('Arial', 20, 'bold'))
+ title_label.pack(pady=(0, 20))
+
+ # Detection Thresholds Section using ttkbootstrap
+ threshold_frame = tb.LabelFrame(scrollable_frame, text="Detection Thresholds (%)",
+ padding=25, bootstyle="secondary")
+ threshold_frame.pack(fill='x', padx=20, pady=(0, 25))
+
+ threshold_descriptions = {
+ 'similarity': ('Text Similarity', 'Character-by-character comparison'),
+ 'semantic': ('Semantic Analysis', 'Meaning and context matching'),
+ 'structural': ('Structural Patterns', 'Document structure similarity'),
+ 'word_overlap': ('Word Overlap', 'Common words between texts'),
+ 'minhash_threshold': ('MinHash Similarity', 'Fast approximate matching')
+ }
+
+ # Create percentage labels dictionary to store references
+ percentage_labels = {}
+
+ for setting_key, (label_text, description) in threshold_descriptions.items():
+ # Container for each threshold
+ row_frame = tk.Frame(threshold_frame)
+ row_frame.pack(fill='x', pady=8)
+
+ # Left side - labels
+ label_container = tk.Frame(row_frame)
+ label_container.pack(side='left', fill='x', expand=True)
+
+ main_label = tk.Label(label_container, text=f"{label_text} - {description}:",
+ font=('TkDefaultFont', 11))
+ main_label.pack(anchor='w')
+
+ # Right side - slider and percentage
+ slider_container = tk.Frame(row_frame)
+ slider_container.pack(side='right', padx=(20, 0))
+
+ # Percentage label (shows current value)
+ percentage_label = tk.Label(slider_container, text=f"{custom_settings[setting_key].get()}%",
+ font=('TkDefaultFont', 12, 'bold'), width=5, anchor='e')
+ percentage_label.pack(side='right', padx=(10, 0))
+ percentage_labels[setting_key] = percentage_label
+
+ # Create slider
+ slider = tb.Scale(slider_container,
+ from_=10, to=100,
+ variable=custom_settings[setting_key],
+ bootstyle="info",
+ length=300,
+ orient='horizontal')
+ slider.pack(side='right')
+
+ # Update percentage label when slider moves
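+ # (A factory function is needed here: a bare lambda defined in the loop would
+ # late-bind, and every slider would end up updating only the last row's label.)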
+ def create_update_function(key, label):
+ def update_percentage(*args):
+ value = custom_settings[key].get()
+ label.config(text=f"{value}%")
+ return update_percentage
+
+ # Bind the update function
+ update_func = create_update_function(setting_key, percentage_label)
+ custom_settings[setting_key].trace('w', update_func)
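+ # trace('w', ...) is the legacy Tkinter write-hook; trace_add('write', ...) is the modern spelling.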
+
+ # Processing Options Section
+ options_frame = tb.LabelFrame(scrollable_frame, text="Processing Options",
+ padding=20, bootstyle="secondary")
+ options_frame.pack(fill='x', padx=20, pady=15)
+
+ # Consecutive chapters option with spinbox
+ consec_frame = tk.Frame(options_frame)
+ consec_frame.pack(fill='x', pady=5)
+
+ tk.Label(consec_frame, text="Consecutive chapters to check:",
+ font=('TkDefaultFont', 11)).pack(side='left')
+
+ tb.Spinbox(consec_frame, from_=1, to=10,
+ textvariable=custom_settings['consecutive_chapters'],
+ width=10, bootstyle="info").pack(side='left', padx=(10, 0))
+
+ # Sample size option
+ sample_frame = tk.Frame(options_frame)
+ sample_frame.pack(fill='x', pady=5)
+
+ tk.Label(sample_frame, text="Sample size for comparison (characters):",
+ font=('TkDefaultFont', 11)).pack(side='left')
+
+ # Sample size spinbox with larger range
+ sample_spinbox = tb.Spinbox(sample_frame, from_=1000, to=10000, increment=500,
+ textvariable=custom_settings['sample_size'],
+ width=10, bootstyle="info")
+ sample_spinbox.pack(side='left', padx=(10, 0))
+
+ # Minimum text length option
+ min_length_frame = tk.Frame(options_frame)
+ min_length_frame.pack(fill='x', pady=5)
+
+ tk.Label(min_length_frame, text="Minimum text length to process (characters):",
+ font=('TkDefaultFont', 11)).pack(side='left')
+
+ # Minimum length spinbox
+ min_length_spinbox = tb.Spinbox(min_length_frame, from_=100, to=5000, increment=100,
+ textvariable=custom_settings['min_text_length'],
+ width=10, bootstyle="info")
+ min_length_spinbox.pack(side='left', padx=(10, 0))
+
+ # Check all file pairs option
+ tb.Checkbutton(options_frame, text="Check all file pairs (slower but more thorough)",
+ variable=custom_settings['check_all_pairs'],
+ bootstyle="primary").pack(anchor='w', pady=8)
+
+ # Create button frame at bottom (inside scrollable_frame)
+ button_frame = tk.Frame(scrollable_frame)
+ button_frame.pack(fill='x', pady=(30, 20))
+
+ # Center buttons using inner frame
+ button_inner = tk.Frame(button_frame)
+ button_inner.pack()
+
+ # Flag to track if settings were saved
+ settings_saved = False
+
+ def save_custom_settings():
+ """Save custom settings and close dialog"""
+ nonlocal settings_saved
+ qa_settings['custom_mode_settings'] = {
+ 'thresholds': {
+ 'similarity': custom_settings['similarity'].get() / 100,
+ 'semantic': custom_settings['semantic'].get() / 100,
+ 'structural': custom_settings['structural'].get() / 100,
+ 'word_overlap': custom_settings['word_overlap'].get() / 100,
+ 'minhash_threshold': custom_settings['minhash_threshold'].get() / 100
+ },
+ 'consecutive_chapters': custom_settings['consecutive_chapters'].get(),
+ 'check_all_pairs': custom_settings['check_all_pairs'].get(),
+ 'sample_size': custom_settings['sample_size'].get(),
+ 'min_text_length': custom_settings['min_text_length'].get()
+ }
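+ # Sliders work in whole percentages; values are stored as 0-1 fractions for the scanner.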
+ settings_saved = True
+ self.append_log("✅ Custom detection settings saved")
+ dialog._cleanup_scrolling() # Clean up scrolling bindings
+ dialog.destroy()
+
+ def reset_to_defaults():
+ """Reset all values to default settings"""
+ if messagebox.askyesno("Reset to Defaults",
+ "Reset all values to default settings?",
+ parent=dialog):
+ custom_settings['similarity'].set(85)
+ custom_settings['semantic'].set(80)
+ custom_settings['structural'].set(90)
+ custom_settings['word_overlap'].set(75)
+ custom_settings['minhash_threshold'].set(80)
+ custom_settings['consecutive_chapters'].set(2)
+ custom_settings['check_all_pairs'].set(False)
+ custom_settings['sample_size'].set(3000)
+ custom_settings['min_text_length'].set(500)
+ self.append_log("ℹ️ Settings reset to defaults")
+
+ def cancel_settings():
+ """Cancel without saving"""
+ nonlocal settings_saved
+ if not settings_saved:
+ # Check if any settings were changed
+ defaults = {
+ 'similarity': 85,
+ 'semantic': 80,
+ 'structural': 90,
+ 'word_overlap': 75,
+ 'minhash_threshold': 80,
+ 'consecutive_chapters': 2,
+ 'check_all_pairs': False,
+ 'sample_size': 3000,
+ 'min_text_length': 500
+ }
+
+ changed = False
+ for key, default_val in defaults.items():
+ if custom_settings[key].get() != default_val:
+ changed = True
+ break
+
+ if changed:
+ if messagebox.askyesno("Unsaved Changes",
+ "You have unsaved changes. Are you sure you want to cancel?",
+ parent=dialog):
+ dialog._cleanup_scrolling()
+ dialog.destroy()
+ else:
+ dialog._cleanup_scrolling()
+ dialog.destroy()
+ else:
+ dialog._cleanup_scrolling()
+ dialog.destroy()
+
+ # Use ttkbootstrap buttons with better styling
+ tb.Button(button_inner, text="Cancel",
+ command=cancel_settings,
+ bootstyle="secondary", width=15).pack(side='left', padx=5)
+
+ tb.Button(button_inner, text="Reset Defaults",
+ command=reset_to_defaults,
+ bootstyle="warning", width=15).pack(side='left', padx=5)
+
+ tb.Button(button_inner, text="Start Scan",
+ command=save_custom_settings,
+ bootstyle="success", width=15).pack(side='left', padx=5)
+
+ # Use WindowManager's auto-resize
+ self.wm.auto_resize_dialog(dialog, canvas, max_width_ratio=0.9, max_height_ratio=0.72)
+
+ # Handle window close properly - treat as cancel
+ dialog.protocol("WM_DELETE_WINDOW", cancel_settings)
+
+ # Wait for dialog to close
+ dialog.wait_window()
+
+ # If user cancelled at this dialog, cancel the whole scan
+ if not settings_saved:
+ self.append_log("⚠️ QA scan canceled - no custom settings were saved.")
+ return
+ # Check if word count cross-reference is enabled but no EPUB is selected
+ check_word_count = qa_settings.get('check_word_count_ratio', False)
+ epub_files_to_scan = []
+ primary_epub_path = None
+
+ # ALWAYS populate epub_files_to_scan for auto-search, regardless of word count checking
+ # First check if current selection actually contains EPUBs
+ current_epub_files = []
+ if hasattr(self, 'selected_files') and self.selected_files:
+ current_epub_files = [f for f in self.selected_files if f.lower().endswith('.epub')]
+ print(f"[DEBUG] Current selection contains {len(current_epub_files)} EPUB files")
+
+ if current_epub_files:
+ # Use EPUBs from current selection
+ epub_files_to_scan = current_epub_files
+ print(f"[DEBUG] Using {len(epub_files_to_scan)} EPUB files from current selection")
+ else:
+ # No EPUBs in current selection - check if we have stored EPUBs
+ primary_epub_path = self.get_current_epub_path()
+ print(f"[DEBUG] get_current_epub_path returned: {primary_epub_path}")
+
+ if primary_epub_path:
+ epub_files_to_scan = [primary_epub_path]
+ print(f"[DEBUG] Using stored EPUB file for auto-search")
+
+ # Now handle word count specific logic if enabled
+ if check_word_count:
+ print("[DEBUG] Word count check is enabled, validating EPUB availability...")
+
+ # Check if we have EPUBs for word count analysis
+ if not epub_files_to_scan:
+ # No EPUBs available for word count analysis
+ result = messagebox.askyesnocancel(
+ "No Source EPUB Selected",
+ "Word count cross-reference is enabled but no source EPUB file is selected.\n\n" +
+ "Would you like to:\n" +
+ "• YES - Continue scan without word count analysis\n" +
+ "• NO - Select an EPUB file now\n" +
+ "• CANCEL - Cancel the scan",
+ icon='warning'
+ )
+
+ if result is None: # Cancel
+ self.append_log("⚠️ QA scan canceled.")
+ return
+ elif result is False: # No - Select EPUB now
+ epub_path = filedialog.askopenfilename(
+ title="Select Source EPUB File",
+ filetypes=[("EPUB files", "*.epub"), ("All files", "*.*")]
+ )
+
+ if not epub_path:
+ retry = messagebox.askyesno(
+ "No File Selected",
+ "No EPUB file was selected.\n\n" +
+ "Do you want to continue the scan without word count analysis?",
+ icon='question'
+ )
+
+ if not retry:
+ self.append_log("⚠️ QA scan canceled.")
+ return
+ else:
+ qa_settings = qa_settings.copy()
+ qa_settings['check_word_count_ratio'] = False
+ self.append_log("ℹ️ Proceeding without word count analysis.")
+ epub_files_to_scan = []
+ else:
+ self.selected_epub_path = epub_path
+ self.config['last_epub_path'] = epub_path
+ self.save_config(show_message=False)
+ self.append_log(f"✅ Selected EPUB: {os.path.basename(epub_path)}")
+ epub_files_to_scan = [epub_path]
+ else: # Yes - Continue without word count
+ qa_settings = qa_settings.copy()
+ qa_settings['check_word_count_ratio'] = False
+ self.append_log("ℹ️ Proceeding without word count analysis.")
+ epub_files_to_scan = []
+ # Persist latest auto-search preference
+ try:
+ self.config['qa_auto_search_output'] = bool(self.qa_auto_search_output_var.get())
+ self.save_config(show_message=False)
+ except Exception:
+ pass
+
+ # Try to auto-detect output folders based on EPUB files
+ folders_to_scan = []
+ auto_search_enabled = self.config.get('qa_auto_search_output', True)
+ try:
+ if hasattr(self, 'qa_auto_search_output_var'):
+ auto_search_enabled = bool(self.qa_auto_search_output_var.get())
+ except Exception:
+ pass
+
+ # Debug output for scanning phase
+ if non_interactive:
+ self.append_log(f"📝 Debug: auto_search_enabled = {auto_search_enabled}")
+ self.append_log(f"📝 Debug: epub_files_to_scan = {len(epub_files_to_scan)} files")
+ self.append_log(f"📝 Debug: Will run folder detection = {auto_search_enabled and epub_files_to_scan}")
+
+ if auto_search_enabled and epub_files_to_scan:
+ # Process each EPUB file to find its corresponding output folder
+ self.append_log(f"🔍 DEBUG: Auto-search running with {len(epub_files_to_scan)} EPUB files")
+ for epub_path in epub_files_to_scan:
+ self.append_log(f"🔍 DEBUG: Processing EPUB: {epub_path}")
+ try:
+ epub_base = os.path.splitext(os.path.basename(epub_path))[0]
+ current_dir = os.getcwd()
+ script_dir = os.path.dirname(os.path.abspath(__file__))
+
+ self.append_log(f"🔍 DEBUG: EPUB base name: '{epub_base}'")
+ self.append_log(f"🔍 DEBUG: Current dir: {current_dir}")
+ self.append_log(f"🔍 DEBUG: Script dir: {script_dir}")
+
+ # Check the most common locations in order of priority
+ candidates = [
+ os.path.join(current_dir, epub_base), # current working directory
+ os.path.join(script_dir, epub_base), # src directory (where output typically goes)
+ os.path.join(current_dir, 'src', epub_base), # src subdirectory from current dir
+ ]
+
+ folder_found = None
+ for i, candidate in enumerate(candidates):
+ exists = os.path.isdir(candidate)
+ self.append_log(f" [{epub_base}] Checking candidate {i+1}: {candidate} - {'EXISTS' if exists else 'NOT FOUND'}")
+
+ if exists:
+ # Verify the folder actually contains HTML/XHTML files
+ try:
+ files = os.listdir(candidate)
+ html_files = [f for f in files if f.lower().endswith(('.html', '.xhtml', '.htm'))]
+ if html_files:
+ folder_found = candidate
+ self.append_log(f"📁 Auto-selected output folder for {epub_base}: {folder_found}")
+ self.append_log(f" Found {len(html_files)} HTML/XHTML files to scan")
+ break
+ else:
+ self.append_log(f" [{epub_base}] Folder exists but contains no HTML/XHTML files: {candidate}")
+ except Exception as e:
+ self.append_log(f" [{epub_base}] Error checking files in {candidate}: {e}")
+
+ if folder_found:
+ folders_to_scan.append(folder_found)
+ self.append_log(f"🔍 DEBUG: Added to folders_to_scan: {folder_found}")
+ else:
+ self.append_log(f" ⚠️ No output folder found for {epub_base}")
+
+ except Exception as e:
+ self.append_log(f" ❌ Error processing {epub_base}: {e}")
+
+ self.append_log(f"🔍 DEBUG: Final folders_to_scan: {folders_to_scan}")
+
+ # Fallback behavior - if no folders found through auto-detection
+ if not folders_to_scan:
+ if auto_search_enabled:
+ # Auto-search failed, offer manual selection as fallback
+ self.append_log("⚠️ Auto-search enabled but no matching output folder found")
+ self.append_log("📁 Falling back to manual folder selection...")
+
+ selected_folder = filedialog.askdirectory(title="Auto-search failed - Select Output Folder to Scan")
+ if not selected_folder:
+ self.append_log("⚠️ QA scan canceled - no folder selected.")
+ return
+
+ # Verify the selected folder contains scannable files
+ try:
+ files = os.listdir(selected_folder)
+ html_files = [f for f in files if f.lower().endswith(('.html', '.xhtml', '.htm'))]
+ if html_files:
+ folders_to_scan.append(selected_folder)
+ self.append_log(f"✓ Manual selection: {os.path.basename(selected_folder)} ({len(html_files)} HTML/XHTML files)")
+ else:
+ self.append_log(f"❌ Selected folder contains no HTML/XHTML files: {selected_folder}")
+ return
+ except Exception as e:
+ self.append_log(f"❌ Error checking selected folder: {e}")
+ return
+ elif non_interactive:
+ # Add debug info for scanning phase
+ if epub_files_to_scan:
+ self.append_log(f"⚠️ Scanning phase: No matching output folders found for {len(epub_files_to_scan)} EPUB file(s)")
+ for epub_path in epub_files_to_scan:
+ epub_base = os.path.splitext(os.path.basename(epub_path))[0]
+ current_dir = os.getcwd()
+ expected_folder = os.path.join(current_dir, epub_base)
+ self.append_log(f" [{epub_base}] Expected: {expected_folder}")
+ self.append_log(f" [{epub_base}] Exists: {os.path.isdir(expected_folder)}")
+
+ # List actual folders in current directory for debugging
+ try:
+ current_dir = os.getcwd()
+ actual_folders = [d for d in os.listdir(current_dir) if os.path.isdir(os.path.join(current_dir, d)) and not d.startswith('.')]
+ if actual_folders:
+ self.append_log(f" Available folders: {', '.join(actual_folders[:10])}{'...' if len(actual_folders) > 10 else ''}")
+ except Exception:
+ pass
+ else:
+ self.append_log("⚠️ Scanning phase: No EPUB files available for folder detection")
+
+ self.append_log("⚠️ Skipping scan")
+ return
+
+ else:
+ # Clean single folder selection - no messageboxes, no harassment
+ self.append_log("📁 Select folder to scan...")
+
+ # Simply select one folder - clean and simple
+ selected_folder = filedialog.askdirectory(title="Select Folder with HTML Files")
+ if not selected_folder:
+ self.append_log("⚠️ QA scan canceled - no folder selected.")
+ return
+
+ folders_to_scan.append(selected_folder)
+ self.append_log(f" ✓ Selected folder: {os.path.basename(selected_folder)}")
+ self.append_log(f"📁 Single folder scan mode - scanning: {os.path.basename(folders_to_scan[0])}")
+
+ mode = selected_mode_value
+
+ # Initialize epub_path for use in run_scan() function
+ # This ensures epub_path is always defined even when manually selecting folders
+ epub_path = None
+ if epub_files_to_scan:
+ epub_path = epub_files_to_scan[0] # Use first EPUB if multiple
+ self.append_log(f"📚 Using EPUB from scan list: {os.path.basename(epub_path)}")
+ elif hasattr(self, 'selected_epub_path') and self.selected_epub_path:
+ epub_path = self.selected_epub_path
+ self.append_log(f"📚 Using stored EPUB: {os.path.basename(epub_path)}")
+ elif primary_epub_path:
+ epub_path = primary_epub_path
+ self.append_log(f"📚 Using primary EPUB: {os.path.basename(epub_path)}")
+ else:
+ self.append_log("ℹ️ No EPUB file configured (word count analysis will be disabled if needed)")
+
+ # Initialize global selected_files that applies to single-folder scans
+ global_selected_files = None
+ if len(folders_to_scan) == 1 and preselected_files:
+ global_selected_files = list(preselected_files)
+ elif len(folders_to_scan) == 1 and (not non_interactive) and (not auto_search_enabled):
+ # Scan all files in the folder - no messageboxes asking about specific files
+ # User can set up file preselection if they need specific files
+ pass
+
+ # Log bulk scan start
+ if len(folders_to_scan) == 1:
+ self.append_log(f"🔍 Starting QA scan in {mode.upper()} mode for folder: {folders_to_scan[0]}")
+ else:
+ self.append_log(f"🔍 Starting bulk QA scan in {mode.upper()} mode for {len(folders_to_scan)} folders")
+
+ self.stop_requested = False
+
+ def run_scan():
+ # Update UI on the main thread
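+ # (Tk widgets are not thread-safe and run_scan executes on a worker thread,
+ # so every UI mutation here is marshaled to the main loop via master.after(0, ...).)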
+ self.master.after(0, self.update_run_button)
+ self.master.after(0, lambda: self.qa_button.config(text="Stop Scan", command=self.stop_qa_scan, bootstyle="danger"))
+
+ try:
+ # Extract cache configuration from qa_settings
+ cache_config = {
+ 'enabled': qa_settings.get('cache_enabled', True),
+ 'auto_size': qa_settings.get('cache_auto_size', False),
+ 'show_stats': qa_settings.get('cache_show_stats', False),
+ 'sizes': {}
+ }
+
+ # Get individual cache sizes
+ for cache_name in ['normalize_text', 'similarity_ratio', 'content_hashes',
+ 'semantic_fingerprint', 'structural_signature', 'translation_artifacts']:
+ size = qa_settings.get(f'cache_{cache_name}', None)
+ if size is not None:
+ # Convert -1 to None for unlimited
+ cache_config['sizes'][cache_name] = None if size == -1 else size
+
+ # Configure the cache BEFORE calling scan_html_folder
+ from scan_html_folder import configure_qa_cache
+ configure_qa_cache(cache_config)
+
+ # Loop through all selected folders for bulk scanning
+ successful_scans = 0
+ failed_scans = 0
+
+ for i, current_folder in enumerate(folders_to_scan):
+ if self.stop_requested:
+ self.append_log(f"⚠️ Bulk scan stopped by user at folder {i+1}/{len(folders_to_scan)}")
+ break
+
+ folder_name = os.path.basename(current_folder)
+ if len(folders_to_scan) > 1:
+ self.append_log(f"\n📁 [{i+1}/{len(folders_to_scan)}] Scanning folder: {folder_name}")
+
+ # Determine the correct EPUB path for this specific folder
+ current_epub_path = epub_path
+ current_qa_settings = qa_settings.copy()
+
+ # For bulk scanning, try to find a matching EPUB for each folder
+ if len(folders_to_scan) > 1 and current_qa_settings.get('check_word_count_ratio', False):
+ # Try to find EPUB file matching this specific folder
+ folder_basename = os.path.basename(current_folder.rstrip('/\\'))
+ self.append_log(f" 🔍 Searching for EPUB matching folder: {folder_basename}")
+
+ # Look for EPUB in various locations
+ folder_parent = os.path.dirname(current_folder)
+
+ # Simple exact matching first, with minimal suffix handling
+ base_name = folder_basename
+
+ # Only handle the most common output suffixes
+ common_suffixes = ['_output', '_translated', '_en']
+ for suffix in common_suffixes:
+ if base_name.endswith(suffix):
+ base_name = base_name[:-len(suffix)]
+ break
+
+ # Simple EPUB search - focus on exact matching
+ search_names = [folder_basename] # Start with exact folder name
+ if base_name != folder_basename: # Add base name only if different
+ search_names.append(base_name)
+
+ potential_epub_paths = [
+ # Most common locations in order of priority
+ os.path.join(folder_parent, f"{folder_basename}.epub"), # Same directory as output folder
+ os.path.join(folder_parent, f"{base_name}.epub"), # Same directory with base name
+ os.path.join(current_folder, f"{folder_basename}.epub"), # Inside the output folder
+ os.path.join(current_folder, f"{base_name}.epub"), # Inside with base name
+ ]
+
+ # Find the first existing EPUB
+ folder_epub_path = None
+ for potential_path in potential_epub_paths:
+ if os.path.isfile(potential_path):
+ folder_epub_path = potential_path
+ if len(folders_to_scan) > 1:
+ self.append_log(f" Found matching EPUB: {os.path.basename(potential_path)}")
+ break
+
+ if folder_epub_path:
+ current_epub_path = folder_epub_path
+ if len(folders_to_scan) > 1: # Only log for bulk scans
+ self.append_log(f" 📖 Using EPUB: {os.path.basename(current_epub_path)}")
+ else:
+ # NO FALLBACK TO GLOBAL EPUB FOR BULK SCANS - This prevents wrong EPUB usage!
+ if len(folders_to_scan) > 1:
+ self.append_log(f" ⚠️ No matching EPUB found for folder '{folder_name}' - disabling word count analysis")
+ expected_names = ', '.join([f"{name}.epub" for name in search_names])
+ self.append_log(f" Expected EPUB names: {expected_names}")
+ current_epub_path = None
+ elif current_epub_path: # Single folder scan can use global EPUB
+ self.append_log(f" 📖 Using global EPUB: {os.path.basename(current_epub_path)} (no folder-specific EPUB found)")
+ else:
+ current_epub_path = None
+
+ # Disable word count analysis when no matching EPUB is found
+ if not current_epub_path:
+ current_qa_settings = current_qa_settings.copy()
+ current_qa_settings['check_word_count_ratio'] = False
+
+ # Check for EPUB/folder name mismatch
+ if current_epub_path and current_qa_settings.get('check_word_count_ratio', False) and current_qa_settings.get('warn_name_mismatch', True):
+ epub_name = os.path.splitext(os.path.basename(current_epub_path))[0]
+ folder_name_for_check = os.path.basename(current_folder.rstrip('/\\'))
+
+ if not check_epub_folder_match(epub_name, folder_name_for_check, current_qa_settings.get('custom_output_suffixes', '')):
+ if len(folders_to_scan) == 1:
+ # Interactive dialog for single folder scans
+ result = messagebox.askyesnocancel(
+ "EPUB/Folder Name Mismatch",
+ f"The source EPUB and output folder names don't match:\n\n" +
+ f"📖 EPUB: {epub_name}\n" +
+ f"📁 Folder: {folder_name_for_check}\n\n" +
+ "This might mean you're comparing the wrong files.\n" +
+ "Would you like to:\n" +
+ "• YES - Continue anyway (I'm sure these match)\n" +
+ "• NO - Select a different EPUB file\n" +
+ "• CANCEL - Cancel the scan",
+ icon='warning'
+ )
+
+ if result is None: # Cancel
+ self.append_log("⚠️ QA scan canceled due to EPUB/folder mismatch.")
+ return
+ elif result is False: # No - select different EPUB
+ new_epub_path = filedialog.askopenfilename(
+ title="Select Different Source EPUB File",
+ filetypes=[("EPUB files", "*.epub"), ("All files", "*.*")]
+ )
+
+ if new_epub_path:
+ current_epub_path = new_epub_path
+ self.selected_epub_path = new_epub_path
+ self.config['last_epub_path'] = new_epub_path
+ self.save_config(show_message=False)
+ self.append_log(f"✅ Updated EPUB: {os.path.basename(new_epub_path)}")
+ else:
+ proceed = messagebox.askyesno(
+ "No File Selected",
+ "No EPUB file was selected.\n\n" +
+ "Continue scan without word count analysis?",
+ icon='question'
+ )
+ if not proceed:
+ self.append_log("⚠️ QA scan canceled.")
+ return
+ else:
+ current_qa_settings = current_qa_settings.copy()
+ current_qa_settings['check_word_count_ratio'] = False
+ current_epub_path = None
+ self.append_log("ℹ️ Proceeding without word count analysis.")
+ # If YES, just continue with warning
+ else:
+ # For bulk scans, just warn and continue
+ self.append_log(f" ⚠️ Warning: EPUB/folder name mismatch - {epub_name} vs {folder_name_for_check}")
+
+ try:
+ # Determine selected_files for this folder
+ current_selected_files = None
+ if global_selected_files and len(folders_to_scan) == 1:
+ current_selected_files = global_selected_files
+
+ # Pass the QA settings to scan_html_folder
+ scan_html_folder(
+ current_folder,
+ log=self.append_log,
+ stop_flag=lambda: self.stop_requested,
+ mode=mode,
+ qa_settings=current_qa_settings,
+ epub_path=current_epub_path,
+ selected_files=current_selected_files
+ )
+
+ successful_scans += 1
+ if len(folders_to_scan) > 1:
+ self.append_log(f"✅ Folder '{folder_name}' scan completed successfully")
+
+ except Exception as folder_error:
+ failed_scans += 1
+ self.append_log(f"❌ Folder '{folder_name}' scan failed: {folder_error}")
+ if len(folders_to_scan) == 1:
+ # Re-raise for single folder scans
+ raise
+
+ # Final summary for bulk scans
+ if len(folders_to_scan) > 1:
+ self.append_log(f"\n📋 Bulk scan summary: {successful_scans} successful, {failed_scans} failed")
+
+ # If show_stats is enabled, log cache statistics
+ if qa_settings.get('cache_show_stats', False):
+ from scan_html_folder import get_cache_info
+ cache_stats = get_cache_info()
+ self.append_log("\n📊 Cache Performance Statistics:")
+ for name, info in cache_stats.items():
+ if info: # Check if info exists
+ hit_rate = info.hits / (info.hits + info.misses) if (info.hits + info.misses) > 0 else 0
+ self.append_log(f" {name}: {info.hits} hits, {info.misses} misses ({hit_rate:.1%} hit rate)")
+
+ if len(folders_to_scan) == 1:
+ self.append_log("✅ QA scan completed successfully.")
+ else:
+ self.append_log("✅ Bulk QA scan completed.")
+
+ except Exception as e:
+ self.append_log(f"❌ QA scan error: {e}")
+ self.append_log(f"Traceback: {traceback.format_exc()}")
+ finally:
+ # Clear thread/future refs so buttons re-enable
+ self.qa_thread = None
+ if hasattr(self, 'qa_future'):
+ self.qa_future = None
+ self.master.after(0, self.update_run_button)
+ self.master.after(0, lambda: self.qa_button.config(
+ text="QA Scan",
+ command=self.run_qa_scan,
+ bootstyle="warning",
+ state=tk.NORMAL if scan_html_folder else tk.DISABLED
+ ))
+
+ # Run via shared executor
+ self._ensure_executor()
+ if self.executor:
+ self.qa_future = self.executor.submit(run_scan)
+ # Ensure UI is refreshed when QA work completes
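+ # (add_done_callback fires the callback immediately, on this thread, if the
+ # future has already finished - hence the defensive try/except wrappers.)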
+ def _qa_done_callback(f):
+ try:
+ self.master.after(0, lambda: (setattr(self, 'qa_future', None), self.update_run_button()))
+ except Exception:
+ pass
+ try:
+ self.qa_future.add_done_callback(_qa_done_callback)
+ except Exception:
+ pass
+ else:
+ self.qa_thread = threading.Thread(target=run_scan, daemon=True)
+ self.qa_thread.start()
+
+ def show_qa_scanner_settings(self, parent_dialog, qa_settings):
+ """Show QA Scanner settings dialog using WindowManager properly"""
+ # Use setup_scrollable from WindowManager - NOT create_scrollable_dialog
+ dialog, scrollable_frame, canvas = self.wm.setup_scrollable(
+ parent_dialog,
+ "QA Scanner Settings",
+ width=800,
+ height=None, # Let WindowManager calculate optimal height
+ modal=True,
+ resizable=True,
+ max_width_ratio=0.9,
+ max_height_ratio=0.9
+ )
+
+ # Main settings frame
+ main_frame = tk.Frame(scrollable_frame, padx=30, pady=20)
+ main_frame.pack(fill=tk.BOTH, expand=True)
+
+ # Title
+ title_label = tk.Label(
+ main_frame,
+ text="QA Scanner Settings",
+ font=('Arial', 24, 'bold')
+ )
+ title_label.pack(pady=(0, 20))
+
+ # Foreign Character Settings Section
+ foreign_section = tk.LabelFrame(
+ main_frame,
+ text="Foreign Character Detection",
+ font=('Arial', 12, 'bold'),
+ padx=20,
+ pady=15
+ )
+ foreign_section.pack(fill=tk.X, pady=(0, 20))
+
+ # Threshold setting
+ threshold_frame = tk.Frame(foreign_section)
+ threshold_frame.pack(fill=tk.X, pady=(0, 10))
+
+ tk.Label(
+ threshold_frame,
+ text="Minimum foreign characters to flag:",
+ font=('Arial', 10)
+ ).pack(side=tk.LEFT)
+
+ threshold_var = tk.IntVar(value=qa_settings.get('foreign_char_threshold', 10))
+ threshold_spinbox = tb.Spinbox(
+ threshold_frame,
+ from_=0,
+ to=1000,
+ textvariable=threshold_var,
+ width=10,
+ bootstyle="primary"
+ )
+ threshold_spinbox.pack(side=tk.LEFT, padx=(10, 0))
+
+ # Disable mousewheel scrolling on spinbox
+ UIHelper.disable_spinbox_mousewheel(threshold_spinbox)
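+ # Keeps the mousewheel scrolling the dialog instead of silently changing the value.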
+
+ tk.Label(
+ threshold_frame,
+ text="(0 = always flag, higher = more tolerant)",
+ font=('Arial', 9),
+ fg='gray'
+ ).pack(side=tk.LEFT, padx=(10, 0))
+
+ # Excluded characters - using UIHelper for scrollable text
+ excluded_frame = tk.Frame(foreign_section)
+ excluded_frame.pack(fill=tk.X, pady=(10, 0))
+
+ tk.Label(
+ excluded_frame,
+ text="Additional characters to exclude from detection:",
+ font=('Arial', 10)
+ ).pack(anchor=tk.W)
+
+ # Use regular Text widget with manual scroll setup instead of ScrolledText
+ excluded_text_frame = tk.Frame(excluded_frame)
+ excluded_text_frame.pack(fill=tk.X, pady=(5, 0))
+
+ excluded_text = tk.Text(
+ excluded_text_frame,
+ height=7,
+ width=60,
+ font=('Consolas', 10),
+ wrap=tk.WORD,
+ undo=True
+ )
+ excluded_text.pack(side=tk.LEFT, fill=tk.X, expand=True)
+
+ # Add scrollbar manually
+ excluded_scrollbar = ttk.Scrollbar(excluded_text_frame, orient="vertical", command=excluded_text.yview)
+ excluded_scrollbar.pack(side=tk.RIGHT, fill=tk.Y)
+ excluded_text.configure(yscrollcommand=excluded_scrollbar.set)
+
+ # Setup undo/redo for the text widget
+ UIHelper.setup_text_undo_redo(excluded_text)
+
+ excluded_text.insert(1.0, qa_settings.get('excluded_characters', ''))
+
+ tk.Label(
+ excluded_frame,
+ text="Enter characters separated by spaces (e.g., ™ © ® • …)",
+ font=('Arial', 9),
+ fg='gray'
+ ).pack(anchor=tk.W)
+
+ # Detection Options Section
+ detection_section = tk.LabelFrame(
+ main_frame,
+ text="Detection Options",
+ font=('Arial', 12, 'bold'),
+ padx=20,
+ pady=15
+ )
+ detection_section.pack(fill=tk.X, pady=(0, 20))
+
+ # Checkboxes for detection options
+ check_encoding_var = tk.BooleanVar(value=qa_settings.get('check_encoding_issues', False))
+ check_repetition_var = tk.BooleanVar(value=qa_settings.get('check_repetition', True))
+ check_artifacts_var = tk.BooleanVar(value=qa_settings.get('check_translation_artifacts', False))
+ check_glossary_var = tk.BooleanVar(value=qa_settings.get('check_glossary_leakage', True))
+
+ tb.Checkbutton(
+ detection_section,
+ text="Check for encoding issues (�, □, ◇)",
+ variable=check_encoding_var,
+ bootstyle="primary"
+ ).pack(anchor=tk.W, pady=2)
+
+ tb.Checkbutton(
+ detection_section,
+ text="Check for excessive repetition",
+ variable=check_repetition_var,
+ bootstyle="primary"
+ ).pack(anchor=tk.W, pady=2)
+
+ tb.Checkbutton(
+ detection_section,
+ text="Check for translation artifacts (MTL notes, watermarks)",
+ variable=check_artifacts_var,
+ bootstyle="primary"
+ ).pack(anchor=tk.W, pady=2)
+ tb.Checkbutton(
+ detection_section,
+ text="Check for glossary leakage (raw glossary entries in translation)",
+ variable=check_glossary_var,
+ bootstyle="primary"
+ ).pack(anchor=tk.W, pady=2)
+
+ # File Processing Section
+ file_section = tk.LabelFrame(
+ main_frame,
+ text="File Processing",
+ font=('Arial', 12, 'bold'),
+ padx=20,
+ pady=15
+ )
+ file_section.pack(fill=tk.X, pady=(0, 20))
+
+ # Minimum file length
+ min_length_frame = tk.Frame(file_section)
+ min_length_frame.pack(fill=tk.X, pady=(0, 10))
+
+ tk.Label(
+ min_length_frame,
+ text="Minimum file length (characters):",
+ font=('Arial', 10)
+ ).pack(side=tk.LEFT)
+
+ min_length_var = tk.IntVar(value=qa_settings.get('min_file_length', 0))
+ min_length_spinbox = tb.Spinbox(
+ min_length_frame,
+ from_=0,
+ to=10000,
+ textvariable=min_length_var,
+ width=10,
+ bootstyle="primary"
+ )
+ min_length_spinbox.pack(side=tk.LEFT, padx=(10, 0))
+
+ # Disable mousewheel scrolling on spinbox
+ UIHelper.disable_spinbox_mousewheel(min_length_spinbox)
+
+ # Add a separator
+ separator = ttk.Separator(main_frame, orient='horizontal')
+ separator.pack(fill=tk.X, pady=15)
+
+ # Word Count Cross-Reference Section
+ wordcount_section = tk.LabelFrame(
+ main_frame,
+ text="Word Count Analysis",
+ font=('Arial', 12, 'bold'),
+ padx=20,
+ pady=15
+ )
+ wordcount_section.pack(fill=tk.X, pady=(0, 20))
+
+ check_word_count_var = tk.BooleanVar(value=qa_settings.get('check_word_count_ratio', False))
+ tb.Checkbutton(
+ wordcount_section,
+ text="Cross-reference word counts with original EPUB",
+ variable=check_word_count_var,
+ bootstyle="primary"
+ ).pack(anchor=tk.W, pady=(0, 5))
+
+ tk.Label(
+ wordcount_section,
+ text="Compares word counts between original and translated files to detect missing content.\n" +
+ "Accounts for typical expansion ratios when translating from CJK to English.",
+ wraplength=700,
+ justify=tk.LEFT,
+ fg='gray'
+ ).pack(anchor=tk.W, padx=(20, 0))
+
+ # Show current EPUB status and allow selection
+ epub_frame = tk.Frame(wordcount_section)
+ epub_frame.pack(anchor=tk.W, pady=(10, 5))
+
+ # Get EPUBs from actual current selection (not stored config)
+ current_epub_files = []
+ if hasattr(self, 'selected_files') and self.selected_files:
+ current_epub_files = [f for f in self.selected_files if f.lower().endswith('.epub')]
+
+ if len(current_epub_files) > 1:
+ # Multiple EPUBs in current selection
+ primary_epub = os.path.basename(current_epub_files[0])
+ status_text = f"📖 {len(current_epub_files)} EPUB files selected (Primary: {primary_epub})"
+ status_color = 'green'
+ elif len(current_epub_files) == 1:
+ # Single EPUB in current selection
+ status_text = f"📖 Current EPUB: {os.path.basename(current_epub_files[0])}"
+ status_color = 'green'
+ else:
+ # No EPUB files in current selection
+ status_text = "📖 No EPUB in current selection"
+ status_color = 'orange'
+
+ status_label = tk.Label(
+ epub_frame,
+ text=status_text,
+ fg=status_color,
+ font=('Arial', 10)
+ )
+ status_label.pack(side=tk.LEFT)
+
+ def select_epub_for_qa():
+ epub_path = filedialog.askopenfilename(
+ title="Select Source EPUB File",
+ filetypes=[("EPUB files", "*.epub"), ("All files", "*.*")],
+ parent=dialog
+ )
+ if epub_path:
+ self.selected_epub_path = epub_path
+ self.config['last_epub_path'] = epub_path
+ self.save_config(show_message=False)
+
+ # Clear multiple EPUB tracking when manually selecting a single EPUB
+ if hasattr(self, 'selected_epub_files'):
+ self.selected_epub_files = [epub_path]
+
+ status_label.config(
+ text=f"📖 Current EPUB: {os.path.basename(epub_path)}",
+ fg='green'
+ )
+ self.append_log(f"✅ Selected EPUB for QA: {os.path.basename(epub_path)}")
+
+ tk.Button(
+ epub_frame,
+ text="Select EPUB",
+ command=select_epub_for_qa,
+ font=('Arial', 9)
+ ).pack(side=tk.LEFT, padx=(10, 0))
+
+ # Add option to disable mismatch warning
+ warn_mismatch_var = tk.BooleanVar(value=qa_settings.get('warn_name_mismatch', True))
+ tb.Checkbutton(
+ wordcount_section,
+ text="Warn when EPUB and folder names don't match",
+ variable=warn_mismatch_var,
+ bootstyle="primary"
+ ).pack(anchor=tk.W, pady=(10, 5))
+
+ # Additional Checks Section
+ additional_section = tk.LabelFrame(
+ main_frame,
+ text="Additional Checks",
+ font=('Arial', 12, 'bold'),
+ padx=20,
+ pady=15
+ )
+ additional_section.pack(fill=tk.X, pady=(20, 0))
+
+ # Multiple headers check
+ check_multiple_headers_var = tk.BooleanVar(value=qa_settings.get('check_multiple_headers', True))
+ tb.Checkbutton(
+ additional_section,
+ text="Detect files with 2 or more headers (h1-h6 tags)",
+ variable=check_multiple_headers_var,
+ bootstyle="primary"
+ ).pack(anchor=tk.W, pady=(5, 5))
+
+ tk.Label(
+ additional_section,
+ text="Identifies files that may have been incorrectly split or merged.\n" +
+ "Useful for detecting chapters that contain multiple sections.",
+ wraplength=700,
+ justify=tk.LEFT,
+ fg='gray'
+ ).pack(anchor=tk.W, padx=(20, 0))
+
+ # Missing HTML tag check
+ html_tag_frame = tk.Frame(additional_section)
+ html_tag_frame.pack(fill=tk.X, pady=(10, 5))
+
+ check_missing_html_tag_var = tk.BooleanVar(value=qa_settings.get('check_missing_html_tag', True))
+ check_missing_html_tag_check = tb.Checkbutton(
+ html_tag_frame,
+ text="Flag HTML files with missing tag",
+ variable=check_missing_html_tag_var,
+ bootstyle="primary"
+ )
+ check_missing_html_tag_check.pack(side=tk.LEFT)
+
+ tk.Label(
+ html_tag_frame,
+ text="(Checks if HTML files have proper structure)",
+ font=('Arial', 9),
+ foreground='gray'
+ ).pack(side=tk.LEFT, padx=(10, 0))
+
+ # Invalid nesting check (separate toggle)
+ check_invalid_nesting_var = tk.BooleanVar(value=qa_settings.get('check_invalid_nesting', False))
+ tb.Checkbutton(
+ additional_section,
+ text="Check for invalid tag nesting",
+ variable=check_invalid_nesting_var,
+ bootstyle="primary"
+ ).pack(anchor=tk.W, pady=(5, 5))
+
+ # NEW: Paragraph Structure Check
+ paragraph_section_frame = tk.Frame(additional_section)
+ paragraph_section_frame.pack(fill=tk.X, pady=(15, 5))
+
+ # Separator line
+ ttk.Separator(paragraph_section_frame, orient='horizontal').pack(fill=tk.X, pady=(0, 10))
+
+ # Checkbox for paragraph structure check
+ check_paragraph_structure_var = tk.BooleanVar(value=qa_settings.get('check_paragraph_structure', True))
+ paragraph_check = tb.Checkbutton(
+ paragraph_section_frame,
+ text="Check for insufficient paragraph tags",
+ variable=check_paragraph_structure_var,
+ bootstyle="primary"
+ )
+ paragraph_check.pack(anchor=tk.W)
+
+ # Threshold setting frame
+ threshold_container = tk.Frame(paragraph_section_frame)
+ threshold_container.pack(fill=tk.X, pady=(10, 5), padx=(20, 0))
+
+ tk.Label(
+ threshold_container,
+ text="Minimum text in tags:",
+ font=('Arial', 10)
+ ).pack(side=tk.LEFT)
+
+ # Get current threshold value (default 30%)
+ current_threshold = int(qa_settings.get('paragraph_threshold', 0.3) * 100)
+ paragraph_threshold_var = tk.IntVar(value=current_threshold)
+
+ # Spinbox for threshold
+ paragraph_threshold_spinbox = tb.Spinbox(
+ threshold_container,
+ from_=0,
+ to=100,
+ textvariable=paragraph_threshold_var,
+ width=8,
+ bootstyle="primary"
+ )
+ paragraph_threshold_spinbox.pack(side=tk.LEFT, padx=(10, 5))
+
+ # Disable mousewheel scrolling on the spinbox
+ UIHelper.disable_spinbox_mousewheel(paragraph_threshold_spinbox)
+
+ tk.Label(
+ threshold_container,
+ text="%",
+ font=('Arial', 10)
+ ).pack(side=tk.LEFT)
+
+ # Threshold value label
+ threshold_value_label = tk.Label(
+ threshold_container,
+ text=f"(currently {current_threshold}%)",
+ font=('Arial', 9),
+ fg='gray'
+ )
+ threshold_value_label.pack(side=tk.LEFT, padx=(10, 0))
+
+ # Update label when spinbox changes
+ def update_threshold_label(*args):
+ try:
+ value = paragraph_threshold_var.get()
+ threshold_value_label.config(text=f"(currently {value}%)")
+ except (tk.TclError, ValueError):
+ # Handle empty or invalid input
+ threshold_value_label.config(text="(currently --%)")
+ paragraph_threshold_var.trace('w', update_threshold_label)
+
+ # Description
+ tk.Label(
+ paragraph_section_frame,
+ text="Detects HTML files where text content is not properly wrapped in paragraph tags.\n" +
+ "Files with less than the specified percentage of text in
tags will be flagged.\n" +
+ "Also checks for large blocks of unwrapped text directly in the body element.",
+ wraplength=700,
+ justify=tk.LEFT,
+ fg='gray'
+ ).pack(anchor=tk.W, padx=(20, 0), pady=(5, 0))
+
+ # Enable/disable threshold setting based on checkbox
+ def toggle_paragraph_threshold(*args):
+ if check_paragraph_structure_var.get():
+ paragraph_threshold_spinbox.config(state='normal')
+ else:
+ paragraph_threshold_spinbox.config(state='disabled')
+
+ check_paragraph_structure_var.trace('w', toggle_paragraph_threshold)
+ toggle_paragraph_threshold() # Set initial state
+
+ # Report Settings Section
+ report_section = tk.LabelFrame(
+ main_frame,
+ text="Report Settings",
+ font=('Arial', 12, 'bold'),
+ padx=20,
+ pady=15
+ )
+ report_section.pack(fill=tk.X, pady=(0, 20))
+
+ # Cache Settings Section
+ cache_section = tk.LabelFrame(
+ main_frame,
+ text="Performance Cache Settings",
+ font=('Arial', 12, 'bold'),
+ padx=20,
+ pady=15
+ )
+ cache_section.pack(fill=tk.X, pady=(0, 20))
+
+ # Enable cache checkbox
+ cache_enabled_var = tk.BooleanVar(value=qa_settings.get('cache_enabled', True))
+ cache_checkbox = tb.Checkbutton(
+ cache_section,
+ text="Enable performance cache (speeds up duplicate detection)",
+ variable=cache_enabled_var,
+ bootstyle="primary"
+ )
+ cache_checkbox.pack(anchor=tk.W, pady=(0, 10))
+
+ # Cache size settings frame
+ cache_sizes_frame = tk.Frame(cache_section)
+ cache_sizes_frame.pack(fill=tk.X, padx=(20, 0))
+
+ # Description
+ tk.Label(
+ cache_sizes_frame,
+ text="Cache sizes (0 = disabled, -1 = unlimited):",
+ font=('Arial', 10)
+ ).pack(anchor=tk.W, pady=(0, 5))
+
+ # Cache size variables
+ cache_vars = {}
+ cache_defaults = {
+ 'normalize_text': 10000,
+ 'similarity_ratio': 20000,
+ 'content_hashes': 5000,
+ 'semantic_fingerprint': 2000,
+ 'structural_signature': 2000,
+ 'translation_artifacts': 1000
+ }
+
+ # Create input fields for each cache type
+ for cache_name, default_value in cache_defaults.items():
+ row_frame = tk.Frame(cache_sizes_frame)
+ row_frame.pack(fill=tk.X, pady=2)
+
+ # Label
+ label_text = cache_name.replace('_', ' ').title() + ":"
+ tk.Label(
+ row_frame,
+ text=label_text,
+ width=25,
+ anchor='w',
+ font=('Arial', 9)
+ ).pack(side=tk.LEFT)
+
+ # Get current value
+ current_value = qa_settings.get(f'cache_{cache_name}', default_value)
+ cache_var = tk.IntVar(value=current_value)
+ cache_vars[cache_name] = cache_var
+
+ # Spinbox
+ spinbox = tb.Spinbox(
+ row_frame,
+ from_=-1,
+ to=50000,
+ textvariable=cache_var,
+ width=10,
+ bootstyle="primary"
+ )
+ spinbox.pack(side=tk.LEFT, padx=(0, 10))
+
+ # Disable mousewheel scrolling
+ UIHelper.disable_spinbox_mousewheel(spinbox)
+
+ # Quick preset buttons
+ button_frame = tk.Frame(row_frame)
+ button_frame.pack(side=tk.LEFT)
+
+ tk.Button(
+ button_frame,
+ text="Off",
+ width=4,
+ font=('Arial', 8),
+ command=lambda v=cache_var: v.set(0)
+ ).pack(side=tk.LEFT, padx=1)
+
+ tk.Button(
+ button_frame,
+ text="Small",
+ width=5,
+ font=('Arial', 8),
+ command=lambda v=cache_var: v.set(1000)
+ ).pack(side=tk.LEFT, padx=1)
+
+ tk.Button(
+ button_frame,
+ text="Medium",
+ width=7,
+ font=('Arial', 8),
+ command=lambda v=cache_var, d=default_value: v.set(d)
+ ).pack(side=tk.LEFT, padx=1)
+
+ tk.Button(
+ button_frame,
+ text="Large",
+ width=5,
+ font=('Arial', 8),
+ command=lambda v=cache_var, d=default_value: v.set(d * 2)
+ ).pack(side=tk.LEFT, padx=1)
+
+ tk.Button(
+ button_frame,
+ text="Max",
+ width=4,
+ font=('Arial', 8),
+ command=lambda v=cache_var: v.set(-1)
+ ).pack(side=tk.LEFT, padx=1)
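+ # The v=cache_var / d=default_value defaults freeze each row's values at definition
+ # time; plain lambdas would late-bind and every button would target the last row.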
+
+ # Enable/disable cache size controls based on checkbox
+ def toggle_cache_controls(*args):
+ state = 'normal' if cache_enabled_var.get() else 'disabled'
+ for widget in cache_sizes_frame.winfo_children():
+ if isinstance(widget, tk.Frame):
+ for child in widget.winfo_children():
+ if isinstance(child, (tb.Spinbox, tk.Button)):
+ child.config(state=state)
+
+ cache_enabled_var.trace('w', toggle_cache_controls)
+ toggle_cache_controls() # Set initial state
+
+ # Auto-size cache option
+ auto_size_frame = tk.Frame(cache_section)
+ auto_size_frame.pack(fill=tk.X, pady=(10, 5))
+
+ auto_size_var = tk.BooleanVar(value=qa_settings.get('cache_auto_size', False))
+ auto_size_check = tb.Checkbutton(
+ auto_size_frame,
+ text="Auto-size caches based on available RAM",
+ variable=auto_size_var,
+ bootstyle="primary"
+ )
+ auto_size_check.pack(side=tk.LEFT)
+
+ tk.Label(
+ auto_size_frame,
+ text="(overrides manual settings)",
+ font=('Arial', 9),
+ fg='gray'
+ ).pack(side=tk.LEFT, padx=(10, 0))
+
+ # Cache statistics display
+ stats_frame = tk.Frame(cache_section)
+ stats_frame.pack(fill=tk.X, pady=(10, 0))
+
+ show_stats_var = tk.BooleanVar(value=qa_settings.get('cache_show_stats', False))
+ tb.Checkbutton(
+ stats_frame,
+ text="Show cache hit/miss statistics after scan",
+ variable=show_stats_var,
+ bootstyle="primary"
+ ).pack(anchor=tk.W)
+
+ # Info about cache
+ tk.Label(
+ cache_section,
+ text="Larger cache sizes use more memory but improve performance for:\n" +
+ "• Large datasets (100+ files)\n" +
+ "• AI Hunter mode (all file pairs compared)\n" +
+ "• Repeated scans of the same folder",
+ wraplength=700,
+ justify=tk.LEFT,
+ fg='gray',
+ font=('Arial', 9)
+ ).pack(anchor=tk.W, padx=(20, 0), pady=(10, 0))
+
+ # AI Hunter Performance Section
+ ai_hunter_section = tk.LabelFrame(
+ main_frame,
+ text="AI Hunter Performance Settings",
+ font=('Arial', 12, 'bold'),
+ padx=20,
+ pady=15
+ )
+ ai_hunter_section.pack(fill=tk.X, pady=(0, 20))
+
+ # Description
+ tk.Label(
+ ai_hunter_section,
+ text="AI Hunter mode performs exhaustive duplicate detection by comparing every file pair.\n" +
+ "Parallel processing can significantly speed up this process on multi-core systems.",
+ wraplength=700,
+ justify=tk.LEFT,
+ fg='gray',
+ font=('Arial', 9)
+ ).pack(anchor=tk.W, pady=(0, 10))
+
+ # Parallel workers setting
+ workers_frame = tk.Frame(ai_hunter_section)
+ workers_frame.pack(fill=tk.X, pady=(0, 10))
+
+ tk.Label(
+ workers_frame,
+ text="Maximum parallel workers:",
+ font=('Arial', 10)
+ ).pack(side=tk.LEFT)
+
+ # Get current value from AI Hunter config
+ ai_hunter_config = self.config.get('ai_hunter_config', {})
+ current_max_workers = ai_hunter_config.get('ai_hunter_max_workers', 1)
+
+ ai_hunter_workers_var = tk.IntVar(value=current_max_workers)
+ workers_spinbox = tb.Spinbox(
+ workers_frame,
+ from_=0,
+ to=64,
+ textvariable=ai_hunter_workers_var,
+ width=10,
+ bootstyle="primary"
+ )
+ workers_spinbox.pack(side=tk.LEFT, padx=(10, 0))
+
+ # Disable mousewheel scrolling on spinbox
+ UIHelper.disable_spinbox_mousewheel(workers_spinbox)
+
+ # CPU count display
+ import multiprocessing
+ cpu_count = multiprocessing.cpu_count()
+ cpu_label = tk.Label(
+ workers_frame,
+ text=f"(0 = use all {cpu_count} cores)",
+ font=('Arial', 9),
+ fg='gray'
+ )
+ cpu_label.pack(side=tk.LEFT, padx=(10, 0))
+
+ # Quick preset buttons
+ preset_frame = tk.Frame(ai_hunter_section)
+ preset_frame.pack(fill=tk.X)
+
+ tk.Label(
+ preset_frame,
+ text="Quick presets:",
+ font=('Arial', 9)
+ ).pack(side=tk.LEFT, padx=(0, 10))
+
+ tk.Button(
+ preset_frame,
+ text=f"All cores ({cpu_count})",
+ font=('Arial', 9),
+ command=lambda: ai_hunter_workers_var.set(0)
+ ).pack(side=tk.LEFT, padx=2)
+
+ tk.Button(
+ preset_frame,
+ text="Half cores",
+ font=('Arial', 9),
+ command=lambda: ai_hunter_workers_var.set(max(1, cpu_count // 2))
+ ).pack(side=tk.LEFT, padx=2)
+
+ tk.Button(
+ preset_frame,
+ text="4 cores",
+ font=('Arial', 9),
+ command=lambda: ai_hunter_workers_var.set(4)
+ ).pack(side=tk.LEFT, padx=2)
+
+ tk.Button(
+ preset_frame,
+ text="8 cores",
+ font=('Arial', 9),
+ command=lambda: ai_hunter_workers_var.set(8)
+ ).pack(side=tk.LEFT, padx=2)
+
+ tk.Button(
+ preset_frame,
+ text="Single thread",
+ font=('Arial', 9),
+ command=lambda: ai_hunter_workers_var.set(1)
+ ).pack(side=tk.LEFT, padx=2)
+
+ # Performance tips
+ tips_text = "Performance Tips:\n" + \
+ f"• Your system has {cpu_count} CPU cores available\n" + \
+ "• Using all cores provides maximum speed but may slow other applications\n" + \
+ "• 4-8 cores usually provides good balance of speed and system responsiveness\n" + \
+ "• Single thread (1) disables parallel processing for debugging"
+
+ tk.Label(
+ ai_hunter_section,
+ text=tips_text,
+ wraplength=700,
+ justify=tk.LEFT,
+ fg='gray',
+ font=('Arial', 9)
+ ).pack(anchor=tk.W, padx=(20, 0), pady=(10, 0))
+
+ # Report format
+ format_frame = tk.Frame(report_section)
+ format_frame.pack(fill=tk.X, pady=(0, 10))
+
+ tk.Label(
+ format_frame,
+ text="Report format:",
+ font=('Arial', 10)
+ ).pack(side=tk.LEFT)
+
+ format_var = tk.StringVar(value=qa_settings.get('report_format', 'detailed'))
+ format_options = [
+ ("Summary only", "summary"),
+ ("Detailed (recommended)", "detailed"),
+ ("Verbose (all data)", "verbose")
+ ]
+
+ for idx, (text, value) in enumerate(format_options):
+ rb = tb.Radiobutton(
+ format_frame,
+ text=text,
+ variable=format_var,
+ value=value,
+ bootstyle="primary"
+ )
+ rb.pack(side=tk.LEFT, padx=(10 if idx == 0 else 5, 0))
+
+ # Auto-save report
+ auto_save_var = tk.BooleanVar(value=qa_settings.get('auto_save_report', True))
+ tb.Checkbutton(
+ report_section,
+ text="Automatically save report after scan",
+ variable=auto_save_var,
+ bootstyle="primary"
+ ).pack(anchor=tk.W)
+
+ # Buttons
+ button_frame = tk.Frame(main_frame)
+ button_frame.pack(fill=tk.X, pady=(20, 0))
+ button_inner = tk.Frame(button_frame)
+ button_inner.pack()
+
+ def save_settings():
+ """Save QA scanner settings"""
+ try:
+ qa_settings['foreign_char_threshold'] = threshold_var.get()
+ qa_settings['excluded_characters'] = excluded_text.get(1.0, tk.END).strip()
+ qa_settings['check_encoding_issues'] = check_encoding_var.get()
+ qa_settings['check_repetition'] = check_repetition_var.get()
+ qa_settings['check_translation_artifacts'] = check_artifacts_var.get()
+ qa_settings['check_glossary_leakage'] = check_glossary_var.get()
+ qa_settings['min_file_length'] = min_length_var.get()
+ qa_settings['report_format'] = format_var.get()
+ qa_settings['auto_save_report'] = auto_save_var.get()
+ qa_settings['check_word_count_ratio'] = check_word_count_var.get()
+ qa_settings['check_multiple_headers'] = check_multiple_headers_var.get()
+ qa_settings['warn_name_mismatch'] = warn_mismatch_var.get()
+ qa_settings['check_missing_html_tag'] = check_missing_html_tag_var.get()
+ qa_settings['check_paragraph_structure'] = check_paragraph_structure_var.get()
+ qa_settings['check_invalid_nesting'] = check_invalid_nesting_var.get()
+
+ # Save cache settings
+ qa_settings['cache_enabled'] = cache_enabled_var.get()
+ qa_settings['cache_auto_size'] = auto_size_var.get()
+ qa_settings['cache_show_stats'] = show_stats_var.get()
+
+ # Save individual cache sizes
+ for cache_name, cache_var in cache_vars.items():
+ qa_settings[f'cache_{cache_name}'] = cache_var.get()
+
+ if 'ai_hunter_config' not in self.config:
+ self.config['ai_hunter_config'] = {}
+ self.config['ai_hunter_config']['ai_hunter_max_workers'] = ai_hunter_workers_var.get()
+
+ # Validate and save paragraph threshold
+ try:
+ threshold_value = paragraph_threshold_var.get()
+ if 0 <= threshold_value <= 100:
+ qa_settings['paragraph_threshold'] = threshold_value / 100.0 # Convert to decimal
+ else:
+ raise ValueError("Threshold must be between 0 and 100")
+ except (tk.TclError, ValueError) as e:
+ # Default to 30% if invalid
+ qa_settings['paragraph_threshold'] = 0.3
+ self.append_log("⚠️ Invalid paragraph threshold, using default 30%")
+
+
+ # Save to main config
+ self.config['qa_scanner_settings'] = qa_settings
+
+ # Call save_config with show_message=False to avoid the error
+ self.save_config(show_message=False)
+
+ self.append_log("✅ QA Scanner settings saved")
+ dialog._cleanup_scrolling() # Clean up scrolling bindings
+ dialog.destroy()
+
+ except Exception as e:
+ self.append_log(f"❌ Error saving QA settings: {str(e)}")
+ messagebox.showerror("Error", f"Failed to save settings: {str(e)}")
+
+ def reset_defaults():
+ """Reset to default settings"""
+ result = messagebox.askyesno(
+ "Reset to Defaults",
+ "Are you sure you want to reset all settings to defaults?",
+ parent=dialog
+ )
+ if result:
+ threshold_var.set(10)
+ excluded_text.delete(1.0, tk.END)
+ check_encoding_var.set(False)
+ check_repetition_var.set(True)
+ check_artifacts_var.set(False)
+
+ check_glossary_var.set(True)
+ min_length_var.set(0)
+ format_var.set('detailed')
+ auto_save_var.set(True)
+ check_word_count_var.set(False)
+ check_multiple_headers_var.set(True)
+ warn_mismatch_var.set(True)
+ check_missing_html_tag_var.set(True)
+ check_paragraph_structure_var.set(True)
+ check_invalid_nesting_var.set(False)
+ paragraph_threshold_var.set(30) # 30% default
+
+ # Reset cache settings
+ cache_enabled_var.set(True)
+ auto_size_var.set(False)
+ show_stats_var.set(False)
+
+ # Reset cache sizes to defaults
+ for cache_name, default_value in cache_defaults.items():
+ cache_vars[cache_name].set(default_value)
+
+ ai_hunter_workers_var.set(1)
+
+ # Create buttons using ttkbootstrap styles
+ save_btn = tb.Button(
+ button_inner,
+ text="Save Settings",
+ command=save_settings,
+ bootstyle="success",
+ width=15
+ )
+ save_btn.pack(side=tk.LEFT, padx=5)
+
+ reset_btn = tb.Button(
+ button_inner,
+ text="Reset Defaults",
+ command=reset_defaults,
+ bootstyle="warning",
+ width=15
+ )
+ reset_btn.pack(side=tk.RIGHT, padx=(5, 0))
+
+ cancel_btn = tb.Button(
+ button_inner,
+ text="Cancel",
+ command=lambda: [dialog._cleanup_scrolling(), dialog.destroy()],
+ bootstyle="secondary",
+ width=15
+ )
+ cancel_btn.pack(side=tk.RIGHT)
+
+ # Use WindowManager's auto_resize_dialog to properly size the window
+ self.wm.auto_resize_dialog(dialog, canvas, max_width_ratio=0.9, max_height_ratio=0.85)
+
+ # Handle window close - setup_scrollable adds _cleanup_scrolling method
+ dialog.protocol("WM_DELETE_WINDOW", lambda: [dialog._cleanup_scrolling(), dialog.destroy()])
+
+ def toggle_token_limit(self):
+ """Toggle whether the token-limit entry is active or not."""
+ if not self.token_limit_disabled:
+ self.token_limit_entry.config(state=tk.DISABLED)
+ self.toggle_token_btn.config(text="Enable Input Token Limit", bootstyle="success-outline")
+ self.append_log("⚠️ Input token limit disabled - both translation and glossary extraction will process chapters of any size.")
+ self.token_limit_disabled = True
+ else:
+ self.token_limit_entry.config(state=tk.NORMAL)
+ if not self.token_limit_entry.get().strip():
+ self.token_limit_entry.insert(0, str(self.config.get('token_limit', 1000000)))
+ self.toggle_token_btn.config(text="Disable Input Token Limit", bootstyle="danger-outline")
+ self.append_log(f"✅ Input token limit enabled: {self.token_limit_entry.get()} tokens (applies to both translation and glossary extraction)")
+ self.token_limit_disabled = False
+
+ def update_run_button(self):
+ """Switch Run↔Stop depending on whether a process is active."""
+ translation_running = (
+ (hasattr(self, 'translation_thread') and self.translation_thread and self.translation_thread.is_alive()) or
+ (hasattr(self, 'translation_future') and self.translation_future and not self.translation_future.done())
+ )
+ glossary_running = (
+ (hasattr(self, 'glossary_thread') and self.glossary_thread and self.glossary_thread.is_alive()) or
+ (hasattr(self, 'glossary_future') and self.glossary_future and not self.glossary_future.done())
+ )
+ qa_running = (
+ (hasattr(self, 'qa_thread') and self.qa_thread and self.qa_thread.is_alive()) or
+ (hasattr(self, 'qa_future') and self.qa_future and not self.qa_future.done())
+ )
+ epub_running = (
+ (hasattr(self, 'epub_thread') and self.epub_thread and self.epub_thread.is_alive()) or
+ (hasattr(self, 'epub_future') and self.epub_future and not self.epub_future.done())
+ )
+
+ any_process_running = translation_running or glossary_running or qa_running or epub_running
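+ # Only one long-running job may be active at a time: every idle button below is
+ # disabled while any other process is running.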
+
+ # Translation button
+ if translation_running:
+ self.run_button.config(text="Stop Translation", command=self.stop_translation,
+ bootstyle="danger", state=tk.NORMAL)
+ else:
+ self.run_button.config(text="Run Translation", command=self.run_translation_thread,
+ bootstyle="success", state=tk.NORMAL if translation_main and not any_process_running else tk.DISABLED)
+
+ # Glossary button
+ if hasattr(self, 'glossary_button'):
+ if glossary_running:
+ self.glossary_button.config(text="Stop Glossary", command=self.stop_glossary_extraction,
+ bootstyle="danger", state=tk.NORMAL)
+ else:
+ self.glossary_button.config(text="Extract Glossary", command=self.run_glossary_extraction_thread,
+ bootstyle="warning", state=tk.NORMAL if glossary_main and not any_process_running else tk.DISABLED)
+
+ # EPUB button
+ if hasattr(self, 'epub_button'):
+ if epub_running:
+ self.epub_button.config(text="Stop EPUB", command=self.stop_epub_converter,
+ bootstyle="danger", state=tk.NORMAL)
+ else:
+ self.epub_button.config(text="EPUB Converter", command=self.epub_converter,
+ bootstyle="info", state=tk.NORMAL if fallback_compile_epub and not any_process_running else tk.DISABLED)
+
+ # QA button
+ if hasattr(self, 'qa_button'):
+ self.qa_button.config(state=tk.NORMAL if scan_html_folder and not any_process_running else tk.DISABLED)
+ if qa_running:
+ self.qa_button.config(text="Stop Scan", command=self.stop_qa_scan,
+ bootstyle="danger", state=tk.NORMAL)
+ else:
+ self.qa_button.config(text="QA Scan", command=self.run_qa_scan,
+ bootstyle="warning", state=tk.NORMAL if scan_html_folder and not any_process_running else tk.DISABLED)
+
+ def stop_translation(self):
+ """Stop translation while preserving loaded file"""
+ current_file = self.entry_epub.get() if hasattr(self, 'entry_epub') else None
+
+ # Set environment variable to suppress multi-key logging
+ os.environ['TRANSLATION_CANCELLED'] = '1'
+
+ self.stop_requested = True
+
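+ # Stopping is cooperative: set every flag the worker modules might poll,
+ # since the translation loop only checks between operations.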
+ # Use the imported translation_stop_flag function from TransateKRtoEN
+ # This was imported during lazy loading as: translation_stop_flag = TransateKRtoEN.set_stop_flag
+ if 'translation_stop_flag' in globals() and translation_stop_flag:
+ translation_stop_flag(True)
+
+ # Also try to call it directly on the module if imported
+ try:
+ import TransateKRtoEN
+ if hasattr(TransateKRtoEN, 'set_stop_flag'):
+ TransateKRtoEN.set_stop_flag(True)
+ except Exception:
+ pass
+
+ try:
+ import unified_api_client
+ if hasattr(unified_api_client, 'set_stop_flag'):
+ unified_api_client.set_stop_flag(True)
+ # If there's a global client instance, stop it too
+ if hasattr(unified_api_client, 'global_stop_flag'):
+ unified_api_client.global_stop_flag = True
+
+ # Set the _cancelled flag on the UnifiedClient class itself
+ if hasattr(unified_api_client, 'UnifiedClient'):
+ unified_api_client.UnifiedClient._global_cancelled = True
+
+ except Exception as e:
+ print(f"Error setting stop flags: {e}")
+
+ # Save and encrypt config when stopping
+ try:
+ self.save_config(show_message=False)
+ except Exception:
+ pass
+
+ self.append_log("❌ Translation stop requested.")
+ self.append_log("⏳ Please wait... stopping after current operation completes.")
+ self.update_run_button()
+
+ if current_file and hasattr(self, 'entry_epub'):
+ self.master.after(100, lambda: self.preserve_file_path(current_file))
+
+ def preserve_file_path(self, file_path):
+ """Helper to ensure file path stays in the entry field"""
+ if hasattr(self, 'entry_epub') and file_path:
+ current = self.entry_epub.get()
+ if not current or current != file_path:
+ self.entry_epub.delete(0, tk.END)
+ self.entry_epub.insert(0, file_path)
+
+ def stop_glossary_extraction(self):
+ """Stop glossary extraction specifically"""
+ self.stop_requested = True
+ if glossary_stop_flag:
+ glossary_stop_flag(True)
+
+ try:
+ import extract_glossary_from_epub
+ if hasattr(extract_glossary_from_epub, 'set_stop_flag'):
+ extract_glossary_from_epub.set_stop_flag(True)
+ except Exception: pass
+
+ # Important: Reset the thread/future references so button updates properly
+ if hasattr(self, 'glossary_thread'):
+ self.glossary_thread = None
+ if hasattr(self, 'glossary_future'):
+ self.glossary_future = None
+
+ self.append_log("❌ Glossary extraction stop requested.")
+ self.append_log("⏳ Please wait... stopping after current API call completes.")
+ self.update_run_button()
+
+
+ def stop_epub_converter(self):
+ """Stop EPUB converter"""
+ self.stop_requested = True
+ self.append_log("❌ EPUB converter stop requested.")
+ self.append_log("⏳ Please wait... stopping after current operation completes.")
+
+ # Important: Reset the thread reference so button updates properly
+ if hasattr(self, 'epub_thread'):
+ self.epub_thread = None
+
+ self.update_run_button()
+
+ def stop_qa_scan(self):
+ self.stop_requested = True
+ try:
+ from scan_html_folder import stop_scan
+ if stop_scan():
+ self.append_log("✅ Stop scan signal sent successfully")
+ except Exception as e:
+ self.append_log(f"❌ Failed to stop scan: {e}")
+ self.append_log("⛔ QA scan stop requested.")
+
+
+ def on_close(self):
+ if messagebox.askokcancel("Quit", "Are you sure you want to exit?"):
+ self.stop_requested = True
+
+ # Save and encrypt config before closing
+ try:
+ self.save_config(show_message=False)
+ except Exception:
+ pass # Don't prevent closing if save fails
+
+ # Shutdown the executor to stop accepting new tasks
+ try:
+ if getattr(self, 'executor', None):
+ self.executor.shutdown(wait=False)
+ except Exception:
+ pass
+
+ self.master.destroy()
+ sys.exit(0)
+
+ def append_log(self, message):
+ """Append message to log with safety checks (fallback to print if GUI is gone)."""
+ def _append():
+ try:
+ # Bail out if the widget no longer exists
+ if not hasattr(self, 'log_text'):
+ print(message)
+ return
+ try:
+ exists = bool(self.log_text.winfo_exists())
+ except Exception:
+ exists = False
+ if not exists:
+ print(message)
+ return
+
+ at_bottom = False
+ try:
+ at_bottom = self.log_text.yview()[1] >= 0.98
+ except Exception:
+ at_bottom = False
+
+ is_memory = any(keyword in message for keyword in ['[MEMORY]', '📝', 'rolling summary', 'memory'])
+
+ if is_memory:
+ self.log_text.insert(tk.END, message + "\n", "memory")
+ if "memory" not in self.log_text.tag_names():
+ self.log_text.tag_config("memory", foreground="#4CAF50", font=('TkDefaultFont', 10, 'italic'))
+ else:
+ self.log_text.insert(tk.END, message + "\n")
+
+ if at_bottom:
+ self.log_text.see(tk.END)
+ except Exception:
+ # As a last resort, print to stdout to avoid crashing callbacks
+ try:
+ print(message)
+ except Exception:
+ pass
+
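+ # Tkinter widgets are not thread-safe: append directly on the main thread,
+ # otherwise marshal the update onto the Tk event loop with after().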
+ if threading.current_thread() is threading.main_thread():
+ _append()
+ else:
+ try:
+ self.master.after(0, _append)
+ except Exception:
+ # If the master window is gone, just print
+ try:
+ print(message)
+ except Exception:
+ pass
+
+ def update_status_line(self, message, progress_percent=None):
+ """Update a status line in the log safely (fallback to print)."""
+ def _update():
+ try:
+ if not hasattr(self, 'log_text') or not self.log_text.winfo_exists():
+ print(message)
+ return
+ content = self.log_text.get("1.0", "end-1c")
+ lines = content.split('\n')
+
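+ # If the last line is already a transient status/progress line, replace it
+ # in place instead of appending, so repeated updates don't flood the log.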
+ status_markers = ['⏳', '📊', '✅', '❌', '🔄']
+ is_status_line = False
+
+ if lines and any(lines[-1].strip().startswith(marker) for marker in status_markers):
+ is_status_line = True
+
+ if progress_percent is not None:
+ bar_width = 10
+ filled = int(bar_width * progress_percent / 100)
+ bar = "▓" * filled + "░" * (bar_width - filled)
+ status_msg = f"⏳ {message} [{bar}] {progress_percent:.1f}%"
+ else:
+ status_msg = f"📊 {message}"
+
+ if is_status_line and lines[-1].strip().startswith(('⏳', '📊')):
+ start_pos = f"{len(lines)}.0"
+ self.log_text.delete(f"{start_pos} linestart", "end")
+ if len(lines) > 1:
+ self.log_text.insert("end", "\n" + status_msg)
+ else:
+ self.log_text.insert("end", status_msg)
+ else:
+ if content and not content.endswith('\n'):
+ self.log_text.insert("end", "\n" + status_msg)
+ else:
+ self.log_text.insert("end", status_msg + "\n")
+
+ self.log_text.see("end")
+ except Exception:
+ try:
+ print(message)
+ except Exception:
+ pass
+
+ if threading.current_thread() is threading.main_thread():
+ _update()
+ else:
+ try:
+ self.master.after(0, _update)
+ except Exception:
+ try:
+ print(message)
+ except Exception:
+ pass
+
+ def append_chunk_progress(self, chunk_num, total_chunks, chunk_type="text", chapter_info="",
+ overall_current=None, overall_total=None, extra_info=None):
+ """Append chunk progress with enhanced visual indicator"""
+ progress_bar_width = 20
+
+ overall_progress = 0
+ if overall_current is not None and overall_total is not None and overall_total > 0:
+ overall_progress = overall_current / overall_total
+
+ overall_filled = int(progress_bar_width * overall_progress)
+ overall_bar = "█" * overall_filled + "░" * (progress_bar_width - overall_filled)
+
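+ # Single-chunk chapters get a compact one-line entry; multi-chunk chapters
+ # get a per-chunk bar plus the overall progress bar.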
+ if total_chunks == 1:
+ icon = "📄" if chunk_type == "text" else "🖼️"
+ msg_parts = [f"{icon} {chapter_info}"]
+
+ if extra_info:
+ msg_parts.append(f"[{extra_info}]")
+
+ if overall_current is not None and overall_total is not None:
+ msg_parts.append(f"\n Progress: [{overall_bar}] {overall_current}/{overall_total} ({overall_progress*100:.1f}%)")
+
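+ # Simple ETA: average seconds per completed item so far, multiplied by the
+ # number of items remaining (assumes roughly uniform time per chapter).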
+ if overall_current <= 1 or not hasattr(self, '_translation_start_time'):
+ # First item (or timers not yet created): start the clock
+ self._translation_start_time = time.time()
+ self._chunk_start_times = {}
+ else:
+ elapsed = time.time() - self._translation_start_time
+ avg_time = elapsed / (overall_current - 1)
+ remaining = overall_total - overall_current + 1
+ eta_seconds = remaining * avg_time
+
+ if eta_seconds < 60:
+ eta_str = f"{int(eta_seconds)}s"
+ elif eta_seconds < 3600:
+ eta_str = f"{int(eta_seconds/60)}m {int(eta_seconds%60)}s"
+ else:
+ hours = int(eta_seconds / 3600)
+ minutes = int((eta_seconds % 3600) / 60)
+ eta_str = f"{hours}h {minutes}m"
+
+ msg_parts.append(f" - ETA: {eta_str}")
+
+ msg = " ".join(msg_parts)
+ else:
+ chunk_progress = chunk_num / total_chunks if total_chunks > 0 else 0
+ chunk_filled = int(progress_bar_width * chunk_progress)
+ chunk_bar = "█" * chunk_filled + "░" * (progress_bar_width - chunk_filled)
+
+ icon = "📄" if chunk_type == "text" else "🖼️"
+
+ msg_parts = [f"{icon} {chapter_info}"]
+ msg_parts.append(f"\n Chunk: [{chunk_bar}] {chunk_num}/{total_chunks} ({chunk_progress*100:.1f}%)")
+
+ if overall_current is not None and overall_total is not None:
+ msg_parts.append(f"\n Overall: [{overall_bar}] {overall_current}/{overall_total} ({overall_progress*100:.1f}%)")
+
+ msg = "".join(msg_parts)
+
+ if hasattr(self, '_chunk_start_times'):
+ self._chunk_start_times[f"{chapter_info}_{chunk_num}"] = time.time()
+
+ self.append_log(msg)
+
+ def _show_context_menu(self, event):
+ """Show context menu for log text"""
+ try:
+ context_menu = tk.Menu(self.master, tearoff=0)
+
+ try:
+ self.log_text.selection_get()
+ context_menu.add_command(label="Copy", command=self.copy_selection)
+ except tk.TclError:
+ context_menu.add_command(label="Copy", state="disabled")
+
+ context_menu.add_separator()
+ context_menu.add_command(label="Select All", command=self.select_all_log)
+
+ context_menu.tk_popup(event.x_root, event.y_root)
+ finally:
+ context_menu.grab_release()
+
+ def copy_selection(self):
+ """Copy selected text from log to clipboard"""
+ try:
+ text = self.log_text.selection_get()
+ self.master.clipboard_clear()
+ self.master.clipboard_append(text)
+ except tk.TclError:
+ pass
+
+ def select_all_log(self):
+ """Select all text in the log"""
+ self.log_text.tag_add(tk.SEL, "1.0", tk.END)
+ self.log_text.mark_set(tk.INSERT, "1.0")
+ self.log_text.see(tk.INSERT)
+
+ def auto_load_glossary_for_file(self, file_path):
+ """Automatically load glossary if it exists in the output folder"""
+
+ # CHECK FOR EPUB FIRST - before any clearing logic!
+ if not file_path or not os.path.isfile(file_path):
+ return
+
+ if not file_path.lower().endswith('.epub'):
+ return # Exit early for non-EPUB files - don't touch glossaries!
+
+ # Clear previous auto-loaded glossary if switching EPUB files
+ if file_path != self.auto_loaded_glossary_for_file:
+ # Only clear if the current glossary was auto-loaded AND not manually loaded
+ if (self.auto_loaded_glossary_path and
+ self.manual_glossary_path == self.auto_loaded_glossary_path and
+ not getattr(self, 'manual_glossary_manually_loaded', False)): # Check manual flag
+ self.manual_glossary_path = None
+ self.append_log("📑 Cleared auto-loaded glossary from previous novel")
+
+ self.auto_loaded_glossary_path = None
+ self.auto_loaded_glossary_for_file = None
+
+ # Don't override manually loaded glossaries
+ if getattr(self, 'manual_glossary_manually_loaded', False) and self.manual_glossary_path:
+ self.append_log(f"📑 Keeping manually loaded glossary: {os.path.basename(self.manual_glossary_path)}")
+ return
+
+ file_base = os.path.splitext(os.path.basename(file_path))[0]
+ output_dir = file_base
+
+ # Prefer CSV over JSON when both exist
+ glossary_candidates = [
+ os.path.join(output_dir, "glossary.csv"),
+ os.path.join(output_dir, f"{file_base}_glossary.csv"),
+ os.path.join(output_dir, "Glossary", f"{file_base}_glossary.csv"),
+ os.path.join(output_dir, "glossary.json"),
+ os.path.join(output_dir, f"{file_base}_glossary.json"),
+ os.path.join(output_dir, "Glossary", f"{file_base}_glossary.json")
+ ]
+ for glossary_path in glossary_candidates:
+ if os.path.exists(glossary_path):
+ ext = os.path.splitext(glossary_path)[1].lower()
+ try:
+ if ext == '.csv':
+ # Accept CSV without parsing
+ self.manual_glossary_path = glossary_path
+ self.auto_loaded_glossary_path = glossary_path
+ self.auto_loaded_glossary_for_file = file_path
+ self.manual_glossary_manually_loaded = False # This is auto-loaded
+ self.append_log(f"📑 Auto-loaded glossary (CSV) for {file_base}: {os.path.basename(glossary_path)}")
+ break
+ else:
+ with open(glossary_path, 'r', encoding='utf-8') as f:
+ glossary_data = json.load(f)
+ self.manual_glossary_path = glossary_path
+ self.auto_loaded_glossary_path = glossary_path
+ self.auto_loaded_glossary_for_file = file_path
+ self.manual_glossary_manually_loaded = False # This is auto-loaded
+ self.append_log(f"📑 Auto-loaded glossary (JSON) for {file_base}: {os.path.basename(glossary_path)}")
+ break
+ except Exception:
+ # If JSON parsing fails, try the next candidate
+ continue
+
+ # File Selection Methods
+ def browse_files(self):
+ """Select one or more files - automatically handles single/multiple selection"""
+ paths = filedialog.askopenfilenames(
+ title="Select File(s) - Hold Ctrl/Shift to select multiple",
+ filetypes=[
+ ("Supported files", "*.epub;*.cbz;*.txt;*.json;*.png;*.jpg;*.jpeg;*.gif;*.bmp;*.webp"),
+ ("EPUB/CBZ", "*.epub;*.cbz"),
+ ("EPUB files", "*.epub"),
+ ("Comic Book Zip", "*.cbz"),
+ ("Text files", "*.txt;*.json"),
+ ("Image files", "*.png;*.jpg;*.jpeg;*.gif;*.bmp;*.webp"),
+ ("PNG files", "*.png"),
+ ("JPEG files", "*.jpg;*.jpeg"),
+ ("GIF files", "*.gif"),
+ ("BMP files", "*.bmp"),
+ ("WebP files", "*.webp"),
+ ("All files", "*.*")
+ ]
+ )
+ if paths:
+ self._handle_file_selection(list(paths))
+
+ def browse_folder(self):
+ """Select an entire folder of files"""
+ folder_path = filedialog.askdirectory(
+ title="Select Folder Containing Files to Translate"
+ )
+ if folder_path:
+ # Find all supported files in the folder
+ supported_extensions = {'.epub', '.cbz', '.txt', '.json', '.png', '.jpg', '.jpeg', '.gif', '.bmp', '.webp'}
+ files = []
+
+ # Recursively find files if deep scan is enabled
+ if hasattr(self, 'deep_scan_var') and self.deep_scan_var.get():
+ for root, dirs, filenames in os.walk(folder_path):
+ for filename in filenames:
+ file_path = os.path.join(root, filename)
+ if os.path.splitext(filename)[1].lower() in supported_extensions:
+ files.append(file_path)
+ else:
+ # Just scan the immediate folder
+ for filename in sorted(os.listdir(folder_path)):
+ file_path = os.path.join(folder_path, filename)
+ if os.path.isfile(file_path):
+ ext = os.path.splitext(filename)[1].lower()
+ if ext in supported_extensions:
+ files.append(file_path)
+
+ if files:
+ self._handle_file_selection(sorted(files))
+ self.append_log(f"📁 Found {len(files)} supported files in: {os.path.basename(folder_path)}")
+ else:
+ messagebox.showwarning("No Files Found",
+ f"No supported files found in:\n{folder_path}\n\nSupported formats: EPUB, TXT, PNG, JPG, JPEG, GIF, BMP, WebP")
+
+ def clear_file_selection(self):
+ """Clear all selected files"""
+ self.entry_epub.delete(0, tk.END)
+ self.entry_epub.insert(0, "No file selected")
+ self.selected_files = []
+ self.file_path = None
+ self.current_file_index = 0
+
+ # Clear EPUB tracking
+ if hasattr(self, 'selected_epub_path'):
+ self.selected_epub_path = None
+ if hasattr(self, 'selected_epub_files'):
+ self.selected_epub_files = []
+
+ # Persist clear state
+ try:
+ self.config['last_input_files'] = []
+ self.config['last_epub_path'] = None
+ self.save_config(show_message=False)
+ except Exception:
+ pass
+ self.append_log("🗑️ Cleared file selection")
+
+
+ def _handle_file_selection(self, paths):
+ """Common handler for file selection"""
+ if not paths:
+ return
+
+ # Initialize JSON conversion tracking if not exists
+ if not hasattr(self, 'json_conversions'):
+ self.json_conversions = {} # Maps converted .txt paths to original .json paths
+
+ # Process JSON files first - convert them to TXT
+ processed_paths = []
+
+ for path in paths:
+ lower = path.lower()
+ if lower.endswith('.json'):
+ # Convert JSON to TXT
+ txt_path = self._convert_json_to_txt(path)
+ if txt_path:
+ processed_paths.append(txt_path)
+ # Track the conversion for potential reverse conversion later
+ self.json_conversions[txt_path] = path
+ self.append_log(f"📄 Converted JSON to TXT: {os.path.basename(path)}")
+ else:
+ self.append_log(f"❌ Failed to convert JSON: {os.path.basename(path)}")
+ elif lower.endswith('.cbz'):
+ # Extract images from CBZ (ZIP) to a temp folder and add them
+ try:
+ import zipfile, tempfile, shutil
+ temp_root = getattr(self, 'cbz_temp_root', None)
+ if not temp_root:
+ temp_root = tempfile.mkdtemp(prefix='glossarion_cbz_')
+ self.cbz_temp_root = temp_root
+ base = os.path.splitext(os.path.basename(path))[0]
+ extract_dir = os.path.join(temp_root, base)
+ os.makedirs(extract_dir, exist_ok=True)
+ with zipfile.ZipFile(path, 'r') as zf:
+ members = [m for m in zf.namelist() if m.lower().endswith(('.png', '.jpg', '.jpeg', '.webp', '.bmp', '.gif'))]
+ # Sort page names; plain sort() is lexicographic, not a true natural sort
+ members.sort()
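+ # A natural sort would keep "page2" ahead of "page10"; a minimal sketch
+ # (hypothetical, not part of this diff; needs `import re` in scope):
+ # members.sort(key=lambda s: [int(t) if t.isdigit() else t.lower()
+ # for t in re.split(r'(\d+)', s)])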
+ for m in members:
+ target_path = os.path.join(extract_dir, os.path.basename(m))
+ if not os.path.exists(target_path):
+ with zf.open(m) as src, open(target_path, 'wb') as dst:
+ shutil.copyfileobj(src, dst)
+ processed_paths.append(target_path)
+ self.append_log(f"📦 Extracted {len([p for p in processed_paths if p.startswith(extract_dir)])} images from {os.path.basename(path)}")
+ except Exception as e:
+ self.append_log(f"❌ Failed to read CBZ: {os.path.basename(path)} - {e}")
+ else:
+ # Non-JSON/CBZ files pass through unchanged
+ processed_paths.append(path)
+
+ # Store the list of selected files (using processed paths)
+ self.selected_files = processed_paths
+ self.current_file_index = 0
+
+ # Persist last selection to config for next session
+ try:
+ self.config['last_input_files'] = processed_paths
+ self.save_config(show_message=False)
+ except Exception:
+ pass
+
+ # Update the entry field
+ self.entry_epub.delete(0, tk.END)
+
+ # Define image extensions
+ image_extensions = {'.png', '.jpg', '.jpeg', '.gif', '.bmp', '.webp'}
+
+ if len(processed_paths) == 1:
+ # Single file - display full path
+ # Check if this was a JSON conversion
+ if processed_paths[0] in self.json_conversions:
+ # Show original JSON filename in parentheses
+ original_json = self.json_conversions[processed_paths[0]]
+ display_path = f"{processed_paths[0]} (from {os.path.basename(original_json)})"
+ self.entry_epub.insert(0, display_path)
+ else:
+ self.entry_epub.insert(0, processed_paths[0])
+ self.file_path = processed_paths[0] # For backward compatibility
+ else:
+ # Multiple files - display count and summary
+ # Group by type (count original types, not processed)
+ images = [p for p in processed_paths if os.path.splitext(p)[1].lower() in image_extensions]
+ epubs = [p for p in processed_paths if p.lower().endswith('.epub')]
+ txts = [p for p in processed_paths if p.lower().endswith('.txt') and p not in self.json_conversions]
+ jsons = [p for p in self.json_conversions.values()] # Count original JSON files
+
+ summary_parts = []
+ if epubs:
+ summary_parts.append(f"{len(epubs)} EPUB")
+ if txts:
+ summary_parts.append(f"{len(txts)} TXT")
+ if jsons:
+ summary_parts.append(f"{len(jsons)} JSON")
+ if images:
+ summary_parts.append(f"{len(images)} images")
+
+ display_text = f"{len(paths)} files selected ({', '.join(summary_parts)})"
+ self.entry_epub.insert(0, display_text)
+ self.file_path = processed_paths[0] # Set first file as primary
+
+ # Check if these are image files
+ image_files = [p for p in processed_paths if os.path.splitext(p)[1].lower() in image_extensions]
+
+ if image_files:
+ # Enable image translation if not already enabled
+ if hasattr(self, 'enable_image_translation_var') and not self.enable_image_translation_var.get():
+ self.enable_image_translation_var.set(True)
+ self.append_log(f"🖼️ Detected {len(image_files)} image file(s) - automatically enabled image translation")
+
+ # Clear glossary for image files
+ if hasattr(self, 'auto_loaded_glossary_path'):
+ #self.manual_glossary_path = None
+ self.auto_loaded_glossary_path = None
+ self.auto_loaded_glossary_for_file = None
+ self.append_log("📑 Cleared glossary settings (image files selected)")
+ else:
+ # Handle EPUB/TXT files
+ epub_files = [p for p in processed_paths if p.lower().endswith('.epub')]
+
+ if len(epub_files) == 1:
+ # Single EPUB - auto-load glossary
+ self.auto_load_glossary_for_file(epub_files[0])
+ # Persist EPUB path for QA defaults
+ try:
+ self.selected_epub_path = epub_files[0]
+ self.selected_epub_files = [epub_files[0]] # Track single EPUB in list format
+ self.config['last_epub_path'] = epub_files[0]
+ os.environ['EPUB_PATH'] = epub_files[0]
+ self.save_config(show_message=False)
+ except Exception:
+ pass
+ elif len(epub_files) > 1:
+ # Multiple EPUBs - clear glossary but update EPUB path tracking
+ if hasattr(self, 'auto_loaded_glossary_path'):
+ self.manual_glossary_path = None
+ self.auto_loaded_glossary_path = None
+ self.auto_loaded_glossary_for_file = None
+ self.append_log("📁 Multiple files selected - glossary auto-loading disabled")
+
+ # For multiple EPUBs, set the selected_epub_path to the first one
+ # but track all EPUBs for word count analysis
+ try:
+ self.selected_epub_path = epub_files[0] # Use first EPUB as primary
+ self.selected_epub_files = epub_files # Track all EPUBs
+ self.config['last_epub_path'] = epub_files[0]
+ os.environ['EPUB_PATH'] = epub_files[0]
+ self.save_config(show_message=False)
+
+ # Log that multiple EPUBs are selected
+ self.append_log(f"📖 {len(epub_files)} EPUB files selected - using '{os.path.basename(epub_files[0])}' as primary for word count analysis")
+ except Exception:
+ pass
+
+ def _convert_json_to_txt(self, json_path):
+ """Convert a JSON file to TXT format for translation."""
+ try:
+ # Read JSON file
+ with open(json_path, 'r', encoding='utf-8') as f:
+ content = f.read()
+
+ # Parse JSON
+ try:
+ data = json.loads(content)
+ except json.JSONDecodeError as e:
+ self.append_log(f"⚠️ JSON parsing error: {str(e)}")
+ self.append_log("🔧 Attempting to fix JSON...")
+ fixed_content = self._comprehensive_json_fix(content)
+ data = json.loads(fixed_content)
+ self.append_log("✅ JSON fixed successfully")
+
+ # Create output file
+ base_dir = os.path.dirname(json_path)
+ base_name = os.path.splitext(os.path.basename(json_path))[0]
+ txt_path = os.path.join(base_dir, f"{base_name}_json_temp.txt")
+
+ # CHECK IF THIS IS A GLOSSARY - PUT EVERYTHING IN ONE CHAPTER
+ filename_lower = os.path.basename(json_path).lower()
+ is_glossary = any(term in filename_lower for term in ['glossary', 'dictionary', 'terms', 'characters', 'names'])
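+ # Glossary files are kept as a single chunk so every key/value pair is
+ # translated in one API call and the pairs stay aligned.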
+
+ # Also check structure
+ if not is_glossary and isinstance(data, dict):
+ # If it's a flat dictionary with many short entries, it's probably a glossary
+ if len(data) > 20: # More than 20 entries
+ values = list(data.values())[:10] # Check first 10
+ if all(isinstance(v, str) and len(v) < 500 for v in values):
+ is_glossary = True
+ self.append_log("📚 Detected glossary structure (many short entries)")
+ self.append_log(f"🔍 Found {len(data)} dictionary entries with avg length < 500 chars")
+
+ with open(txt_path, 'w', encoding='utf-8') as f:
+ # Add metadata header
+ f.write(f"[JSON_SOURCE: {os.path.basename(json_path)}]\n")
+ f.write(f"[JSON_STRUCTURE_TYPE: {type(data).__name__}]\n")
+ f.write(f"[JSON_CONVERSION_VERSION: 1.0]\n")
+ if is_glossary:
+ f.write("[GLOSSARY_MODE: SINGLE_CHUNK]\n")
+ f.write("\n")
+
+ if is_glossary:
+ # PUT ENTIRE GLOSSARY IN ONE CHAPTER
+ self.append_log(f"📚 Glossary mode: Creating single chapter for {len(data)} entries")
+ self.append_log("🚫 CHUNK SPLITTING DISABLED for glossary file")
+ self.append_log(f"📝 All {len(data)} entries will be processed in ONE API call")
+ f.write("=== Chapter 1: Full Glossary ===\n\n")
+
+ if isinstance(data, dict):
+ for key, value in data.items():
+ f.write(f"{key}: {value}\n\n")
+ elif isinstance(data, list):
+ for item in data:
+ if isinstance(item, str):
+ f.write(f"{item}\n\n")
+ else:
+ f.write(f"{json.dumps(item, ensure_ascii=False, indent=2)}\n\n")
+ else:
+ f.write(json.dumps(data, ensure_ascii=False, indent=2))
+
+ else:
+ # NORMAL PROCESSING - SEPARATE CHAPTERS
+ if isinstance(data, dict):
+ for idx, (key, value) in enumerate(data.items(), 1):
+ f.write(f"\n=== Chapter {idx}: {key} ===\n\n")
+
+ if isinstance(value, str):
+ f.write(value)
+ elif isinstance(value, list) and all(isinstance(item, str) for item in value):
+ for item in value:
+ f.write(f"{item}\n\n")
+ else:
+ f.write(json.dumps(value, ensure_ascii=False, indent=2))
+
+ f.write("\n\n")
+
+ elif isinstance(data, list):
+ for idx, item in enumerate(data, 1):
+ f.write(f"\n=== Chapter {idx} ===\n\n")
+
+ if isinstance(item, str):
+ f.write(item)
+ else:
+ f.write(json.dumps(item, ensure_ascii=False, indent=2))
+
+ f.write("\n\n")
+
+ else:
+ f.write("=== Content ===\n\n")
+ if isinstance(data, str):
+ f.write(data)
+ else:
+ f.write(json.dumps(data, ensure_ascii=False, indent=2))
+
+ return txt_path
+
+ except Exception as e:
+ self.append_log(f"❌ Error converting JSON: {str(e)}")
+ import traceback
+ self.append_log(f"Debug: {traceback.format_exc()}")
+ return None
+
+ def convert_translated_to_json(self, translated_txt_path):
+ """Convert translated TXT back to JSON format if it was originally JSON."""
+
+ # Check if this was a JSON conversion
+ original_json_path = None
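+ # Translated outputs swap "_json_temp" for "_translated" in the filename,
+ # so check the mapping in both directions to recover the original JSON path.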
+ for txt_path, json_path in self.json_conversions.items():
+ # Check if this is the translated version of a converted file
+ if translated_txt_path.replace("_translated", "_json_temp") == txt_path:
+ original_json_path = json_path
+ break
+ # Also check direct match
+ if txt_path.replace("_json_temp", "_translated") == translated_txt_path:
+ original_json_path = json_path
+ break
+
+ if not original_json_path:
+ return None
+
+ try:
+ # Read original JSON structure
+ with open(original_json_path, 'r', encoding='utf-8') as f:
+ original_data = json.load(f)
+
+ # Read translated content
+ with open(translated_txt_path, 'r', encoding='utf-8') as f:
+ translated_content = f.read()
+
+ # Remove metadata headers
+ lines = translated_content.split('\n')
+ content_start = 0
+ for i, line in enumerate(lines):
+ if line.strip() and not line.startswith('[JSON_'):
+ content_start = i
+ break
+ translated_content = '\n'.join(lines[content_start:])
+
+ # Parse chapters from translated content
+ import re
+ chapter_pattern = r'=== Chapter \d+(?:: [^=]+)? ==='
+ chapters = re.split(chapter_pattern, translated_content)
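+ # With a non-capturing title group, re.split returns only the text between
+ # chapter headers; a capturing group would interleave the captured titles
+ # into the result and corrupt the chapter-to-key mapping below.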
+
+ # Clean up chapters
+ cleaned_chapters = []
+ for i, chapter in enumerate(chapters):
+ if chapter and chapter.strip() and not chapter.startswith('==='):
+ cleaned_chapters.append(chapter.strip())
+
+ # Rebuild JSON structure with translated content
+ if isinstance(original_data, dict):
+ result = {}
+ keys = list(original_data.keys())
+
+ # Match chapters to original keys
+ for i, key in enumerate(keys):
+ if i < len(cleaned_chapters):
+ result[key] = cleaned_chapters[i]
+ else:
+ # Preserve original if no translation found
+ result[key] = original_data[key]
+
+ elif isinstance(original_data, list):
+ result = []
+
+ for i, item in enumerate(original_data):
+ if i < len(cleaned_chapters):
+ if isinstance(item, dict) and 'content' in item:
+ # Preserve structure for dictionary items
+ new_item = item.copy()
+ new_item['content'] = cleaned_chapters[i]
+ result.append(new_item)
+ else:
+ # Direct replacement
+ result.append(cleaned_chapters[i])
+ else:
+ # Preserve original if no translation found
+ result.append(item)
+
+ else:
+ # Single value
+ result = cleaned_chapters[0] if cleaned_chapters else original_data
+
+ # Save as JSON
+ output_json_path = translated_txt_path.replace('.txt', '.json')
+ with open(output_json_path, 'w', encoding='utf-8') as f:
+ json.dump(result, f, ensure_ascii=False, indent=2)
+
+ self.append_log(f"✅ Converted back to JSON: {os.path.basename(output_json_path)}")
+ return output_json_path
+
+ except Exception as e:
+ self.append_log(f"❌ Error converting back to JSON: {str(e)}")
+ import traceback
+ self.append_log(f"Debug: {traceback.format_exc()}")
+ return None
+
+ def toggle_api_visibility(self):
+ show = self.api_key_entry.cget('show')
+ self.api_key_entry.config(show='' if show == '*' else '*')
+ # Track the visibility state
+ self.api_key_visible = (show == '*') # Will be True when showing, False when hiding
+
+ def configure_translation_chunk_prompt(self):
+ """Configure the prompt template for translation chunks"""
+ dialog = self.wm.create_simple_dialog(
+ self.master,
+ "Configure Translation Chunk Prompt",
+ width=700,
+ height=None
+ )
+
+ main_frame = tk.Frame(dialog, padx=20, pady=20)
+ main_frame.pack(fill=tk.BOTH, expand=True)
+
+ tk.Label(main_frame, text="Translation Chunk Prompt Template",
+ font=('TkDefaultFont', 14, 'bold')).pack(anchor=tk.W, pady=(0, 5))
+
+ tk.Label(main_frame, text="Configure how chunks are presented to the AI when chapters are split.",
+ font=('TkDefaultFont', 10), fg='gray').pack(anchor=tk.W, pady=(0, 10))
+
+ # Instructions
+ instructions_frame = tk.LabelFrame(main_frame, text="Available Placeholders", padx=10, pady=10)
+ instructions_frame.pack(fill=tk.X, pady=(0, 15))
+
+ placeholders = [
+ ("{chunk_idx}", "Current chunk number (1-based)"),
+ ("{total_chunks}", "Total number of chunks"),
+ ("{chunk_html}", "The actual HTML content to translate")
+ ]
+
+ for placeholder, desc in placeholders:
+ placeholder_frame = tk.Frame(instructions_frame)
+ placeholder_frame.pack(anchor=tk.W, pady=2)
+ tk.Label(placeholder_frame, text=f"• {placeholder}:", font=('Courier', 10, 'bold')).pack(side=tk.LEFT)
+ tk.Label(placeholder_frame, text=f" {desc}", font=('TkDefaultFont', 10)).pack(side=tk.LEFT)
+
+ # Prompt input
+ prompt_frame = tk.LabelFrame(main_frame, text="Chunk Prompt Template", padx=10, pady=10)
+ prompt_frame.pack(fill=tk.BOTH, expand=True, pady=(0, 10))
+
+ self.chunk_prompt_text = self.ui.setup_scrollable_text(
+ prompt_frame, height=8, wrap=tk.WORD
+ )
+ self.chunk_prompt_text.pack(fill=tk.BOTH, expand=True)
+ self.chunk_prompt_text.insert('1.0', self.translation_chunk_prompt)
+
+ # Example
+ example_frame = tk.LabelFrame(main_frame, text="Example Output", padx=10, pady=10)
+ example_frame.pack(fill=tk.X, pady=(0, 10))
+
+ tk.Label(example_frame, text="With chunk 2 of 5, the prompt would be:",
+ font=('TkDefaultFont', 10)).pack(anchor=tk.W)
+
+ self.example_label = tk.Label(example_frame, text="",
+ font=('Courier', 9), fg='blue',
+ wraplength=650, justify=tk.LEFT)
+ self.example_label.pack(anchor=tk.W, pady=(5, 0))
+
+ def update_example(*args):
+ try:
+ template = self.chunk_prompt_text.get('1.0', tk.END).strip()
+ example = template.replace('{chunk_idx}', '2').replace('{total_chunks}', '5').replace('{chunk_html}', '<p>Chapter content here...</p>')
+ self.example_label.config(text=example[:200] + "..." if len(example) > 200 else example)
+ except Exception:
+ self.example_label.config(text="[Invalid template]")
+
+ self.chunk_prompt_text.bind('<KeyRelease>', update_example)
+ update_example()
+
+ # Buttons
+ button_frame = tk.Frame(main_frame)
+ button_frame.pack(fill=tk.X, pady=(10, 0))
+
+ def save_chunk_prompt():
+ self.translation_chunk_prompt = self.chunk_prompt_text.get('1.0', tk.END).strip()
+ self.config['translation_chunk_prompt'] = self.translation_chunk_prompt
+ messagebox.showinfo("Success", "Translation chunk prompt saved!")
+ dialog.destroy()
+
+ def reset_chunk_prompt():
+ if messagebox.askyesno("Reset Prompt", "Reset to default chunk prompt?"):
+ self.chunk_prompt_text.delete('1.0', tk.END)
+ self.chunk_prompt_text.insert('1.0', self.default_translation_chunk_prompt)
+ update_example()
+
+ tb.Button(button_frame, text="Save", command=save_chunk_prompt,
+ bootstyle="success", width=15).pack(side=tk.LEFT, padx=5)
+ tb.Button(button_frame, text="Reset to Default", command=reset_chunk_prompt,
+ bootstyle="warning", width=15).pack(side=tk.LEFT, padx=5)
+ tb.Button(button_frame, text="Cancel", command=dialog.destroy,
+ bootstyle="secondary", width=15).pack(side=tk.LEFT, padx=5)
+
+ dialog.deiconify()
+
+ def configure_image_chunk_prompt(self):
+ """Configure the prompt template for image chunks"""
+ dialog = self.wm.create_simple_dialog(
+ self.master,
+ "Configure Image Chunk Prompt",
+ width=700,
+ height=None
+ )
+
+ main_frame = tk.Frame(dialog, padx=20, pady=20)
+ main_frame.pack(fill=tk.BOTH, expand=True)
+
+ tk.Label(main_frame, text="Image Chunk Context Template",
+ font=('TkDefaultFont', 14, 'bold')).pack(anchor=tk.W, pady=(0, 5))
+
+ tk.Label(main_frame, text="Configure the context provided when tall images are split into chunks.",
+ font=('TkDefaultFont', 10), fg='gray').pack(anchor=tk.W, pady=(0, 10))
+
+ # Instructions
+ instructions_frame = tk.LabelFrame(main_frame, text="Available Placeholders", padx=10, pady=10)
+ instructions_frame.pack(fill=tk.X, pady=(0, 15))
+
+ placeholders = [
+ ("{chunk_idx}", "Current chunk number (1-based)"),
+ ("{total_chunks}", "Total number of chunks"),
+ ("{context}", "Additional context (e.g., chapter info)")
+ ]
+
+ for placeholder, desc in placeholders:
+ placeholder_frame = tk.Frame(instructions_frame)
+ placeholder_frame.pack(anchor=tk.W, pady=2)
+ tk.Label(placeholder_frame, text=f"• {placeholder}:", font=('Courier', 10, 'bold')).pack(side=tk.LEFT)
+ tk.Label(placeholder_frame, text=f" {desc}", font=('TkDefaultFont', 10)).pack(side=tk.LEFT)
+
+ # Prompt input
+ prompt_frame = tk.LabelFrame(main_frame, text="Image Chunk Prompt Template", padx=10, pady=10)
+ prompt_frame.pack(fill=tk.BOTH, expand=True, pady=(0, 10))
+
+ self.image_chunk_prompt_text = self.ui.setup_scrollable_text(
+ prompt_frame, height=8, wrap=tk.WORD
+ )
+ self.image_chunk_prompt_text.pack(fill=tk.BOTH, expand=True)
+ self.image_chunk_prompt_text.insert('1.0', self.image_chunk_prompt)
+
+ # Example
+ example_frame = tk.LabelFrame(main_frame, text="Example Output", padx=10, pady=10)
+ example_frame.pack(fill=tk.X, pady=(0, 10))
+
+ tk.Label(example_frame, text="With chunk 3 of 7 and chapter context, the prompt would be:",
+ font=('TkDefaultFont', 10)).pack(anchor=tk.W)
+
+ self.image_example_label = tk.Label(example_frame, text="",
+ font=('Courier', 9), fg='blue',
+ wraplength=650, justify=tk.LEFT)
+ self.image_example_label.pack(anchor=tk.W, pady=(5, 0))
+
+
+ def update_image_example(*args):
+ try:
+ template = self.image_chunk_prompt_text.get('1.0', tk.END).strip()
+ example = template.replace('{chunk_idx}', '3').replace('{total_chunks}', '7').replace('{context}', 'Chapter 5: The Great Battle')
+ self.image_example_label.config(text=example)
+ except Exception:
+ self.image_example_label.config(text="[Invalid template]")
+
+ self.image_chunk_prompt_text.bind('<KeyRelease>', update_image_example)
+ update_image_example()
+
+ # Buttons
+ button_frame = tk.Frame(main_frame)
+ button_frame.pack(fill=tk.X, pady=(10, 0))
+
+ def save_image_chunk_prompt():
+ self.image_chunk_prompt = self.image_chunk_prompt_text.get('1.0', tk.END).strip()
+ self.config['image_chunk_prompt'] = self.image_chunk_prompt
+ messagebox.showinfo("Success", "Image chunk prompt saved!")
+ dialog.destroy()
+
+ def reset_image_chunk_prompt():
+ if messagebox.askyesno("Reset Prompt", "Reset to default image chunk prompt?"):
+ self.image_chunk_prompt_text.delete('1.0', tk.END)
+ self.image_chunk_prompt_text.insert('1.0', self.default_image_chunk_prompt)
+ update_image_example()
+
+ tb.Button(button_frame, text="Save", command=save_image_chunk_prompt,
+ bootstyle="success", width=15).pack(side=tk.LEFT, padx=5)
+ tb.Button(button_frame, text="Reset to Default", command=reset_image_chunk_prompt,
+ bootstyle="warning", width=15).pack(side=tk.LEFT, padx=5)
+ tb.Button(button_frame, text="Cancel", command=dialog.destroy,
+ bootstyle="secondary", width=15).pack(side=tk.LEFT, padx=5)
+
+ dialog.deiconify()
+
+ def configure_image_compression(self):
+ """Open the image compression configuration dialog"""
+ dialog, scrollable_frame, canvas = self.wm.setup_scrollable(
+ self.master,
+ "Image Compression Settings",
+ width=None,
+ height=None,
+ max_width_ratio=0.6,
+ max_height_ratio=1.2
+ )
+
+ # Main container with padding
+ main_frame = tk.Frame(scrollable_frame)
+ main_frame.pack(fill=tk.BOTH, expand=True, padx=20, pady=20)
+
+ # Title
+ title_label = tk.Label(main_frame, text="🗜️ Image Compression Settings",
+ font=('TkDefaultFont', 14, 'bold'))
+ title_label.pack(anchor=tk.W, pady=(0, 15))
+
+ # Enable compression toggle
+ enable_frame = tk.Frame(main_frame)
+ enable_frame.pack(fill=tk.X, pady=(0, 20))
+
+ self.enable_image_compression_var = tk.BooleanVar(
+ value=self.config.get('enable_image_compression', False)
+ )
+ tb.Checkbutton(enable_frame, text="Enable Image Compression",
+ variable=self.enable_image_compression_var,
+ bootstyle="round-toggle",
+ command=lambda: self._toggle_compression_options()).pack(anchor=tk.W)
+
+ # Create container for all compression options
+ self.compression_options_frame = tk.Frame(main_frame)
+ self.compression_options_frame.pack(fill=tk.BOTH, expand=True)
+
+ # Auto Compression Section
+ auto_section = tk.LabelFrame(self.compression_options_frame, text="Automatic Compression",
+ padx=15, pady=10)
+ auto_section.pack(fill=tk.X, pady=(0, 15))
+
+ self.auto_compress_enabled_var = tk.BooleanVar(
+ value=self.config.get('auto_compress_enabled', True)
+ )
+ tb.Checkbutton(auto_section, text="Auto-compress to fit token limits",
+ variable=self.auto_compress_enabled_var,
+ bootstyle="round-toggle").pack(anchor=tk.W)
+
+ # Token limit setting
+ token_frame = tk.Frame(auto_section)
+ token_frame.pack(fill=tk.X, pady=(10, 0))
+
+ tk.Label(token_frame, text="Target tokens per image:").pack(side=tk.LEFT)
+
+ self.target_image_tokens_var = tk.StringVar(
+ value=str(self.config.get('target_image_tokens', '1000'))
+ )
+ tb.Entry(token_frame, width=10, textvariable=self.target_image_tokens_var).pack(side=tk.LEFT, padx=(10, 0))
+
+ tk.Label(token_frame, text="(Gemini uses ~258 tokens per image)",
+ font=('TkDefaultFont', 9), fg='gray').pack(side=tk.LEFT, padx=(10, 0))
+
+ # Format Selection Section
+ format_section = tk.LabelFrame(self.compression_options_frame, text="Output Format",
+ padx=15, pady=10)
+ format_section.pack(fill=tk.X, pady=(0, 15))
+
+ self.image_format_var = tk.StringVar(
+ value=self.config.get('image_compression_format', 'auto')
+ )
+
+ formats = [
+ ("Auto (Best quality/size ratio)", "auto"),
+ ("WebP (Best compression)", "webp"),
+ ("JPEG (Wide compatibility)", "jpeg"),
+ ("PNG (Lossless)", "png")
+ ]
+
+ for text, value in formats:
+ tb.Radiobutton(format_section, text=text, variable=self.image_format_var,
+ value=value).pack(anchor=tk.W, pady=2)
+
+ # Quality Settings Section
+ quality_section = tk.LabelFrame(self.compression_options_frame, text="Quality Settings",
+ padx=15, pady=10)
+ quality_section.pack(fill=tk.X, pady=(0, 15))
+
+ # WebP Quality
+ webp_frame = tk.Frame(quality_section)
+ webp_frame.pack(fill=tk.X, pady=(0, 10))
+
+ tk.Label(webp_frame, text="WebP Quality:", width=15, anchor=tk.W).pack(side=tk.LEFT)
+
+ self.webp_quality_var = tk.IntVar(value=self.config.get('webp_quality', 85))
+ webp_scale = tk.Scale(webp_frame, from_=1, to=100, orient=tk.HORIZONTAL,
+ variable=self.webp_quality_var, length=200)
+ webp_scale.pack(side=tk.LEFT, padx=(10, 10))
+
+ self.webp_quality_label = tk.Label(webp_frame, text=f"{self.webp_quality_var.get()}%")
+ self.webp_quality_label.pack(side=tk.LEFT)
+
+ webp_scale.config(command=lambda v: self.webp_quality_label.config(text=f"{int(float(v))}%"))
+
+ # JPEG Quality
+ jpeg_frame = tk.Frame(quality_section)
+ jpeg_frame.pack(fill=tk.X, pady=(0, 10))
+
+ tk.Label(jpeg_frame, text="JPEG Quality:", width=15, anchor=tk.W).pack(side=tk.LEFT)
+
+ self.jpeg_quality_var = tk.IntVar(value=self.config.get('jpeg_quality', 85))
+ jpeg_scale = tk.Scale(jpeg_frame, from_=1, to=100, orient=tk.HORIZONTAL,
+ variable=self.jpeg_quality_var, length=200)
+ jpeg_scale.pack(side=tk.LEFT, padx=(10, 10))
+
+ self.jpeg_quality_label = tk.Label(jpeg_frame, text=f"{self.jpeg_quality_var.get()}%")
+ self.jpeg_quality_label.pack(side=tk.LEFT)
+
+ jpeg_scale.config(command=lambda v: self.jpeg_quality_label.config(text=f"{int(float(v))}%"))
+
+ # PNG Compression
+ png_frame = tk.Frame(quality_section)
+ png_frame.pack(fill=tk.X)
+
+ tk.Label(png_frame, text="PNG Compression:", width=15, anchor=tk.W).pack(side=tk.LEFT)
+
+ self.png_compression_var = tk.IntVar(value=self.config.get('png_compression', 6))
+ png_scale = tk.Scale(png_frame, from_=0, to=9, orient=tk.HORIZONTAL,
+ variable=self.png_compression_var, length=200)
+ png_scale.pack(side=tk.LEFT, padx=(10, 10))
+
+ self.png_compression_label = tk.Label(png_frame, text=f"Level {self.png_compression_var.get()}")
+ self.png_compression_label.pack(side=tk.LEFT)
+
+ png_scale.config(command=lambda v: self.png_compression_label.config(text=f"Level {int(float(v))}"))
+
+ # Resolution Limits Section
+ resolution_section = tk.LabelFrame(self.compression_options_frame, text="Resolution Limits",
+ padx=15, pady=10)
+ resolution_section.pack(fill=tk.X, pady=(0, 15))
+
+ # Max dimension
+ max_dim_frame = tk.Frame(resolution_section)
+ max_dim_frame.pack(fill=tk.X, pady=(0, 10))
+
+ tk.Label(max_dim_frame, text="Max dimension (px):").pack(side=tk.LEFT)
+
+ self.max_image_dimension_var = tk.StringVar(
+ value=str(self.config.get('max_image_dimension', '2048'))
+ )
+ tb.Entry(max_dim_frame, width=10, textvariable=self.max_image_dimension_var).pack(side=tk.LEFT, padx=(10, 0))
+
+ tk.Label(max_dim_frame, text="(Images larger than this will be resized)",
+ font=('TkDefaultFont', 9), fg='gray').pack(side=tk.LEFT, padx=(10, 0))
+
+ # Max file size
+ max_size_frame = tk.Frame(resolution_section)
+ max_size_frame.pack(fill=tk.X)
+
+ tk.Label(max_size_frame, text="Max file size (MB):").pack(side=tk.LEFT)
+
+ self.max_image_size_mb_var = tk.StringVar(
+ value=str(self.config.get('max_image_size_mb', '10'))
+ )
+ tb.Entry(max_size_frame, width=10, textvariable=self.max_image_size_mb_var).pack(side=tk.LEFT, padx=(10, 0))
+
+ tk.Label(max_size_frame, text="(Larger files will be compressed)",
+ font=('TkDefaultFont', 9), fg='gray').pack(side=tk.LEFT, padx=(10, 0))
+
+ # Advanced Options Section
+ advanced_section = tk.LabelFrame(self.compression_options_frame, text="Advanced Options",
+ padx=15, pady=10)
+ advanced_section.pack(fill=tk.X, pady=(0, 15))
+
+ self.preserve_transparency_var = tk.BooleanVar(
+ value=self.config.get('preserve_transparency', False) # Changed default to False
+ )
+ tb.Checkbutton(advanced_section, text="Preserve transparency (PNG/WebP only)",
+ variable=self.preserve_transparency_var).pack(anchor=tk.W, pady=2)
+
+ self.preserve_original_format_var = tk.BooleanVar(
+ value=self.config.get('preserve_original_format', False)
+ )
+ tb.Checkbutton(advanced_section, text="Preserve original image format",
+ variable=self.preserve_original_format_var).pack(anchor=tk.W, pady=2)
+
+ self.optimize_for_ocr_var = tk.BooleanVar(
+ value=self.config.get('optimize_for_ocr', True)
+ )
+ tb.Checkbutton(advanced_section, text="Optimize for OCR (maintain text clarity)",
+ variable=self.optimize_for_ocr_var).pack(anchor=tk.W, pady=2)
+
+ self.progressive_encoding_var = tk.BooleanVar(
+ value=self.config.get('progressive_encoding', True)
+ )
+ tb.Checkbutton(advanced_section, text="Progressive encoding (JPEG)",
+ variable=self.progressive_encoding_var).pack(anchor=tk.W, pady=2)
+
+ self.save_compressed_images_var = tk.BooleanVar(
+ value=self.config.get('save_compressed_images', False)
+ )
+ tb.Checkbutton(advanced_section, text="Save compressed images to disk",
+ variable=self.save_compressed_images_var).pack(anchor=tk.W, pady=2)
+
+ # Info Section
+ info_frame = tk.Frame(self.compression_options_frame)
+ info_frame.pack(fill=tk.X)
+
+ info_text = ("💡 Tips:\n"
+ "• WebP offers the best compression with good quality\n"
+ "• Use 'Auto' format for intelligent format selection\n"
+ "• Higher quality = larger file size\n"
+ "• OCR optimization maintains text readability")
+
+ tk.Label(info_frame, text=info_text, justify=tk.LEFT,
+ font=('TkDefaultFont', 9), fg='#666').pack(anchor=tk.W)
+
+ # Buttons
+ button_frame = tk.Frame(main_frame)
+ button_frame.pack(fill=tk.X, pady=(20, 0))
+
+ def save_image_compression():
+ try:
+ # Validate numeric inputs
+ try:
+ int(self.target_image_tokens_var.get())
+ int(self.max_image_dimension_var.get())
+ float(self.max_image_size_mb_var.get())
+ except ValueError:
+ messagebox.showerror("Invalid Input", "Please enter valid numbers for numeric fields")
+ return
+
+ # Save all settings
+ self.config['enable_image_compression'] = self.enable_image_compression_var.get()
+ self.config['auto_compress_enabled'] = self.auto_compress_enabled_var.get()
+ self.config['target_image_tokens'] = int(self.target_image_tokens_var.get())
+ self.config['image_compression_format'] = self.image_format_var.get()
+ self.config['webp_quality'] = self.webp_quality_var.get()
+ self.config['jpeg_quality'] = self.jpeg_quality_var.get()
+ self.config['png_compression'] = self.png_compression_var.get()
+ self.config['max_image_dimension'] = int(self.max_image_dimension_var.get())
+ self.config['max_image_size_mb'] = float(self.max_image_size_mb_var.get())
+ self.config['preserve_transparency'] = self.preserve_transparency_var.get()
+ self.config['preserve_original_format'] = self.preserve_original_format_var.get()
+ self.config['optimize_for_ocr'] = self.optimize_for_ocr_var.get()
+ self.config['progressive_encoding'] = self.progressive_encoding_var.get()
+ self.config['save_compressed_images'] = self.save_compressed_images_var.get()
+
+ self.append_log("✅ Image compression settings saved")
+ dialog._cleanup_scrolling()
+ dialog.destroy()
+
+ except Exception as e:
+ print(f"❌ Failed to save compression settings: {e}")
+ messagebox.showerror("Error", f"Failed to save settings: {e}")
+
+ tb.Button(button_frame, text="💾 Save Settings", command=save_image_compression,
+ bootstyle="success", width=20).pack(side=tk.LEFT, padx=5)
+
+ tb.Button(button_frame, text="❌ Cancel",
+ command=lambda: [dialog._cleanup_scrolling(), dialog.destroy()],
+ bootstyle="secondary", width=20).pack(side=tk.LEFT, padx=5)
+
+ # Toggle function for enable/disable
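+ # The options tree nests at most two levels (section frame -> row frame ->
+ # widget), so two nested loops are enough to flip every control's state.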
+ def _toggle_compression_options():
+ state = tk.NORMAL if self.enable_image_compression_var.get() else tk.DISABLED
+ for widget in self.compression_options_frame.winfo_children():
+ if isinstance(widget, (tk.LabelFrame, tk.Frame)):
+ for child in widget.winfo_children():
+ if isinstance(child, (tb.Checkbutton, tb.Entry, tb.Radiobutton, tk.Scale)):
+ child.config(state=state)
+ elif isinstance(child, tk.Frame):
+ for subchild in child.winfo_children():
+ if isinstance(subchild, (tb.Checkbutton, tb.Entry, tb.Radiobutton, tk.Scale)):
+ subchild.config(state=state)
+
+ self._toggle_compression_options = _toggle_compression_options
+
+ # Set initial state
+ _toggle_compression_options()
+
+ # Auto-resize and show
+ self.wm.auto_resize_dialog(dialog, canvas, max_width_ratio=0.6, max_height_ratio=1.2)
+
+ dialog.protocol("WM_DELETE_WINDOW", lambda: [dialog._cleanup_scrolling(), dialog.destroy()])
+
+ def prompt_custom_token_limit(self):
+ val = simpledialog.askinteger(
+ "Set Max Output Token Limit",
+ "Enter max output tokens for API output (e.g., 16384, 32768, 65536):",
+ minvalue=1,
+ maxvalue=2000000
+ )
+ if val:
+ self.max_output_tokens = val
+ self.output_btn.config(text=f"Output Token Limit: {val}")
+ self.append_log(f"✅ Output token limit set to {val}")
+
+ def configure_rolling_summary_prompts(self):
+ """Configure rolling summary prompts"""
+ dialog = self.wm.create_simple_dialog(
+ self.master,
+ "Configure Memory System Prompts",
+ width=800,
+ height=1050
+ )
+
+ main_frame = tk.Frame(dialog, padx=20, pady=20)
+ main_frame.pack(fill=tk.BOTH, expand=True)
+
+ tk.Label(main_frame, text="Memory System Configuration",
+ font=('TkDefaultFont', 14, 'bold')).pack(anchor=tk.W, pady=(0, 5))
+
+ tk.Label(main_frame, text="Configure how the AI creates and maintains translation memory/context summaries.",
+ font=('TkDefaultFont', 10), fg='gray').pack(anchor=tk.W, pady=(0, 15))
+
+ system_frame = tk.LabelFrame(main_frame, text="System Prompt (Role Definition)", padx=10, pady=10)
+ system_frame.pack(fill=tk.BOTH, expand=True, pady=(0, 10))
+
+ tk.Label(system_frame, text="Defines the AI's role and behavior when creating summaries",
+ font=('TkDefaultFont', 9), fg='blue').pack(anchor=tk.W, pady=(0, 5))
+
+ self.summary_system_text = self.ui.setup_scrollable_text(
+ system_frame, height=5, wrap=tk.WORD
+ )
+ self.summary_system_text.pack(fill=tk.BOTH, expand=True)
+ self.summary_system_text.insert('1.0', self.rolling_summary_system_prompt)
+
+ user_frame = tk.LabelFrame(main_frame, text="User Prompt Template", padx=10, pady=10)
+ user_frame.pack(fill=tk.BOTH, expand=True, pady=(0, 10))
+
+ tk.Label(user_frame, text="Template for summary requests. Use {translations} for content placeholder",
+ font=('TkDefaultFont', 9), fg='blue').pack(anchor=tk.W, pady=(0, 5))
+
+ self.summary_user_text = self.ui.setup_scrollable_text(
+ user_frame, height=12, wrap=tk.WORD
+ )
+ self.summary_user_text.pack(fill=tk.BOTH, expand=True)
+ self.summary_user_text.insert('1.0', self.rolling_summary_user_prompt)
+
+ button_frame = tk.Frame(main_frame)
+ button_frame.pack(fill=tk.X, pady=(10, 0))
+
+ def save_prompts():
+ self.rolling_summary_system_prompt = self.summary_system_text.get('1.0', tk.END).strip()
+ self.rolling_summary_user_prompt = self.summary_user_text.get('1.0', tk.END).strip()
+
+ self.config['rolling_summary_system_prompt'] = self.rolling_summary_system_prompt
+ self.config['rolling_summary_user_prompt'] = self.rolling_summary_user_prompt
+
+ os.environ['ROLLING_SUMMARY_SYSTEM_PROMPT'] = self.rolling_summary_system_prompt
+ os.environ['ROLLING_SUMMARY_USER_PROMPT'] = self.rolling_summary_user_prompt
+
+ messagebox.showinfo("Success", "Memory prompts saved!")
+ dialog.destroy()
+
+ def reset_prompts():
+ if messagebox.askyesno("Reset Prompts", "Reset memory prompts to defaults?"):
+ self.summary_system_text.delete('1.0', tk.END)
+ self.summary_system_text.insert('1.0', self.default_rolling_summary_system_prompt)
+ self.summary_user_text.delete('1.0', tk.END)
+ self.summary_user_text.insert('1.0', self.default_rolling_summary_user_prompt)
+
+ tb.Button(button_frame, text="Save", command=save_prompts,
+ bootstyle="success", width=15).pack(side=tk.LEFT, padx=5)
+ tb.Button(button_frame, text="Reset to Defaults", command=reset_prompts,
+ bootstyle="warning", width=15).pack(side=tk.LEFT, padx=5)
+ tb.Button(button_frame, text="Cancel", command=dialog.destroy,
+ bootstyle="secondary", width=15).pack(side=tk.LEFT, padx=5)
+
+ dialog.deiconify()
+
+ def toggle_thinking_budget(self):
+ """Enable/disable thinking budget entry based on checkbox state"""
+ if hasattr(self, 'thinking_budget_entry'):
+ if self.enable_gemini_thinking_var.get():
+ self.thinking_budget_entry.config(state='normal')
+ else:
+ self.thinking_budget_entry.config(state='disabled')
+
+ def toggle_gpt_reasoning_controls(self):
+ """Enable/disable GPT reasoning controls based on toggle state"""
+ enabled = self.enable_gpt_thinking_var.get()
+ # Tokens entry
+ if hasattr(self, 'gpt_reasoning_tokens_entry'):
+ self.gpt_reasoning_tokens_entry.config(state='normal' if enabled else 'disabled')
+ # Effort combo
+ if hasattr(self, 'gpt_effort_combo'):
+ try:
+ self.gpt_effort_combo.config(state='readonly' if enabled else 'disabled')
+ except Exception:
+ # Fallback for ttk on some platforms
+ self.gpt_effort_combo.configure(state='readonly' if enabled else 'disabled')
+
+ def open_other_settings(self):
+ """Open the Other Settings dialog"""
+ dialog, scrollable_frame, canvas = self.wm.setup_scrollable(
+ self.master,
+ "Other Settings",
+ width=0,
+ height=None,
+ max_width_ratio=0.7,
+ max_height_ratio=0.8
+ )
+
+ scrollable_frame.grid_columnconfigure(0, weight=1, uniform="column")
+ scrollable_frame.grid_columnconfigure(1, weight=1, uniform="column")
+
+ # Section 1: Context Management
+ self._create_context_management_section(scrollable_frame)
+
+ # Section 2: Response Handling
+ self._create_response_handling_section(scrollable_frame)
+
+ # Section 3: Prompt Management
+ self._create_prompt_management_section(scrollable_frame)
+
+ # Section 4: Processing Options
+ self._create_processing_options_section(scrollable_frame)
+
+ # Section 5: Image Translation
+ self._create_image_translation_section(scrollable_frame)
+
+ # Section 6: Anti-Duplicate Parameters
+ self._create_anti_duplicate_section(scrollable_frame)
+
+ # Section 7: Custom API Endpoints (NEW)
+ self._create_custom_api_endpoints_section(scrollable_frame)
+
+ # Save & Close buttons
+ self._create_settings_buttons(scrollable_frame, dialog, canvas)
+
+ # Persist toggle change on dialog close
+ def _persist_settings():
+ self.config['retain_source_extension'] = self.retain_source_extension_var.get()
+ os.environ['RETAIN_SOURCE_EXTENSION'] = '1' if self.retain_source_extension_var.get() else '0'
+ # Save without user-facing message when closing Other Settings
+ self.save_config(show_message=False)
+ dialog._cleanup_scrolling()
+ dialog.destroy()
+ dialog.protocol("WM_DELETE_WINDOW", _persist_settings)
+
+ # Auto-resize and show
+ self.wm.auto_resize_dialog(dialog, canvas, max_width_ratio=0.78, max_height_ratio=1.82)
+
+ def _create_context_management_section(self, parent):
+ """Create context management section"""
+ section_frame = tk.LabelFrame(parent, text="Context Management & Memory", padx=10, pady=10)
+ section_frame.grid(row=0, column=1, sticky="nsew", padx=(5, 10), pady=(10, 5))
+
+ content_frame = tk.Frame(section_frame)
+ content_frame.pack(anchor=tk.NW, fill=tk.BOTH, expand=True)
+
+ tb.Checkbutton(content_frame, text="Use Rolling Summary (Memory)",
+ variable=self.rolling_summary_var,
+ bootstyle="round-toggle").pack(anchor=tk.W)
+
+ tk.Label(content_frame, text="AI-powered memory system that maintains story context",
+ font=('TkDefaultFont', 10), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 10))
+
+ settings_frame = tk.Frame(content_frame)
+ settings_frame.pack(anchor=tk.W, padx=20, fill=tk.X, pady=(5, 10))
+
+ row1 = tk.Frame(settings_frame)
+ row1.pack(fill=tk.X, pady=(0, 10))
+
+ tk.Label(row1, text="Role:").pack(side=tk.LEFT, padx=(0, 5))
+ role_combo = ttk.Combobox(row1, textvariable=self.summary_role_var,
+ values=["user", "system"], state="readonly", width=10)
+ role_combo.pack(side=tk.LEFT, padx=(0, 30))
+ # Prevent accidental changes from mouse wheel while scrolling
+ UIHelper.disable_spinbox_mousewheel(role_combo)
+
+ tk.Label(row1, text="Mode:").pack(side=tk.LEFT, padx=(0, 5))
+ mode_combo = ttk.Combobox(row1, textvariable=self.rolling_summary_mode_var,
+ values=["append", "replace"], state="readonly", width=10)
+ mode_combo.pack(side=tk.LEFT, padx=(0, 10))
+ # Prevent accidental changes from mouse wheel while scrolling
+ UIHelper.disable_spinbox_mousewheel(mode_combo)
+
+ row2 = tk.Frame(settings_frame)
+ row2.pack(fill=tk.X, pady=(0, 10))
+
+ tk.Label(row2, text="Summarize last").pack(side=tk.LEFT, padx=(0, 5))
+ tb.Entry(row2, width=5, textvariable=self.rolling_summary_exchanges_var).pack(side=tk.LEFT, padx=(0, 5))
+ tk.Label(row2, text="exchanges").pack(side=tk.LEFT)
+
+ # Spacer
+ tk.Label(row2, text=" ").pack(side=tk.LEFT)
+ # New controls: Retain last N summaries (append mode)
+ tk.Label(row2, text="Retain").pack(side=tk.LEFT, padx=(10, 5))
+ tb.Entry(row2, width=5, textvariable=self.rolling_summary_max_entries_var).pack(side=tk.LEFT, padx=(0, 5))
+ tk.Label(row2, text="entries").pack(side=tk.LEFT)
+
+ tb.Button(content_frame, text="⚙️ Configure Memory Prompts",
+ command=self.configure_rolling_summary_prompts,
+ bootstyle="info-outline", width=30).pack(anchor=tk.W, padx=20, pady=(10, 10))
+
+ ttk.Separator(section_frame, orient='horizontal').pack(fill=tk.X, pady=(10, 10))
+
+ tk.Label(section_frame, text="💡 Memory Mode:\n"
+ "• Append: Keeps adding summaries (longer context)\n"
+ "• Replace: Only keeps latest summary (concise)",
+ font=('TkDefaultFont', 11), fg='#666', justify=tk.LEFT).pack(anchor=tk.W, padx=5, pady=(0, 5))
+
+ ttk.Separator(section_frame, orient='horizontal').pack(fill=tk.X, pady=(10, 10))
+
+
+ tk.Label(section_frame, text="Application Updates:", font=('TkDefaultFont', 11, 'bold')).pack(anchor=tk.W, pady=(5, 5))
+
+ # Create a frame for update-related controls
+ update_frame = tk.Frame(section_frame)
+ update_frame.pack(anchor=tk.W, fill=tk.X)
+
+ tb.Button(update_frame, text="🔄 Check for Updates",
+ command=lambda: self.check_for_updates_manual(),
+ bootstyle="info-outline",
+ width=25).pack(side=tk.LEFT, pady=2)
+
+ # Add auto-update checkbox
+ tb.Checkbutton(update_frame, text="Check on startup",
+ variable=self.auto_update_check_var,
+ bootstyle="round-toggle").pack(side=tk.LEFT, padx=(10, 0))
+
+ tk.Label(section_frame, text="Check GitHub for new Glossarion releases\nand download updates",
+ font=('TkDefaultFont', 10), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, pady=(0, 5))
+
+ ttk.Separator(section_frame, orient='horizontal').pack(fill=tk.X, pady=(10, 10))
+
+ tk.Label(section_frame, text="Config Backup Management:", font=('TkDefaultFont', 11, 'bold')).pack(anchor=tk.W, pady=(5, 5))
+
+ # Create a frame for backup-related controls
+ backup_frame = tk.Frame(section_frame)
+ backup_frame.pack(anchor=tk.W, fill=tk.X)
+
+ tb.Button(backup_frame, text="💾 Create Backup",
+ command=lambda: self._create_manual_config_backup(),
+ bootstyle="success-outline",
+ width=20).pack(side=tk.LEFT, pady=2, padx=(0, 10))
+
+ tb.Button(backup_frame, text="↶ Restore Backup",
+ command=lambda: self._manual_restore_config(),
+ bootstyle="warning-outline",
+ width=20).pack(side=tk.LEFT, pady=2)
+
+ tk.Label(section_frame, text="Automatic backups are created before each config save.",
+ font=('TkDefaultFont', 10), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=5, pady=(5, 0))
+
+ def _create_response_handling_section(self, parent):
+ """Create response handling section with AI Hunter additions"""
+ section_frame = tk.LabelFrame(parent, text="Response Handling & Retry Logic", padx=10, pady=10)
+ section_frame.grid(row=1, column=0, sticky="nsew", padx=(10, 5), pady=5)
+
+ # GPT-5/OpenAI Reasoning Toggle (NEW)
+ tk.Label(section_frame, text="GPT-5 Thinking (OpenRouter/OpenAI-style)",
+ font=('TkDefaultFont', 11, 'bold')).pack(anchor=tk.W)
+
+ gpt_frame = tk.Frame(section_frame)
+ gpt_frame.pack(anchor=tk.W, padx=20, pady=(5, 0))
+
+ tb.Checkbutton(gpt_frame, text="Enable GPT / OR Thinking",
+ variable=self.enable_gpt_thinking_var,
+ bootstyle="round-toggle",
+ command=self.toggle_gpt_reasoning_controls).pack(side=tk.LEFT)
+
+ tk.Label(gpt_frame, text="Effort:").pack(side=tk.LEFT, padx=(20, 5))
+ self.gpt_effort_combo = ttk.Combobox(gpt_frame, textvariable=self.gpt_effort_var,
+ values=["low", "medium", "high"], state="readonly", width=8)
+ self.gpt_effort_combo.pack(side=tk.LEFT, padx=5)
+ UIHelper.disable_spinbox_mousewheel(self.gpt_effort_combo)
+
+ # Second row for OpenRouter-specific token budget
+ gpt_row2 = tk.Frame(section_frame)
+ gpt_row2.pack(anchor=tk.W, padx=40, pady=(5, 0))
+ tk.Label(gpt_row2, text="OR Thinking Tokens:").pack(side=tk.LEFT)
+ self.gpt_reasoning_tokens_entry = tb.Entry(gpt_row2, width=8, textvariable=self.gpt_reasoning_tokens_var)
+ self.gpt_reasoning_tokens_entry.pack(side=tk.LEFT, padx=5)
+ tk.Label(gpt_row2, text="tokens").pack(side=tk.LEFT)
+
+ # Initialize enabled state for GPT controls
+ self.toggle_gpt_reasoning_controls()
+
+ tk.Label(section_frame, text="Controls GPT-5 and OpenRouter reasoning. \nProvide Tokens to force a max token budget for other models; GPT-5 only uses Effort (low/medium/high).",
+ font=('TkDefaultFont', 10), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 10))
+
+ # Add Thinking Tokens Toggle with Budget Control (NEW)
+ tk.Label(section_frame, text="Gemini Thinking Mode",
+ font=('TkDefaultFont', 11, 'bold')).pack(anchor=tk.W)
+
+ thinking_frame = tk.Frame(section_frame)
+ thinking_frame.pack(anchor=tk.W, padx=20, pady=(5, 0))
+
+ tb.Checkbutton(thinking_frame, text="Enable Gemini Thinking",
+ variable=self.enable_gemini_thinking_var,
+ bootstyle="round-toggle",
+ command=self.toggle_thinking_budget).pack(side=tk.LEFT)
+
+ tk.Label(thinking_frame, text="Budget:").pack(side=tk.LEFT, padx=(20, 5))
+ self.thinking_budget_entry = tb.Entry(thinking_frame, width=8, textvariable=self.thinking_budget_var)
+ self.thinking_budget_entry.pack(side=tk.LEFT, padx=5)
+ tk.Label(thinking_frame, text="tokens").pack(side=tk.LEFT)
+
+ tk.Label(section_frame, text="Control Gemini's thinking process. 0 = disabled,\n512-24576 = limited thinking, -1 = dynamic (auto)",
+ font=('TkDefaultFont', 10), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 10))
+
+ # Add separator after thinking toggle
+ ttk.Separator(section_frame, orient='horizontal').pack(fill='x', pady=10)
+
+ # ADD EXTRACTION WORKERS CONFIGURATION HERE
+ tk.Label(section_frame, text="Parallel Extraction",
+ font=('TkDefaultFont', 11, 'bold')).pack(anchor=tk.W)
+
+ extraction_frame = tk.Frame(section_frame)
+ extraction_frame.pack(anchor=tk.W, padx=20, pady=(5, 0))
+
+ tb.Checkbutton(extraction_frame, text="Enable Parallel Processing",
+ variable=self.enable_parallel_extraction_var,
+ bootstyle="round-toggle",
+ command=self.toggle_extraction_workers).pack(side=tk.LEFT)
+
+ tk.Label(extraction_frame, text="Workers:").pack(side=tk.LEFT, padx=(20, 5))
+ self.extraction_workers_entry = tb.Entry(extraction_frame, width=6, textvariable=self.extraction_workers_var)
+ self.extraction_workers_entry.pack(side=tk.LEFT, padx=5)
+ tk.Label(extraction_frame, text="threads").pack(side=tk.LEFT)
+
+ tk.Label(section_frame, text="Speed up EPUB extraction using multiple threads.\nRecommended: 4-8 workers (set to 1 to disable)",
+ font=('TkDefaultFont', 10), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 10))
+
+ # Add separator after extraction workers
+ ttk.Separator(section_frame, orient='horizontal').pack(fill='x', pady=10)
+
+ # Multi API Key Management Section
+ multi_key_frame = tk.Frame(section_frame)
+ multi_key_frame.pack(anchor=tk.W, fill=tk.X, pady=(0, 15))
+
+ # Multi-key indicator and button in same row
+ multi_key_row = tk.Frame(multi_key_frame)
+ multi_key_row.pack(fill=tk.X)
+
+ # Show status if multi-key is enabled
+ if self.config.get('use_multi_api_keys', False):
+ multi_keys = self.config.get('multi_api_keys', [])
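+ # Keys without an explicit 'enabled' flag count as active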
+ active_keys = sum(1 for k in multi_keys if k.get('enabled', True))
+
+ status_frame = tk.Frame(multi_key_row)
+ status_frame.pack(side=tk.LEFT, fill=tk.X, expand=True)
+
+ tk.Label(status_frame, text="🔑 Multi-Key Mode:",
+ font=('TkDefaultFont', 11, 'bold')).pack(side=tk.LEFT)
+
+ tk.Label(status_frame, text=f"ACTIVE ({active_keys}/{len(multi_keys)} keys)",
+ font=('TkDefaultFont', 11, 'bold'), fg='green').pack(side=tk.LEFT, padx=(5, 0))
+ else:
+ tk.Label(multi_key_row, text="🔑 Multi-Key Mode: DISABLED",
+ font=('TkDefaultFont', 11), fg='gray').pack(side=tk.LEFT)
+
+ # Multi API Key Manager button
+ tb.Button(multi_key_row, text="Configure API Keys",
+ command=self.open_multi_api_key_manager,
+ bootstyle="primary-outline",
+ width=20).pack(side=tk.RIGHT)
+
+ tk.Label(section_frame, text="Manage multiple API keys with automatic rotation and rate limit handling",
+ font=('TkDefaultFont', 10), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 10))
+
+ # Add separator after Multi API Key section
+ ttk.Separator(section_frame, orient='horizontal').pack(fill='x', pady=10)
+
+ # Retry Truncated
+ tb.Checkbutton(section_frame, text="Auto-retry Truncated Responses",
+ variable=self.retry_truncated_var,
+ bootstyle="round-toggle").pack(anchor=tk.W)
+ retry_frame = tk.Frame(section_frame)
+ retry_frame.pack(anchor=tk.W, padx=20, pady=(5, 5))
+ tk.Label(retry_frame, text="Token constraint:").pack(side=tk.LEFT)
+ tb.Entry(retry_frame, width=8, textvariable=self.max_retry_tokens_var).pack(side=tk.LEFT, padx=5)
+ tk.Label(section_frame, text="Retry when truncated. Acts as min/max constraint:\nbelow value = minimum, above value = maximum",
+ font=('TkDefaultFont', 10), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 10))
+ # Compression Factor
+ # Add separator line for clarity
+ ttk.Separator(section_frame, orient='horizontal').pack(fill='x', pady=10)
+
+ # Compression Factor
+ tk.Label(section_frame, text="Translation Compression Factor",
+ font=('TkDefaultFont', 11, 'bold')).pack(anchor=tk.W)
+
+ compression_frame = tk.Frame(section_frame)
+ compression_frame.pack(anchor=tk.W, padx=20, pady=(5, 0))
+ tk.Label(compression_frame, text="CJK→English compression:").pack(side=tk.LEFT)
+ tb.Entry(compression_frame, width=6, textvariable=self.compression_factor_var).pack(side=tk.LEFT, padx=5)
+ tk.Label(compression_frame, text="(0.7-1.0)", font=('TkDefaultFont', 11)).pack(side=tk.LEFT)
+
+ tb.Button(compression_frame, text=" Chunk Prompt",
+ command=self.configure_translation_chunk_prompt,
+ bootstyle="info-outline", width=15).pack(side=tk.LEFT, padx=(15, 0))
+ tk.Label(section_frame, text="Ratio for chunk sizing based on output limits\n",
+ font=('TkDefaultFont', 10), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 10))
+
+ # Add separator after compression factor
+ ttk.Separator(section_frame, orient='horizontal').pack(fill='x', pady=10)
+
+ # Retry Duplicate
+ tb.Checkbutton(section_frame, text="Auto-retry Duplicate Content",
+ variable=self.retry_duplicate_var,
+ bootstyle="round-toggle").pack(anchor=tk.W)
+ duplicate_frame = tk.Frame(section_frame)
+ duplicate_frame.pack(anchor=tk.W, padx=20, pady=(5, 0))
+ tk.Label(duplicate_frame, text="Check last").pack(side=tk.LEFT)
+ tb.Entry(duplicate_frame, width=4, textvariable=self.duplicate_lookback_var).pack(side=tk.LEFT, padx=3)
+ tk.Label(duplicate_frame, text="chapters").pack(side=tk.LEFT)
+ tk.Label(section_frame, text="Detects when AI returns same content\nfor different chapters",
+ font=('TkDefaultFont', 10), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(5, 10))
+ # Container for detection-related options (to show/hide based on toggle)
+ self.detection_options_container = tk.Frame(section_frame)
+
+ # Update thinking budget entry state based on initial toggle state
+ self.toggle_thinking_budget()
+
+ # Function to show/hide detection options based on auto-retry toggle
+ def update_detection_visibility():
+ try:
+ # Check if widgets still exist before manipulating them
+ if (hasattr(self, 'detection_options_container') and
+ self.detection_options_container.winfo_exists() and
+ duplicate_frame.winfo_exists()):
+
+ if self.retry_duplicate_var.get():
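+ # pack(after=...) keeps the container positioned directly beneath duplicate_frame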
+ self.detection_options_container.pack(fill='x', after=duplicate_frame)
+ else:
+ self.detection_options_container.pack_forget()
+ except tk.TclError:
+ # Widget has been destroyed, ignore
+ pass
+
+ # Add trace to update visibility when toggle changes
+ self.retry_duplicate_var.trace('w', lambda *args: update_detection_visibility())
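+ # (trace('w', ...) is tkinter's legacy observer API; trace_add('write', ...) is the modern equivalent)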
+
+ # Detection Method subsection (now inside the container)
+ method_label = tk.Label(self.detection_options_container, text="Detection Method:",
+ font=('TkDefaultFont', 10, 'bold'))
+ method_label.pack(anchor=tk.W, padx=20, pady=(10, 5))
+
+ methods = [
+ ("basic", "Basic (Fast) - Original 85% threshold, 1000 chars"),
+ ("ai-hunter", "AI Hunter - Multi-method semantic analysis"),
+ ("cascading", "Cascading - Basic first, then AI Hunter")
+ ]
+
+ # Container for AI Hunter config (will be shown/hidden based on selection)
+ self.ai_hunter_container = tk.Frame(self.detection_options_container)
+
+ # Function to update AI Hunter visibility based on detection mode
+ def update_ai_hunter_visibility(*args):
+ """Update AI Hunter section visibility based on selection"""
+ # Clear existing widgets
+ for widget in self.ai_hunter_container.winfo_children():
+ widget.destroy()
+
+ # Show AI Hunter config for both ai-hunter and cascading modes
+ if self.duplicate_detection_mode_var.get() in ['ai-hunter', 'cascading']:
+ self.create_ai_hunter_section(self.ai_hunter_container)
+
+ # Update status if label exists and hasn't been destroyed
+ if hasattr(self, 'ai_hunter_status_label'):
+ try:
+ # winfo_exists() returns a boolean, so test it rather than discarding the result
+ if self.ai_hunter_status_label.winfo_exists():
+ self.ai_hunter_status_label.config(text=self._get_ai_hunter_status_text())
+ except tk.TclError:
+ # Widget has been destroyed, remove the reference
+ delattr(self, 'ai_hunter_status_label')
+
+ # Create radio buttons (inside detection container) - ONLY ONCE
+ for value, text in methods:
+ rb = tb.Radiobutton(self.detection_options_container, text=text,
+ variable=self.duplicate_detection_mode_var,
+ value=value, bootstyle="primary")
+ rb.pack(anchor=tk.W, padx=40, pady=2)
+
+ # Pack the AI Hunter container
+ self.ai_hunter_container.pack(fill='x')
+
+ # Add trace to detection mode variable - ONLY ONCE
+ self.duplicate_detection_mode_var.trace('w', update_ai_hunter_visibility)
+
+ # Initial visibility updates
+ update_detection_visibility()
+ update_ai_hunter_visibility()
+
+ # Retry Slow
+ tb.Checkbutton(section_frame, text="Auto-retry Slow Chunks",
+ variable=self.retry_timeout_var,
+ bootstyle="round-toggle").pack(anchor=tk.W, pady=(15, 0))
+
+ timeout_frame = tk.Frame(section_frame)
+ timeout_frame.pack(anchor=tk.W, padx=20, pady=(5, 0))
+ tk.Label(timeout_frame, text="Timeout after").pack(side=tk.LEFT)
+ tb.Entry(timeout_frame, width=6, textvariable=self.chunk_timeout_var).pack(side=tk.LEFT, padx=5)
+ tk.Label(timeout_frame, text="seconds").pack(side=tk.LEFT)
+
+ tk.Label(section_frame, text="Retry chunks/images that take too long\n(reduces tokens for faster response)",
+ font=('TkDefaultFont', 10), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 5))
+
+ # Separator
+ ttk.Separator(section_frame, orient='horizontal').pack(fill='x', pady=10)
+
+ # HTTP Timeouts & Connection Pooling
+ title_http = tk.Label(section_frame, text="HTTP Timeouts & Connection Pooling",
+ font=('TkDefaultFont', 11, 'bold'))
+ title_http.pack(anchor=tk.W)
+
+ http_frame = tk.Frame(section_frame)
+ http_frame.pack(anchor=tk.W, padx=20, pady=(5, 0), fill=tk.X)
+
+ # Master toggle to enable/disable all HTTP tuning fields (disabled by default)
+ if not hasattr(self, 'enable_http_tuning_var'):
+ self.enable_http_tuning_var = tk.BooleanVar(value=self.config.get('enable_http_tuning', False))
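+ # getattr fallback: if _toggle_http_tuning_controls is not defined on this
+ # class, the checkbox gets a no-op command instead of raising AttributeError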
+ self.http_tuning_checkbox = tb.Checkbutton(
+ http_frame,
+ text="Enable HTTP timeout/pooling overrides",
+ variable=self.enable_http_tuning_var,
+ command=getattr(self, '_toggle_http_tuning_controls', None) or (lambda: None),
+ bootstyle="round-toggle"
+ )
+ self.http_tuning_checkbox.pack(anchor=tk.W, pady=(0, 6))
+
+ # Build a compact grid so fields align nicely
+ http_grid = tk.Frame(http_frame)
+ http_grid.pack(anchor=tk.W, fill=tk.X)
+
+ if not hasattr(self, 'connect_timeout_var'):
+ self.connect_timeout_var = tk.StringVar(value=str(self.config.get('connect_timeout', os.environ.get('CONNECT_TIMEOUT', '10'))))
+ if not hasattr(self, 'read_timeout_var'):
+ # Default to READ_TIMEOUT, fallback to CHUNK_TIMEOUT if provided, else 180
+ self.read_timeout_var = tk.StringVar(value=str(self.config.get('read_timeout', os.environ.get('READ_TIMEOUT', os.environ.get('CHUNK_TIMEOUT', '180')))))
+ if not hasattr(self, 'http_pool_connections_var'):
+ self.http_pool_connections_var = tk.StringVar(value=str(self.config.get('http_pool_connections', os.environ.get('HTTP_POOL_CONNECTIONS', '20'))))
+ if not hasattr(self, 'http_pool_maxsize_var'):
+ self.http_pool_maxsize_var = tk.StringVar(value=str(self.config.get('http_pool_maxsize', os.environ.get('HTTP_POOL_MAXSIZE', '50'))))
+
+ # Layout columns
+ http_grid.grid_columnconfigure(0, weight=0)
+ http_grid.grid_columnconfigure(1, weight=0)
+ http_grid.grid_columnconfigure(2, weight=1) # spacer
+ http_grid.grid_columnconfigure(3, weight=0)
+ http_grid.grid_columnconfigure(4, weight=0)
+
+ # Optional toggle: ignore server Retry-After header
+ if not hasattr(self, 'ignore_retry_after_var'):
+ self.ignore_retry_after_var = tk.BooleanVar(value=bool(self.config.get('ignore_retry_after', str(os.environ.get('IGNORE_RETRY_AFTER', '0')) == '1')))
+ self.ignore_retry_after_checkbox = tb.Checkbutton(
+ http_frame,
+ text="Ignore server Retry-After header (use local backoff)",
+ variable=self.ignore_retry_after_var,
+ bootstyle="round-toggle"
+ )
+ self.ignore_retry_after_checkbox.pack(anchor=tk.W, pady=(6, 0))
+
+ # Row 0: Timeouts
+ tk.Label(http_grid, text="Connect timeout (s):").grid(row=0, column=0, sticky='w', padx=(0, 6), pady=2)
+ self.connect_timeout_entry = tb.Entry(http_grid, width=6, textvariable=self.connect_timeout_var)
+ self.connect_timeout_entry.grid(row=0, column=1, sticky='w', pady=2)
+ tk.Label(http_grid, text="Read timeout (s):").grid(row=0, column=3, sticky='w', padx=(12, 6), pady=2)
+ self.read_timeout_entry = tb.Entry(http_grid, width=6, textvariable=self.read_timeout_var)
+ self.read_timeout_entry.grid(row=0, column=4, sticky='w', pady=2)
+
+ # Row 1: Pool sizes
+ tk.Label(http_grid, text="Pool connections:").grid(row=1, column=0, sticky='w', padx=(0, 6), pady=2)
+ self.http_pool_connections_entry = tb.Entry(http_grid, width=6, textvariable=self.http_pool_connections_var)
+ self.http_pool_connections_entry.grid(row=1, column=1, sticky='w', pady=2)
+ tk.Label(http_grid, text="Pool max size:").grid(row=1, column=3, sticky='w', padx=(12, 6), pady=2)
+ self.http_pool_maxsize_entry = tb.Entry(http_grid, width=6, textvariable=self.http_pool_maxsize_var)
+ self.http_pool_maxsize_entry.grid(row=1, column=4, sticky='w', pady=2)
+
+ # Apply initial enable/disable state
+ if hasattr(self, '_toggle_http_tuning_controls'):
+ self._toggle_http_tuning_controls()
+
+ tk.Label(section_frame, text="Controls network behavior to reduce 500/503s: connection establishment timeout, read timeout,\nHTTP connection pool sizes.",
+ font=('TkDefaultFont', 10), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(2, 5))
+
+ # Separator
+ ttk.Separator(section_frame, orient='horizontal').pack(fill='x', pady=10)
+
+ # Max Retries Configuration
+ title_retries = tk.Label(section_frame, text="API Request Retries",
+ font=('TkDefaultFont', 11, 'bold'))
+ title_retries.pack(anchor=tk.W)
+
+ retries_frame = tk.Frame(section_frame)
+ retries_frame.pack(anchor=tk.W, padx=20, pady=(5, 0))
+
+ # Create MAX_RETRIES variable if it doesn't exist
+ if not hasattr(self, 'max_retries_var'):
+ self.max_retries_var = tk.StringVar(value=str(self.config.get('max_retries', os.environ.get('MAX_RETRIES', '7'))))
+
+ tk.Label(retries_frame, text="Maximum retry attempts:").pack(side=tk.LEFT)
+ tb.Entry(retries_frame, width=4, textvariable=self.max_retries_var).pack(side=tk.LEFT, padx=5)
+ tk.Label(retries_frame, text="(default: 7)").pack(side=tk.LEFT)
+
+ tk.Label(section_frame, text="Number of times to retry failed API requests before giving up.\nApplies to all API providers (OpenAI, Gemini, Anthropic, etc.)",
+ font=('TkDefaultFont', 10), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(2, 10))
+
+ # Enable/disable the scan-mode combobox based on the scan-phase toggle.
+ # The combobox is created later, in _create_processing_options_section, and
+ # stored on self, so guard against it not existing yet.
+ def _toggle_scan_mode_state(*args):
+ combo = getattr(self, 'scan_mode_combo', None)
+ if combo is None:
+ return
+ try:
+ combo.config(state="readonly" if self.scan_phase_enabled_var.get() else "disabled")
+ except tk.TclError:
+ # Combobox has been destroyed
+ pass
+ _toggle_scan_mode_state()
+ self.scan_phase_enabled_var.trace('w', lambda *a: _toggle_scan_mode_state())
+
+ # Indefinite Rate Limit Retry toggle
+ tb.Checkbutton(section_frame, text="Indefinite Rate Limit Retry",
+ variable=self.indefinite_rate_limit_retry_var,
+ bootstyle="round-toggle").pack(anchor=tk.W, padx=20)
+
+ tk.Label(section_frame, text="When enabled, rate limit errors (429) will retry indefinitely with exponential backoff.\nWhen disabled, rate limits count against the maximum retry attempts above.",
+ font=('TkDefaultFont', 10), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=40, pady=(2, 5))
+
+
+ def toggle_gemini_endpoint(self):
+ """Enable/disable Gemini endpoint entry based on toggle"""
+ if self.use_gemini_openai_endpoint_var.get():
+ self.gemini_endpoint_entry.config(state='normal')
+ else:
+ self.gemini_endpoint_entry.config(state='disabled')
+
+ def open_multi_api_key_manager(self):
+ """Open the multi API key manager dialog"""
+ # Import here to avoid circular imports
+ try:
+ from multi_api_key_manager import MultiAPIKeyDialog
+
+ # Create and show dialog
+ dialog = MultiAPIKeyDialog(self.master, self)
+
+ # Wait for dialog to close
+ self.master.wait_window(dialog.dialog)
+
+ # Refresh the settings display if in settings dialog
+ if hasattr(self, 'current_settings_dialog'):
+ # Close and reopen settings to refresh
+ self.current_settings_dialog.destroy()
+ self.show_settings()  # reopen so the refreshed multi-key status is shown
+
+ except ImportError as e:
+ messagebox.showerror("Error", f"Failed to load Multi API Key Manager: {str(e)}")
+ except Exception as e:
+ messagebox.showerror("Error", f"Error opening Multi API Key Manager: {str(e)}")
+ import traceback
+ traceback.print_exc()
+
+ def _create_multi_key_row(self, parent):
+ """Create a compact multi-key configuration row"""
+ frame = tk.Frame(parent)
+ frame.pack(fill=tk.X, pady=5)
+
+ # Status indicator
+ if self.config.get('use_multi_api_keys', False):
+ keys = self.config.get('multi_api_keys', [])
+ active = sum(1 for k in keys if k.get('enabled', True))
+
+ # Checkbox to enable/disable
+ tb.Checkbutton(frame, text="Multi API Key Mode",
+ variable=self.use_multi_api_keys_var,
+ bootstyle="round-toggle",
+ command=self._toggle_multi_key_setting).pack(side=tk.LEFT)
+
+ # Status
+ tk.Label(frame, text=f"({active}/{len(keys)} active)",
+ font=('TkDefaultFont', 10), fg='green').pack(side=tk.LEFT, padx=(5, 0))
+ else:
+ tb.Checkbutton(frame, text="Multi API Key Mode",
+ variable=self.use_multi_api_keys_var,
+ bootstyle="round-toggle",
+ command=self._toggle_multi_key_setting).pack(side=tk.LEFT)
+
+ # Configure button
+ tb.Button(frame, text="Configure Keys...",
+ command=self.open_multi_api_key_manager,
+ bootstyle="primary-outline").pack(side=tk.LEFT, padx=(20, 0))
+
+ return frame
+
+ def _toggle_multi_key_setting(self):
+ """Toggle multi-key mode from settings dialog"""
+ self.config['use_multi_api_keys'] = self.use_multi_api_keys_var.get()
+ # Don't save immediately, let the dialog's save button handle it
+
+ def toggle_extraction_workers(self):
+ """Enable/disable extraction workers entry based on toggle"""
+ if self.enable_parallel_extraction_var.get():
+ self.extraction_workers_entry.config(state='normal')
+ # Publish the worker count via environment variable for the extraction code to read
+ os.environ["EXTRACTION_WORKERS"] = str(self.extraction_workers_var.get())
+ else:
+ self.extraction_workers_entry.config(state='disabled')
+ # Set to 1 worker (sequential) when disabled
+ os.environ["EXTRACTION_WORKERS"] = "1"
+
+ # Ensure executor reflects current worker setting
+ try:
+ self._ensure_executor()
+ except Exception:
+ pass
+
+ def create_ai_hunter_section(self, parent_frame):
+ """Create the AI Hunter configuration section - without redundant toggle"""
+ # AI Hunter Configuration
+ config_frame = tk.Frame(parent_frame)
+ config_frame.pack(anchor=tk.W, padx=20, pady=(10, 5))
+
+ # Status label reflecting the current AI Hunter configuration
+ self.ai_hunter_status_label = tk.Label(
+ config_frame,
+ text=self._get_ai_hunter_status_text(),
+ font=('TkDefaultFont', 10)
+ )
+ self.ai_hunter_status_label.pack(side=tk.LEFT)
+
+ # Configure button
+ tb.Button(
+ config_frame,
+ text="Configure AI Hunter",
+ command=self.show_ai_hunter_settings,
+ bootstyle="info"
+ ).pack(side=tk.LEFT, padx=(10, 0))
+
+ # Info text
+ tk.Label(
+ parent_frame, # Use parent_frame instead of section_frame
+ text="AI Hunter uses multiple detection methods to identify duplicate content\n"
+ "with configurable thresholds and detection modes",
+ font=('TkDefaultFont', 10),
+ fg='gray',
+ justify=tk.LEFT
+ ).pack(anchor=tk.W, padx=20, pady=(0, 10))
+
+ def _get_ai_hunter_status_text(self):
+ """Get status text for AI Hunter configuration"""
+ ai_config = self.config.get('ai_hunter_config', {})
+
+ # AI Hunter is shown when the detection mode is set to 'ai-hunter' or 'cascading'
+ if self.duplicate_detection_mode_var.get() not in ['ai-hunter', 'cascading']:
+ return "AI Hunter: Not Selected"
+
+ if not ai_config.get('enabled', True):
+ return "AI Hunter: Disabled in Config"
+
+ mode_text = {
+ 'single_method': 'Single Method',
+ 'multi_method': 'Multi-Method',
+ 'weighted_average': 'Weighted Average'
+ }
+
+ mode = mode_text.get(ai_config.get('detection_mode', 'multi_method'), 'Unknown')
+ thresholds = ai_config.get('thresholds', {})
+
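+ # Average the per-method thresholds for the status line; fall back to the 85% default when none are configured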
+ if thresholds:
+ avg_threshold = sum(thresholds.values()) / len(thresholds)
+ else:
+ avg_threshold = 85
+
+ return f"AI Hunter: {mode} mode, Avg threshold: {int(avg_threshold)}%"
+
+ def show_ai_hunter_settings(self):
+ """Open AI Hunter configuration window"""
+ def on_config_saved():
+ # Save the entire configuration
+ self.save_config()
+ # Update status label if it still exists
+ if hasattr(self, 'ai_hunter_status_label'):
+ try:
+ if self.ai_hunter_status_label.winfo_exists():
+ self.ai_hunter_status_label.config(text=self._get_ai_hunter_status_text())
+ except tk.TclError:
+ # Widget has been destroyed
+ pass
+ if hasattr(self, 'ai_hunter_enabled_var'):
+ self.ai_hunter_enabled_var.set(self.config.get('ai_hunter_config', {}).get('enabled', True))
+
+ gui = AIHunterConfigGUI(self.master, self.config, on_config_saved)
+ gui.show_ai_hunter_config()
+
+ def toggle_ai_hunter(self):
+ """Toggle AI Hunter enabled state"""
+ if 'ai_hunter_config' not in self.config:
+ self.config['ai_hunter_config'] = {}
+
+ self.config['ai_hunter_config']['enabled'] = self.ai_hunter_enabled_var.get()
+ self.save_config()
+ self.ai_hunter_status_label.config(text=self._get_ai_hunter_status_text())
+
+ def _create_prompt_management_section(self, parent):
+ """Create meta data section (formerly prompt management)"""
+ section_frame = tk.LabelFrame(parent, text="Meta Data", padx=10, pady=10)
+ section_frame.grid(row=0, column=0, sticky="nsew", padx=(10, 5), pady=(10, 5))
+
+ title_frame = tk.Frame(section_frame)
+ title_frame.pack(anchor=tk.W, pady=(10, 10))
+
+ tb.Checkbutton(title_frame, text="Translate Book Title",
+ variable=self.translate_book_title_var,
+ bootstyle="round-toggle").pack(side=tk.LEFT)
+
+ # CHANGED: New button text and command
+ tb.Button(title_frame, text="Configure All",
+ command=self.metadata_batch_ui.configure_translation_prompts,
+ bootstyle="info-outline", width=12).pack(side=tk.LEFT, padx=(10, 5))
+
+ # NEW: Custom Metadata Fields button
+ tb.Button(title_frame, text="Custom Metadata",
+ command=self.metadata_batch_ui.configure_metadata_fields,
+ bootstyle="info-outline", width=15).pack(side=tk.LEFT, padx=(5, 0))
+
+ tk.Label(section_frame, text="When enabled: Book titles and selected metadata will be translated",
+ font=('TkDefaultFont', 11), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 10))
+
+ # NEW: Batch Header Translation Section
+ ttk.Separator(section_frame, orient='horizontal').pack(fill=tk.X, pady=(5, 10))
+
+ tk.Label(section_frame, text="Chapter Header Translation:",
+ font=('TkDefaultFont', 11, 'bold')).pack(anchor=tk.W, pady=(5, 5))
+
+ header_frame = tk.Frame(section_frame)
+ header_frame.pack(anchor=tk.W, fill=tk.X, pady=(5, 10))
+
+ # Master toggle for batch header translation
+ def _toggle_header_controls():
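+ # The widgets referenced below are created further down in this method;
+ # the closure is safe because it is first invoked only after they exist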
+ enabled = bool(self.batch_translate_headers_var.get())
+ new_state = tk.NORMAL if enabled else tk.DISABLED
+ update_cb.configure(state=new_state)
+ save_cb.configure(state=new_state)
+ ignore_header_cb.configure(state=new_state)
+ ignore_title_cb.configure(state=new_state)
+ delete_btn.configure(state=new_state)
+
+ batch_toggle = tb.Checkbutton(header_frame, text="Batch Translate Headers",
+ variable=self.batch_translate_headers_var,
+ bootstyle="round-toggle",
+ command=_toggle_header_controls)
+ batch_toggle.pack(side=tk.LEFT)
+
+ tk.Label(header_frame, text="Headers per batch:").pack(side=tk.LEFT, padx=(20, 5))
+
+ batch_entry = tk.Entry(header_frame, textvariable=self.headers_per_batch_var, width=10)
+ batch_entry.pack(side=tk.LEFT)
+
+ # Options for header translation
+ update_frame = tk.Frame(section_frame)
+ update_frame.pack(anchor=tk.W, fill=tk.X, padx=20)
+
+ update_cb = tb.Checkbutton(update_frame, text="Update headers in HTML files",
+ variable=self.update_html_headers_var,
+ bootstyle="round-toggle")
+ update_cb.pack(side=tk.LEFT)
+
+ save_cb = tb.Checkbutton(update_frame, text="Save translations to .txt",
+ variable=self.save_header_translations_var,
+ bootstyle="round-toggle")
+ save_cb.pack(side=tk.LEFT, padx=(20, 0))
+
+ # Additional ignore header option
+ ignore_frame = tk.Frame(section_frame)
+ ignore_frame.pack(anchor=tk.W, fill=tk.X, padx=20, pady=(5, 0))
+
+ ignore_header_cb = tb.Checkbutton(ignore_frame, text="Ignore header",
+ variable=self.ignore_header_var,
+ bootstyle="round-toggle")
+ ignore_header_cb.pack(side=tk.LEFT)
+
+ ignore_title_cb = tb.Checkbutton(ignore_frame, text="Ignore title",
+ variable=self.ignore_title_var,
+ bootstyle="round-toggle")
+ ignore_title_cb.pack(side=tk.LEFT, padx=(15, 0))
+
+ # Delete translated_headers.txt button
+ delete_btn = tb.Button(ignore_frame, text="🗑️Delete Header Files",
+ command=self.delete_translated_headers_file,
+ bootstyle="danger-outline", width=21)
+ delete_btn.pack(side=tk.LEFT, padx=(20, 0))
+
+ # Initialize disabled state when batch headers is OFF
+ _toggle_header_controls()
+
+ tk.Label(section_frame,
+ text="• OFF: Use existing headers from translated chapters\n"
+ "• ON: Extract all headers → Translate in batch → Update files\n"
+ "• Ignore header: Skip h1/h2/h3 tags (prevents re-translation of visible headers)\n"
+ "• Ignore title: Skip tag (prevents re-translation of document titles)\n"
+ "• Delete button: Removes translated_headers.txt files for all selected EPUBs",
+ font=('TkDefaultFont', 10), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(5, 10))
+
+ # EPUB Validation (keep existing)
+ ttk.Separator(section_frame, orient='horizontal').pack(fill=tk.X, pady=(10, 10))
+
+ tk.Label(section_frame, text="EPUB Utilities:", font=('TkDefaultFont', 11, 'bold')).pack(anchor=tk.W, pady=(5, 5))
+
+ tb.Button(section_frame, text="🔍 Validate EPUB Structure",
+ command=self.validate_epub_structure_gui,
+ bootstyle="success-outline",
+ width=25).pack(anchor=tk.W, pady=2)
+
+ tk.Label(section_frame, text="Check if all required EPUB files are present for compilation",
+ font=('TkDefaultFont', 10), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, pady=(0, 5))
+
+ # NCX-only navigation toggle
+ tb.Checkbutton(section_frame, text="Use NCX-only Navigation (Compatibility Mode)",
+ variable=self.force_ncx_only_var,
+ bootstyle="round-toggle").pack(anchor=tk.W, pady=(5, 5))
+
+ # CSS Attachment toggle - NEW!
+ tb.Checkbutton(section_frame, text="Attach CSS to Chapters (Fixes styling issues)",
+ variable=self.attach_css_to_chapters_var,
+ bootstyle="round-toggle").pack(anchor=tk.W, pady=(5, 5))
+
+ # Output file naming
+ tb.Checkbutton(section_frame, text="Retain source extension (no 'response_' prefix)",
+ variable=self.retain_source_extension_var,
+ bootstyle="round-toggle").pack(anchor=tk.W, pady=(5, 5))
+
+ def _create_processing_options_section(self, parent):
+ """Create processing options section"""
+ section_frame = tk.LabelFrame(parent, text="Processing Options", padx=10, pady=10)
+ section_frame.grid(row=1, column=1, sticky="nsew", padx=(5, 10), pady=5)
+
+ # Reinforce messages option
+ reinforce_frame = tk.Frame(section_frame)
+ reinforce_frame.pack(anchor=tk.W, pady=(0, 10))
+ tk.Label(reinforce_frame, text="Reinforce every").pack(side=tk.LEFT)
+ tb.Entry(reinforce_frame, width=6, textvariable=self.reinforcement_freq_var).pack(side=tk.LEFT, padx=5)
+ tk.Label(reinforce_frame, text="messages").pack(side=tk.LEFT)
+
+ tb.Checkbutton(section_frame, text="Emergency Paragraph Restoration",
+ variable=self.emergency_restore_var,
+ bootstyle="round-toggle").pack(anchor=tk.W, pady=2)
+
+ tk.Label(section_frame, text="Fixes AI responses that lose paragraph\nstructure (wall of text)",
+ font=('TkDefaultFont', 10), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 5))
+
+ tb.Checkbutton(section_frame, text="Enable Decimal Chapter Detection (EPUBs)",
+ variable=self.enable_decimal_chapters_var,
+ bootstyle="round-toggle").pack(anchor=tk.W, pady=2)
+
+ tk.Label(section_frame, text="Detect chapters like 1.1, 1.2 in EPUB files\n(Text files always use decimal chapters when split)",
+ font=('TkDefaultFont', 10), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 10))
+
+ # === CHAPTER EXTRACTION SETTINGS ===
+ # Main extraction frame
+ extraction_frame = tk.LabelFrame(section_frame, text="Chapter Extraction Settings", padx=10, pady=5)
+ extraction_frame.pack(fill=tk.X, pady=(0, 10))
+
+ # Initialize variables if not exists
+ if not hasattr(self, 'text_extraction_method_var'):
+ # Check if using old enhanced mode
+ if self.config.get('extraction_mode') == 'enhanced':
+ self.text_extraction_method_var = tk.StringVar(value='enhanced')
+ # Set filtering from enhanced_filtering or default to smart
+ self.file_filtering_level_var = tk.StringVar(
+ value=self.config.get('enhanced_filtering', 'smart')
+ )
+ else:
+ self.text_extraction_method_var = tk.StringVar(value='standard')
+ self.file_filtering_level_var = tk.StringVar(
+ value=self.config.get('extraction_mode', 'smart')
+ )
+
+ if not hasattr(self, 'enhanced_preserve_structure_var'):
+ self.enhanced_preserve_structure_var = tk.BooleanVar(
+ value=self.config.get('enhanced_preserve_structure', True)
+ )
+
+ # --- Text Extraction Method Section ---
+ method_frame = tk.Frame(extraction_frame)
+ method_frame.pack(fill=tk.X, pady=(0, 15))
+
+ tk.Label(method_frame, text="Text Extraction Method:",
+ font=('TkDefaultFont', 10, 'bold')).pack(anchor=tk.W, pady=(0, 5))
+
+ # Standard extraction
+ tb.Radiobutton(method_frame, text="Standard (BeautifulSoup)",
+ variable=self.text_extraction_method_var, value="standard",
+ bootstyle="round-toggle",
+ command=self.on_extraction_method_change).pack(anchor=tk.W, pady=2)
+
+ tk.Label(method_frame, text="Traditional HTML parsing - fast and reliable",
+ font=('TkDefaultFont', 9), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 5))
+
+ # Enhanced extraction
+ tb.Radiobutton(method_frame, text="🚀 Enhanced (html2text)",
+ variable=self.text_extraction_method_var, value="enhanced",
+ bootstyle="success-round-toggle",
+ command=self.on_extraction_method_change).pack(anchor=tk.W, pady=2)
+
+ tk.Label(method_frame, text="Superior Unicode handling, cleaner text extraction",
+ font=('TkDefaultFont', 9), fg='dark green', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 5))
+
+ # Enhanced options (shown when enhanced is selected)
+ self.enhanced_options_frame = tk.Frame(method_frame)
+ self.enhanced_options_frame.pack(fill=tk.X, padx=20, pady=(5, 0))
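+ # Visibility of this frame is driven by on_extraction_method_change() below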
+
+ # Structure preservation
+ tb.Checkbutton(self.enhanced_options_frame, text="Preserve Markdown Structure",
+ variable=self.enhanced_preserve_structure_var,
+ bootstyle="round-toggle").pack(anchor=tk.W, pady=2)
+
+ tk.Label(self.enhanced_options_frame, text="Keep formatting (bold, headers, lists) for better AI context",
+ font=('TkDefaultFont', 8), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 3))
+
+ # Requirements note
+ requirements_frame = tk.Frame(self.enhanced_options_frame)
+ requirements_frame.pack(anchor=tk.W, pady=(5, 0))
+
+ # Separator
+ ttk.Separator(method_frame, orient='horizontal').pack(fill=tk.X, pady=(10, 10))
+
+ # --- File Filtering Level Section ---
+ filtering_frame = tk.Frame(extraction_frame)
+ filtering_frame.pack(fill=tk.X, pady=(0, 10))
+
+ tk.Label(filtering_frame, text="File Filtering Level:",
+ font=('TkDefaultFont', 10, 'bold')).pack(anchor=tk.W, pady=(0, 5))
+
+ # Smart filtering
+ tb.Radiobutton(filtering_frame, text="Smart (Aggressive Filtering)",
+ variable=self.file_filtering_level_var, value="smart",
+ bootstyle="round-toggle").pack(anchor=tk.W, pady=2)
+
+ tk.Label(filtering_frame, text="Skips navigation, TOC, copyright files\nBest for clean EPUBs with clear chapter structure",
+ font=('TkDefaultFont', 9), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 5))
+
+ # Comprehensive filtering
+ tb.Radiobutton(filtering_frame, text="Comprehensive (Moderate Filtering)",
+ variable=self.file_filtering_level_var, value="comprehensive",
+ bootstyle="round-toggle").pack(anchor=tk.W, pady=2)
+
+ tk.Label(filtering_frame, text="Only skips obvious navigation files\nGood when Smart mode misses chapters",
+ font=('TkDefaultFont', 9), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 5))
+
+ # Full extraction
+ tb.Radiobutton(filtering_frame, text="Full (No Filtering)",
+ variable=self.file_filtering_level_var, value="full",
+ bootstyle="round-toggle").pack(anchor=tk.W, pady=2)
+
+ tk.Label(filtering_frame, text="Extracts ALL HTML/XHTML files\nUse when other modes skip important content",
+ font=('TkDefaultFont', 9), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 5))
+
+ # NEW: Force BeautifulSoup for Traditional APIs toggle
+ if not hasattr(self, 'force_bs_for_traditional_var'):
+ self.force_bs_for_traditional_var = tk.BooleanVar(
+ value=self.config.get('force_bs_for_traditional', True)
+ )
+ tb.Checkbutton(extraction_frame, text="Force BeautifulSoup for DeepL / Google Translate",
+ variable=self.force_bs_for_traditional_var,
+ bootstyle="round-toggle").pack(anchor=tk.W, pady=(0, 5))
+ tk.Label(extraction_frame, text="When enabled, DeepL/Google Translate always use BeautifulSoup extraction even if Enhanced is selected.",
+ font=('TkDefaultFont', 8), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 5))
+
+ # Chapter merging option
+ ttk.Separator(extraction_frame, orient='horizontal').pack(fill=tk.X, pady=(10, 10))
+
+ # Initialize disable_chapter_merging_var if not exists
+ if not hasattr(self, 'disable_chapter_merging_var'):
+ self.disable_chapter_merging_var = tk.BooleanVar(
+ value=self.config.get('disable_chapter_merging', False)
+ )
+
+ tb.Checkbutton(extraction_frame, text="Disable Chapter Merging",
+ variable=self.disable_chapter_merging_var,
+ bootstyle="round-toggle").pack(anchor=tk.W, pady=2)
+
+ tk.Label(extraction_frame, text="Disable automatic merging of Section/Chapter pairs.\nEach file will be treated as a separate chapter.",
+ font=('TkDefaultFont', 9), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 5))
+
+ # === REMAINING OPTIONS ===
+ tb.Checkbutton(section_frame, text="Disable Image Gallery in EPUB",
+ variable=self.disable_epub_gallery_var,
+ bootstyle="round-toggle").pack(anchor=tk.W, pady=2)
+
+ tk.Label(section_frame, text="Skip creating image gallery page in EPUB",
+ font=('TkDefaultFont', 10), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 10))
+
+ # New: Disable Automatic Cover Creation
+ tb.Checkbutton(section_frame, text="Disable Automatic Cover Creation",
+ variable=self.disable_automatic_cover_creation_var,
+ bootstyle="round-toggle").pack(anchor=tk.W, pady=2)
+
+ tk.Label(section_frame, text="When enabled: no auto-generated cover page is created.",
+ font=('TkDefaultFont', 10), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 10))
+
+ # New: Translate cover.html (Skip Override)
+ tb.Checkbutton(section_frame, text="Translate cover.html (Skip Override)",
+ variable=self.translate_cover_html_var,
+ bootstyle="round-toggle").pack(anchor=tk.W, pady=2)
+
+ tk.Label(section_frame, text="When enabled: existing cover.html/cover.xhtml will be included and translated (not skipped).",
+ font=('TkDefaultFont', 10), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 10))
+
+ tb.Checkbutton(section_frame, text="Disable 0-based Chapter Detection",
+ variable=self.disable_zero_detection_var,
+ bootstyle="round-toggle").pack(anchor=tk.W, pady=2)
+
+ tk.Label(section_frame, text="Always use chapter ranges as specified\n(don't force adjust to chapter 1)",
+ font=('TkDefaultFont', 10), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 10))
+
+ tb.Checkbutton(section_frame, text="Use Header as Output Name",
+ variable=self.use_header_as_output_var,
+ bootstyle="round-toggle").pack(anchor=tk.W, pady=2)
+
+ tk.Label(section_frame, text="Use chapter headers/titles as output filenames",
+ font=('TkDefaultFont', 10), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 10))
+
+ # Chapter number offset
+ ttk.Separator(section_frame, orient='horizontal').pack(fill=tk.X, pady=(10, 10))
+
+ offset_frame = tk.Frame(section_frame)
+ offset_frame.pack(anchor=tk.W, pady=5)
+
+ tk.Label(offset_frame, text="Chapter Number Offset:").pack(side=tk.LEFT)
+
+ # Create variable if not exists
+ if not hasattr(self, 'chapter_number_offset_var'):
+ self.chapter_number_offset_var = tk.StringVar(
+ value=str(self.config.get('chapter_number_offset', '0'))
+ )
+
+ tb.Entry(offset_frame, width=6, textvariable=self.chapter_number_offset_var).pack(side=tk.LEFT, padx=5)
+
+ tk.Label(offset_frame, text="(+/- adjustment)").pack(side=tk.LEFT)
+
+ tk.Label(section_frame, text="Adjust all chapter numbers by this amount.\nUseful for matching file numbers to actual chapters.",
+ font=('TkDefaultFont', 10), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 10))
+
+ # Add separator before API safety settings
+ ttk.Separator(section_frame, orient='horizontal').pack(fill=tk.X, pady=(15, 10))
+
+ # Post-Translation Scanning Phase
+ scan_phase_frame = tk.Frame(section_frame)
+ scan_phase_frame.pack(anchor=tk.W, fill=tk.X, pady=(10, 0))
+
+ tb.Checkbutton(scan_phase_frame, text="Enable post-translation Scanning phase",
+ variable=self.scan_phase_enabled_var,
+ bootstyle="round-toggle").pack(side=tk.LEFT)
+
+ # Mode selector
+ tk.Label(scan_phase_frame, text="Mode:").pack(side=tk.LEFT, padx=(15, 5))
+ scan_modes = ["quick-scan", "aggressive", "ai-hunter", "custom"]
+ # Stored on self so the scan-phase toggle in the response handling section can reach it
+ self.scan_mode_combo = ttk.Combobox(scan_phase_frame, textvariable=self.scan_phase_mode_var, values=scan_modes, state="readonly", width=12)
+ self.scan_mode_combo.pack(side=tk.LEFT)
+ # Prevent accidental changes from mouse wheel while scrolling
+ UIHelper.disable_spinbox_mousewheel(self.scan_mode_combo)
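+ # Apply the initial enabled/disabled state now that the combobox exists
+ # (the scan-phase toggle itself lives in the response handling section)
+ self.scan_mode_combo.config(state="readonly" if self.scan_phase_enabled_var.get() else "disabled")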
+
+ tk.Label(section_frame, text="Automatically run QA Scanner after translation completes",
+ font=('TkDefaultFont', 10), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 10))
+
+ # Conservative Batching Toggle
+ tb.Checkbutton(section_frame, text="Use Conservative Batching",
+ variable=self.conservative_batching_var,
+ bootstyle="round-toggle").pack(anchor=tk.W, pady=(10, 0))
+
+ tk.Label(section_frame, text="When enabled: Groups chapters in batches of 3x batch size for memory management\nWhen disabled (default): Uses direct batch size for faster processing",
+ font=('TkDefaultFont', 10), fg='gray', justify=tk.LEFT).pack(anchor=tk.W, padx=20, pady=(0, 10))
+
+ ttk.Separator(section_frame, orient='horizontal').pack(fill=tk.X, pady=(15, 10))
+
+ # API Safety Settings subsection
+ tk.Label(section_frame, text="API Safety Settings",
+ font=('TkDefaultFont', 11, 'bold')).pack(anchor=tk.W, pady=(5, 5))
+
+ # Create the Gemini safety checkbox
+ if not hasattr(self, 'disable_gemini_safety_var'):
+ self.disable_gemini_safety_var = tk.BooleanVar(
+ value=self.config.get('disable_gemini_safety', False)
+ )
+
+ tb.Checkbutton(
+ section_frame,
+ text="Disable Gemini API Safety Filters",
+ variable=self.disable_gemini_safety_var,
+ bootstyle="round-toggle"
+ ).pack(anchor=tk.W, pady=(5, 0))
+
+ # Add warning text
+ warning_text = ("⚠️ Disables ALL content safety filters for Gemini models.\n"
+ "This sets all harm categories to BLOCK_NONE.\n")
+ tk.Label(
+ section_frame,
+ text=warning_text,
+ font=('TkDefaultFont', 9),
+ fg='#ff6b6b',
+ justify=tk.LEFT
+ ).pack(anchor=tk.W, padx=(20, 0), pady=(0, 5))
+
+ # Add note about affected models
+ tk.Label(
+ section_frame,
+ text="Does NOT affect ElectronHub Gemini models (eh/gemini-*)",
+ font=('TkDefaultFont', 8),
+ fg='gray',
+ justify=tk.LEFT
+ ).pack(anchor=tk.W, padx=(20, 0))
+
+ # New: OpenRouter Transport Preference
+ # Toggle to force HTTP-only path for OpenRouter (SDK bypass)
+ if not hasattr(self, 'openrouter_http_only_var'):
+ self.openrouter_http_only_var = tk.BooleanVar(
+ value=self.config.get('openrouter_use_http_only', False)
+ )
+
+ tb.Checkbutton(
+ section_frame,
+ text="Use HTTP-only for OpenRouter (bypass SDK)",
+ variable=self.openrouter_http_only_var,
+ bootstyle="round-toggle"
+ ).pack(anchor=tk.W, pady=(8, 0))
+
+ tk.Label(
+ section_frame,
+ text="When enabled, requests to OpenRouter use direct HTTP POST with explicit headers (Accept, Referer, X-Title).",
+ font=('TkDefaultFont', 9),
+ fg='gray',
+ justify=tk.LEFT
+ ).pack(anchor=tk.W, padx=(20, 0), pady=(0, 5))
+
+ # OpenRouter: Disable compression (Accept-Encoding: identity)
+ if not hasattr(self, 'openrouter_accept_identity_var'):
+ self.openrouter_accept_identity_var = tk.BooleanVar(
+ value=self.config.get('openrouter_accept_identity', False)
+ )
+ tb.Checkbutton(
+ section_frame,
+ text="Disable compression for OpenRouter (Accept-Encoding)",
+ variable=self.openrouter_accept_identity_var,
+ bootstyle="round-toggle"
+ ).pack(anchor=tk.W, pady=(4, 0))
+ tk.Label(
+ section_frame,
+ text="Sends Accept-Encoding: identity to request uncompressed responses.\n"
+ "Use if proxies/CDNs cause corrupted or non-JSON compressed bodies.",
+ font=('TkDefaultFont', 8),
+ fg='gray',
+ justify=tk.LEFT
+ ).pack(anchor=tk.W, padx=(20, 0), pady=(0, 8))
+
+ # Initial state - show/hide enhanced options
+ self.on_extraction_method_change()
+
+ def on_extraction_method_change(self):
+ """Handle extraction method changes and show/hide Enhanced options"""
+ if hasattr(self, 'text_extraction_method_var') and hasattr(self, 'enhanced_options_frame'):
+ if self.text_extraction_method_var.get() == 'enhanced':
+ self.enhanced_options_frame.pack(fill=tk.X, padx=20, pady=(5, 0))
+ else:
+ self.enhanced_options_frame.pack_forget()
+
+ def _create_image_translation_section(self, parent):
+ """Create image translation section"""
+ section_frame = tk.LabelFrame(parent, text="Image Translation", padx=10, pady=8)
+ section_frame.grid(row=2, column=0, columnspan=2, sticky="nsew", padx=10, pady=(5, 10))
+
+ left_column = tk.Frame(section_frame)
+ left_column.pack(side=tk.LEFT, fill=tk.BOTH, expand=True, padx=(0, 20))
+
+ right_column = tk.Frame(section_frame)
+ right_column.pack(side=tk.LEFT, fill=tk.BOTH, expand=True)
+
+ # Left column
+ enable_frame = tk.Frame(left_column)
+ enable_frame.pack(fill=tk.X, pady=(0, 10))
+
+ tb.Checkbutton(enable_frame, text="Enable Image Translation",
+ variable=self.enable_image_translation_var,
+ bootstyle="round-toggle").pack(anchor=tk.W)
+
+ tk.Label(left_column, text="Extracts and translates text from images using vision models",
+ font=('TkDefaultFont', 10), fg='gray').pack(anchor=tk.W, pady=(0, 10))
+
+ tb.Checkbutton(left_column, text="Process Long Images (Web Novel Style)",
+ variable=self.process_webnovel_images_var,
+ bootstyle="round-toggle").pack(anchor=tk.W)
+
+ tk.Label(left_column, text="Include tall images often used in web novels",
+ font=('TkDefaultFont', 10), fg='gray').pack(anchor=tk.W, padx=20, pady=(0, 10))
+
+ tb.Checkbutton(left_column, text="Hide labels and remove OCR images",
+ variable=self.hide_image_translation_label_var,
+ bootstyle="round-toggle").pack(anchor=tk.W)
+
+ tk.Label(left_column, text="Clean mode: removes image and shows only translated text",
+ font=('TkDefaultFont', 10), fg='gray').pack(anchor=tk.W, padx=20, pady=(0, 10))
+
+ # Add some spacing
+ tk.Frame(left_column, height=10).pack()
+
+ # Watermark removal toggle
+ tb.Checkbutton(left_column, text="Enable Watermark Removal",
+ variable=self.enable_watermark_removal_var,
+ bootstyle="round-toggle").pack(anchor=tk.W)
+
+ tk.Label(left_column, text="Advanced preprocessing to remove watermarks from images",
+ font=('TkDefaultFont', 10), fg='gray').pack(anchor=tk.W, padx=20, pady=(0, 10))
+
+ # Save cleaned images toggle - create with reference
+ self.save_cleaned_checkbox = tb.Checkbutton(left_column, text="Save Cleaned Images",
+ variable=self.save_cleaned_images_var,
+ bootstyle="round-toggle")
+ self.save_cleaned_checkbox.pack(anchor=tk.W, padx=(20, 0))
+
+ tk.Label(left_column, text="Keep watermark-removed images in translated_images/cleaned/",
+ font=('TkDefaultFont', 10), fg='gray').pack(anchor=tk.W, padx=40, pady=(0, 10))
+
+ # Advanced watermark removal toggle - create with reference
+ self.advanced_watermark_checkbox = tb.Checkbutton(left_column, text="Advanced Watermark Removal",
+ variable=self.advanced_watermark_removal_var,
+ bootstyle="round-toggle")
+ self.advanced_watermark_checkbox.pack(anchor=tk.W, padx=(20, 0))
+
+ tk.Label(left_column, text="Use FFT-based pattern detection for stubborn watermarks",
+ font=('TkDefaultFont', 10), fg='gray').pack(anchor=tk.W, padx=40)
+
+ # Right column
+ settings_frame = tk.Frame(right_column)
+ settings_frame.pack(fill=tk.X)
+
+ settings_frame.grid_columnconfigure(1, minsize=80)
+
+ settings = [
+ ("Min Image height (px):", self.webnovel_min_height_var),
+ ("Max Images per chapter:", self.max_images_per_chapter_var),
+ ("Chunk height:", self.image_chunk_height_var),
+ ("Chunk overlap (%):", self.image_chunk_overlap_var) # Add this new setting
+ ]
+
+ for row, (label, var) in enumerate(settings):
+ tk.Label(settings_frame, text=label).grid(row=row, column=0, sticky=tk.W, pady=3)
+ entry = tb.Entry(settings_frame, width=10, textvariable=var)
+ entry.grid(row=row, column=1, sticky=tk.W, pady=3)
+
+ # Add tooltip for the overlap setting
+ if "overlap" in label.lower():
+ tk.Label(settings_frame, text="1-10% recommended",
+ font=('TkDefaultFont', 8), fg='gray').grid(row=row, column=2, sticky=tk.W, padx=(5, 0))
+
+ # Buttons for prompts and compression
+ tb.Button(settings_frame, text="Image Chunk Prompt",
+ command=self.configure_image_chunk_prompt,
+ bootstyle="info-outline", width=20).grid(row=4, column=0, columnspan=2, sticky=tk.W, pady=(10, 0))
+
+ # Add Image Compression button
+ tb.Button(settings_frame, text="🗜️ Image Compression",
+ command=self.configure_image_compression,
+ bootstyle="info-outline", width=25).grid(row=5, column=0, columnspan=2, sticky=tk.W, pady=(5, 0))
+
+ # Add the toggle here in the right column with some spacing
+ tk.Frame(right_column, height=15).pack() # Add some spacing
+
+ tb.Checkbutton(right_column, text="Send tall image chunks in single API call (NOT RECOMMENDED)",
+ variable=self.single_api_image_chunks_var,
+ bootstyle="round-toggle").pack(anchor=tk.W)
+
+ tk.Label(right_column, text="All image chunks sent to 1 API call (Most AI models don't like this)",
+ font=('TkDefaultFont', 10), fg='gray').pack(anchor=tk.W, padx=20, pady=(0, 10))
+
+ tk.Label(right_column, text="💡 Supported models:\n"
+ "• Gemini 1.5 Pro/Flash, 2.0 Flash\n"
+ "• GPT-4V, GPT-4o, o4-mini",
+ font=('TkDefaultFont', 10), fg='#666', justify=tk.LEFT).pack(anchor=tk.W, pady=(10, 0))
+
+
+ # Set up the dependency logic
+ def toggle_watermark_options(*args):
+ if self.enable_watermark_removal_var.get():
+ # Enable both sub-options
+ self.save_cleaned_checkbox.config(state=tk.NORMAL)
+ self.advanced_watermark_checkbox.config(state=tk.NORMAL)
+ else:
+ # Disable both sub-options and turn them off
+ self.save_cleaned_checkbox.config(state=tk.DISABLED)
+ self.advanced_watermark_checkbox.config(state=tk.DISABLED)
+ self.save_cleaned_images_var.set(False)
+ self.advanced_watermark_removal_var.set(False)
+
+ # Bind the trace to the watermark removal variable
+ self.enable_watermark_removal_var.trace('w', toggle_watermark_options)
+
+ # Call once to set initial state
+ toggle_watermark_options()
+
+ def on_extraction_mode_change(self):
+ """Handle extraction mode changes and show/hide Enhanced options"""
+ if self.extraction_mode_var.get() == 'enhanced':
+ # Show enhanced options
+ if hasattr(self, 'enhanced_options_separator'):
+ self.enhanced_options_separator.pack(fill=tk.X, pady=(5, 5))
+ if hasattr(self, 'enhanced_options_frame'):
+ self.enhanced_options_frame.pack(fill=tk.X, padx=20)
+ else:
+ # Hide enhanced options
+ if hasattr(self, 'enhanced_options_separator'):
+ self.enhanced_options_separator.pack_forget()
+ if hasattr(self, 'enhanced_options_frame'):
+ self.enhanced_options_frame.pack_forget()
+
+ def _create_anti_duplicate_section(self, parent):
+ """Create comprehensive anti-duplicate parameter controls with tabs"""
+ # Anti-Duplicate Parameters section
+ ad_frame = tk.LabelFrame(parent, text="🎯 Anti-Duplicate Parameters", padx=15, pady=10)
+ ad_frame.grid(row=6, column=0, columnspan=2, sticky="ew", padx=20, pady=(0, 15))
+
+ # Description
+ desc_label = tk.Label(ad_frame,
+ text="Configure parameters to reduce duplicate translations across all AI providers.",
+ font=('TkDefaultFont', 9), fg='gray', wraplength=520)
+ desc_label.pack(anchor=tk.W, pady=(0, 10))
+
+ # Enable/Disable toggle
+ self.enable_anti_duplicate_var = tk.BooleanVar(value=self.config.get('enable_anti_duplicate', False))
+ enable_cb = tb.Checkbutton(ad_frame, text="Enable Anti-Duplicate Parameters",
+ variable=self.enable_anti_duplicate_var,
+ command=self._toggle_anti_duplicate_controls)
+ enable_cb.pack(anchor=tk.W, pady=(0, 10))
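+ # _toggle_anti_duplicate_controls (presumably defined elsewhere in this class)
+ # greys out the tabbed parameter controls below when the toggle is off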
+
+ # Create notebook for organized parameters
+ self.anti_duplicate_notebook = ttk.Notebook(ad_frame)
+ self.anti_duplicate_notebook.pack(fill=tk.BOTH, expand=True, pady=5)
+
+ # Tab 1: Core Parameters
+ core_frame = tk.Frame(self.anti_duplicate_notebook)
+ self.anti_duplicate_notebook.add(core_frame, text="Core Parameters")
+
+ # Top-P (Nucleus Sampling)
+ top_p_frame = tk.Frame(core_frame)
+ top_p_frame.pack(fill=tk.X, pady=5)
+
+ tk.Label(top_p_frame, text="Top-P (Nucleus Sampling):", width=25, anchor='w').pack(side=tk.LEFT)
+ self.top_p_var = tk.DoubleVar(value=self.config.get('top_p', 1.0))
+ top_p_scale = tk.Scale(top_p_frame, from_=0.1, to=1.0, resolution=0.01,
+ orient=tk.HORIZONTAL, variable=self.top_p_var, length=200)
+ top_p_scale.pack(side=tk.LEFT, padx=5)
+ self.top_p_value_label = tk.Label(top_p_frame, text="", width=8)
+ self.top_p_value_label.pack(side=tk.LEFT, padx=5)
+
+ def update_top_p_label(*args):
+ val = self.top_p_var.get()
+ self.top_p_value_label.config(text=f"{val:.2f}")
+ self.top_p_var.trace('w', update_top_p_label)
+ update_top_p_label()
+
+ # Top-K (Vocabulary Limit)
+ top_k_frame = tk.Frame(core_frame)
+ top_k_frame.pack(fill=tk.X, pady=5)
+
+ tk.Label(top_k_frame, text="Top-K (Vocabulary Limit):", width=25, anchor='w').pack(side=tk.LEFT)
+ self.top_k_var = tk.IntVar(value=self.config.get('top_k', 0))
+ top_k_scale = tk.Scale(top_k_frame, from_=0, to=100, orient=tk.HORIZONTAL,
+ variable=self.top_k_var, length=200)
+ top_k_scale.pack(side=tk.LEFT, padx=5)
+ self.top_k_value_label = tk.Label(top_k_frame, text="", width=8)
+ self.top_k_value_label.pack(side=tk.LEFT, padx=5)
+
+ def update_top_k_label(*args):
+ val = self.top_k_var.get()
+ self.top_k_value_label.config(text=f"{val}" if val > 0 else "OFF")
+ self.top_k_var.trace('w', update_top_k_label)
+ update_top_k_label()
+
+ # Frequency Penalty
+ freq_penalty_frame = tk.Frame(core_frame)
+ freq_penalty_frame.pack(fill=tk.X, pady=5)
+
+ tk.Label(freq_penalty_frame, text="Frequency Penalty:", width=25, anchor='w').pack(side=tk.LEFT)
+ self.frequency_penalty_var = tk.DoubleVar(value=self.config.get('frequency_penalty', 0.0))
+ freq_scale = tk.Scale(freq_penalty_frame, from_=0.0, to=2.0, resolution=0.1,
+ orient=tk.HORIZONTAL, variable=self.frequency_penalty_var, length=200)
+ freq_scale.pack(side=tk.LEFT, padx=5)
+ self.freq_penalty_value_label = tk.Label(freq_penalty_frame, text="", width=8)
+ self.freq_penalty_value_label.pack(side=tk.LEFT, padx=5)
+
+ def update_freq_label(*args):
+ val = self.frequency_penalty_var.get()
+ self.freq_penalty_value_label.config(text=f"{val:.1f}" if val > 0 else "OFF")
+ self.frequency_penalty_var.trace('w', update_freq_label)
+ update_freq_label()
+
+ # Presence Penalty
+ pres_penalty_frame = tk.Frame(core_frame)
+ pres_penalty_frame.pack(fill=tk.X, pady=5)
+
+ tk.Label(pres_penalty_frame, text="Presence Penalty:", width=25, anchor='w').pack(side=tk.LEFT)
+ self.presence_penalty_var = tk.DoubleVar(value=self.config.get('presence_penalty', 0.0))
+ pres_scale = tk.Scale(pres_penalty_frame, from_=0.0, to=2.0, resolution=0.1,
+ orient=tk.HORIZONTAL, variable=self.presence_penalty_var, length=200)
+ pres_scale.pack(side=tk.LEFT, padx=5)
+ self.pres_penalty_value_label = tk.Label(pres_penalty_frame, text="", width=8)
+ self.pres_penalty_value_label.pack(side=tk.LEFT, padx=5)
+
+ def update_pres_label(*args):
+ val = self.presence_penalty_var.get()
+ self.pres_penalty_value_label.config(text=f"{val:.1f}" if val > 0 else "OFF")
+ self.presence_penalty_var.trace('w', update_pres_label)
+ update_pres_label()
+
+ # Tab 2: Advanced Parameters
+ advanced_frame = tk.Frame(self.anti_duplicate_notebook)
+ self.anti_duplicate_notebook.add(advanced_frame, text="Advanced")
+
+ # Repetition Penalty
+ rep_penalty_frame = tk.Frame(advanced_frame)
+ rep_penalty_frame.pack(fill=tk.X, pady=5)
+
+ tk.Label(rep_penalty_frame, text="Repetition Penalty:", width=25, anchor='w').pack(side=tk.LEFT)
+ self.repetition_penalty_var = tk.DoubleVar(value=self.config.get('repetition_penalty', 1.0))
+ rep_scale = tk.Scale(rep_penalty_frame, from_=1.0, to=2.0, resolution=0.05,
+ orient=tk.HORIZONTAL, variable=self.repetition_penalty_var, length=200)
+ rep_scale.pack(side=tk.LEFT, padx=5)
+ self.rep_penalty_value_label = tk.Label(rep_penalty_frame, text="", width=8)
+ self.rep_penalty_value_label.pack(side=tk.LEFT, padx=5)
+
+ def update_rep_label(*args):
+ val = self.repetition_penalty_var.get()
+ self.rep_penalty_value_label.config(text=f"{val:.2f}" if val > 1.0 else "OFF")
+ self.repetition_penalty_var.trace('w', update_rep_label)
+ update_rep_label()
+
+ # Candidate Count (Gemini)
+ candidate_frame = tk.Frame(advanced_frame)
+ candidate_frame.pack(fill=tk.X, pady=5)
+
+ tk.Label(candidate_frame, text="Candidate Count (Gemini):", width=25, anchor='w').pack(side=tk.LEFT)
+ self.candidate_count_var = tk.IntVar(value=self.config.get('candidate_count', 1))
+ candidate_scale = tk.Scale(candidate_frame, from_=1, to=4, orient=tk.HORIZONTAL,
+ variable=self.candidate_count_var, length=200)
+ candidate_scale.pack(side=tk.LEFT, padx=5)
+ self.candidate_value_label = tk.Label(candidate_frame, text="", width=8)
+ self.candidate_value_label.pack(side=tk.LEFT, padx=5)
+
+ def update_candidate_label(*args):
+ val = self.candidate_count_var.get()
+ self.candidate_value_label.config(text=f"{val}")
+ self.candidate_count_var.trace('w', update_candidate_label)
+ update_candidate_label()
+
+ # Tab 3: Stop Sequences
+ stop_frame = tk.Frame(self.anti_duplicate_notebook)
+ self.anti_duplicate_notebook.add(stop_frame, text="Stop Sequences")
+
+ # Custom Stop Sequences
+ stop_seq_frame = tk.Frame(stop_frame)
+ stop_seq_frame.pack(fill=tk.X, pady=5)
+
+ tk.Label(stop_seq_frame, text="Custom Stop Sequences:", width=25, anchor='w').pack(side=tk.LEFT)
+ self.custom_stop_sequences_var = tk.StringVar(value=self.config.get('custom_stop_sequences', ''))
+ stop_entry = tb.Entry(stop_seq_frame, textvariable=self.custom_stop_sequences_var, width=30)
+ stop_entry.pack(side=tk.LEFT, padx=5)
+ tk.Label(stop_seq_frame, text="(comma-separated)", font=('TkDefaultFont', 8), fg='gray').pack(side=tk.LEFT)
+
+ # Tab 4: Logit Bias (OpenAI)
+ bias_frame = tk.Frame(self.anti_duplicate_notebook)
+ self.anti_duplicate_notebook.add(bias_frame, text="Logit Bias")
+
+ # Logit Bias Enable
+ self.logit_bias_enabled_var = tk.BooleanVar(value=self.config.get('logit_bias_enabled', False))
+ bias_cb = tb.Checkbutton(bias_frame, text="Enable Logit Bias (OpenAI only)",
+ variable=self.logit_bias_enabled_var)
+ bias_cb.pack(anchor=tk.W, pady=5)
+
+ # Logit Bias Strength
+ bias_strength_frame = tk.Frame(bias_frame)
+ bias_strength_frame.pack(fill=tk.X, pady=5)
+
+ tk.Label(bias_strength_frame, text="Bias Strength:", width=25, anchor='w').pack(side=tk.LEFT)
+ self.logit_bias_strength_var = tk.DoubleVar(value=self.config.get('logit_bias_strength', -0.5))
+ bias_scale = tk.Scale(bias_strength_frame, from_=-2.0, to=2.0, resolution=0.1,
+ orient=tk.HORIZONTAL, variable=self.logit_bias_strength_var, length=200)
+ bias_scale.pack(side=tk.LEFT, padx=5)
+ self.bias_strength_value_label = tk.Label(bias_strength_frame, text="", width=8)
+ self.bias_strength_value_label.pack(side=tk.LEFT, padx=5)
+
+ def update_bias_strength_label(*args):
+ val = self.logit_bias_strength_var.get()
+ self.bias_strength_value_label.config(text=f"{val:.1f}")
+ self.logit_bias_strength_var.trace('w', update_bias_strength_label)
+ update_bias_strength_label()
+
+ # Preset bias targets
+ preset_frame = tk.Frame(bias_frame)
+ preset_frame.pack(fill=tk.X, pady=5)
+
+ tk.Label(preset_frame, text="Preset Bias Targets:", font=('TkDefaultFont', 9, 'bold')).pack(anchor=tk.W)
+
+ self.bias_common_words_var = tk.BooleanVar(value=self.config.get('bias_common_words', False))
+ tb.Checkbutton(preset_frame, text="Bias against common words (the, and, said)",
+ variable=self.bias_common_words_var).pack(anchor=tk.W)
+
+ self.bias_repetitive_phrases_var = tk.BooleanVar(value=self.config.get('bias_repetitive_phrases', False))
+ tb.Checkbutton(preset_frame, text="Bias against repetitive phrases",
+ variable=self.bias_repetitive_phrases_var).pack(anchor=tk.W)
+
+ # Provider compatibility info
+ compat_frame = tk.Frame(ad_frame)
+ compat_frame.pack(fill=tk.X, pady=(15, 0))
+
+ tk.Label(compat_frame, text="Parameter Compatibility:",
+ font=('TkDefaultFont', 9, 'bold')).pack(anchor=tk.W)
+
+ compat_text = tk.Label(compat_frame,
+ text="• Core: Most providers • Advanced: DeepSeek, Mistral, Groq • Logit Bias: OpenAI only",
+ font=('TkDefaultFont', 8), fg='gray', justify=tk.LEFT)
+ compat_text.pack(anchor=tk.W, pady=(5, 0))
+
+ # Reset button
+ reset_frame = tk.Frame(ad_frame)
+ reset_frame.pack(fill=tk.X, pady=(10, 0))
+
+ tb.Button(reset_frame, text="🔄 Reset to Defaults",
+ command=self._reset_anti_duplicate_defaults,
+ bootstyle="secondary", width=20).pack(side=tk.LEFT)
+
+ tk.Label(reset_frame, text="Reset all anti-duplicate parameters to default values",
+ font=('TkDefaultFont', 8), fg='gray').pack(side=tk.LEFT, padx=(10, 0))
+
+ # Store all tab frames for enable/disable
+ self.anti_duplicate_tabs = [core_frame, advanced_frame, stop_frame, bias_frame]
+
+ # Initial state
+ self._toggle_anti_duplicate_controls()
+
+ def _toggle_anti_duplicate_controls(self):
+ """Enable/disable anti-duplicate parameter controls"""
+ state = tk.NORMAL if self.enable_anti_duplicate_var.get() else tk.DISABLED
+
+ # Disable/enable the notebook itself
+ if hasattr(self, 'anti_duplicate_notebook'):
+ try:
+ self.anti_duplicate_notebook.config(state=state)
+ except tk.TclError:
+ pass
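+ # (ttk.Notebook has no 'state' configure option, hence the TclError guard;
+ # disabling individual tabs via notebook.tab(index, state='disabled')
+ # would be an alternative)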
+
+ # Disable/enable all controls in tabs
+ if hasattr(self, 'anti_duplicate_tabs'):
+ for tab_frame in self.anti_duplicate_tabs:
+ for widget in tab_frame.winfo_children():
+ for child in widget.winfo_children():
+ if hasattr(child, 'config'):
+ try:
+ child.config(state=state)
+ except tk.TclError:
+ pass
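+ # NOTE: the loops above only descend two widget levels (tab -> row -> control).
+ # A hedged sketch of a fully recursive variant (hypothetical helper, not wired in):
+ #
+ # def _set_state_recursive(self, widget, state):
+ # for child in widget.winfo_children():
+ # try:
+ # child.config(state=state)
+ # except tk.TclError:
+ # pass # containers such as Frames have no 'state' option
+ # self._set_state_recursive(child, state)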
+
+ def _toggle_http_tuning_controls(self):
+ """Enable/disable the HTTP timeout/pooling controls as a group"""
+ enabled = bool(self.enable_http_tuning_var.get()) if hasattr(self, 'enable_http_tuning_var') else False
+ state = 'normal' if enabled else 'disabled'
+ # Entries
+ for attr in ['connect_timeout_entry', 'read_timeout_entry', 'http_pool_connections_entry', 'http_pool_maxsize_entry']:
+ widget = getattr(self, attr, None)
+ if widget is not None:
+ try:
+ widget.configure(state=state)
+ except tk.TclError:
+ pass
+ # Retry-After checkbox
+ if hasattr(self, 'ignore_retry_after_checkbox') and self.ignore_retry_after_checkbox is not None:
+ try:
+ self.ignore_retry_after_checkbox.configure(state=state)
+ except tk.TclError:
+ pass
+
+ def _reset_anti_duplicate_defaults(self):
+ """Reset all anti-duplicate parameters to their default values"""
+ import tkinter.messagebox as messagebox
+
+ # Ask for confirmation
+ if not messagebox.askyesno("Reset Anti-Duplicate Parameters",
+ "Are you sure you want to reset all anti-duplicate parameters to their default values?"):
+ return
+
+ # Reset all variables to their defaults (attribute name -> default value)
+ defaults = {
+ 'enable_anti_duplicate_var': False,
+ 'top_p_var': 1.0, # no effect
+ 'top_k_var': 0, # disabled
+ 'frequency_penalty_var': 0.0, # no penalty
+ 'presence_penalty_var': 0.0, # no penalty
+ 'repetition_penalty_var': 1.0, # no penalty
+ 'candidate_count_var': 1, # single response
+ 'custom_stop_sequences_var': '', # empty
+ 'logit_bias_enabled_var': False, # disabled
+ 'logit_bias_strength_var': -0.5, # default strength
+ 'bias_common_words_var': False, # disabled
+ 'bias_repetitive_phrases_var': False, # disabled
+ }
+ for attr, default in defaults.items():
+ if hasattr(self, attr):
+ getattr(self, attr).set(default)
+
+ # Update enable/disable state
+ self._toggle_anti_duplicate_controls()
+
+ # Show success message
+ messagebox.showinfo("Reset Complete", "All anti-duplicate parameters have been reset to their default values.")
+
+ # Log the reset
+ if hasattr(self, 'append_log'):
+ self.append_log("🔄 Anti-duplicate parameters reset to defaults")
+
+ def _create_custom_api_endpoints_section(self, parent_frame):
+ """Create the Custom API Endpoints section"""
+ # Custom API Endpoints Section
+ endpoints_frame = tb.LabelFrame(parent_frame, text="Custom API Endpoints", padding=10)
+ endpoints_frame.grid(row=7, column=0, columnspan=2, sticky=tk.NSEW, padx=5, pady=5)
+
+ # Checkbox to enable/disable custom endpoint (MOVED TO TOP)
+ custom_endpoint_checkbox_frame = tb.Frame(endpoints_frame)
+ custom_endpoint_checkbox_frame.pack(fill=tk.X, padx=5, pady=(0, 5))
+
+ self.use_custom_endpoint_checkbox = tb.Checkbutton(
+ custom_endpoint_checkbox_frame,
+ text="Enable Custom OpenAI Endpoint",
+ variable=self.use_custom_openai_endpoint_var,
+ command=self.toggle_custom_endpoint_ui,
+ bootstyle="primary"
+ )
+ self.use_custom_endpoint_checkbox.pack(side=tk.LEFT)
+
+ # Main OpenAI Base URL
+ openai_url_frame = tb.Frame(endpoints_frame)
+ openai_url_frame.pack(fill=tk.X, padx=5, pady=5)
+
+ tb.Label(openai_url_frame, text="Override API Endpoint:").pack(side=tk.LEFT, padx=(0, 5))
+ self.openai_base_url_var = tk.StringVar(value=self.config.get('openai_base_url', ''))
+ self.openai_base_url_entry = tb.Entry(openai_url_frame, textvariable=self.openai_base_url_var, width=50)
+ self.openai_base_url_var.trace('w', self._check_azure_endpoint)
+ self.openai_base_url_entry.pack(side=tk.LEFT, fill=tk.X, expand=True, padx=(0, 5))
+
+ # Clear button
+ self.openai_clear_button = tb.Button(openai_url_frame, text="Clear",
+ command=lambda: self.openai_base_url_var.set(""),
+ bootstyle="secondary", width=8)
+ self.openai_clear_button.pack(side=tk.LEFT)
+
+ # Set initial state based on checkbox
+ if not self.use_custom_openai_endpoint_var.get():
+ self.openai_base_url_entry.configure(state='disabled')
+ self.openai_clear_button.configure(state='disabled')
+
+ # Help text for main field
+ help_text = tb.Label(endpoints_frame,
+ text="Enable checkbox to use custom endpoint. For Ollama: http://localhost:11434/v1",
+ font=('TkDefaultFont', 8), foreground='gray')
+ help_text.pack(anchor=tk.W, padx=5, pady=(0, 10))
+
+ # ADD AZURE VERSION FRAME HERE (initially hidden):
+ self.azure_version_frame = tb.Frame(endpoints_frame)
+ # Don't pack it yet - it will be shown/hidden dynamically
+
+ tb.Label(self.azure_version_frame, text="Azure API Version:").pack(side=tk.LEFT, padx=(5, 5))
+
+ # Update the existing azure_api_version_var with current config and add trace
+ self.azure_api_version_var.set(self.config.get('azure_api_version', '2024-08-01-preview'))
+ # Add trace to update env var immediately when changed
+ self.azure_api_version_var.trace('w', self._update_azure_api_version_env)
+ versions = [
+ '2025-01-01-preview', # Latest preview
+ '2024-12-01-preview',
+ '2024-10-01-preview',
+ '2024-08-01-preview', # Current default
+ '2024-06-01', # Stable release
+ '2024-05-01-preview',
+ '2024-04-01-preview',
+ '2024-02-01', # Older stable
+ '2023-12-01-preview',
+ '2023-10-01-preview',
+ '2023-05-15' # Legacy
+ ]
+ self.azure_version_combo = ttk.Combobox(
+ self.azure_version_frame,
+ textvariable=self.azure_api_version_var,
+ values=versions,
+ width=20,
+ state='normal'
+ )
+ self.azure_version_combo.pack(side=tk.LEFT, padx=(0, 5))
+
+ # Show More Fields button
+ self.show_more_endpoints = False
+ self.more_fields_button = tb.Button(endpoints_frame,
+ text="▼ Show More Fields",
+ command=self.toggle_more_endpoints,
+ bootstyle="link")
+ self.more_fields_button.pack(anchor=tk.W, padx=5, pady=5)
+
+ # Container for additional fields (initially hidden)
+ self.additional_endpoints_frame = tb.Frame(endpoints_frame)
+ # Don't pack it initially - it's hidden
+
+ # Inside the additional_endpoints_frame:
+ # Groq/Local Base URL
+ groq_url_frame = tb.Frame(self.additional_endpoints_frame)
+ groq_url_frame.pack(fill=tk.X, padx=5, pady=5)
+
+ tb.Label(groq_url_frame, text="Groq/Local Base URL:").pack(side=tk.LEFT, padx=(0, 5))
+ self.groq_base_url_var = tk.StringVar(value=self.config.get('groq_base_url', ''))
+ self.groq_base_url_entry = tb.Entry(groq_url_frame, textvariable=self.groq_base_url_var, width=50)
+ self.groq_base_url_entry.pack(side=tk.LEFT, fill=tk.X, expand=True, padx=(0, 5))
+ tb.Button(groq_url_frame, text="Clear",
+ command=lambda: self.groq_base_url_var.set(""),
+ bootstyle="secondary", width=8).pack(side=tk.LEFT)
+
+ groq_help = tb.Label(self.additional_endpoints_frame,
+ text="For vLLM: http://localhost:8000/v1 | For LM Studio: http://localhost:1234/v1",
+ font=('TkDefaultFont', 8), foreground='gray')
+ groq_help.pack(anchor=tk.W, padx=5, pady=(0, 5))
+
+ # Fireworks Base URL
+ fireworks_url_frame = tb.Frame(self.additional_endpoints_frame)
+ fireworks_url_frame.pack(fill=tk.X, padx=5, pady=5)
+
+ tb.Label(fireworks_url_frame, text="Fireworks Base URL:").pack(side=tk.LEFT, padx=(0, 5))
+ self.fireworks_base_url_var = tk.StringVar(value=self.config.get('fireworks_base_url', ''))
+ self.fireworks_base_url_entry = tb.Entry(fireworks_url_frame, textvariable=self.fireworks_base_url_var, width=50)
+ self.fireworks_base_url_entry.pack(side=tk.LEFT, fill=tk.X, expand=True, padx=(0, 5))
+ tb.Button(fireworks_url_frame, text="Clear",
+ command=lambda: self.fireworks_base_url_var.set(""),
+ bootstyle="secondary", width=8).pack(side=tk.LEFT)
+
+ # Info about multiple endpoints
+ info_frame = tb.Frame(self.additional_endpoints_frame)
+ info_frame.pack(fill=tk.X, padx=5, pady=10)
+
+ info_text = """💡 Advanced: Use multiple endpoints to run different local LLM servers simultaneously.
+ • Use model prefix 'groq/' to route through Groq endpoint
+ • Use model prefix 'fireworks/' to route through Fireworks endpoint
+ • Most users only need the main OpenAI endpoint above"""
+
+ tb.Label(info_frame, text=info_text,
+ font=('TkDefaultFont', 8), foreground='#0dcaf0', # Light blue color
+ wraplength=600, justify=tk.LEFT).pack(anchor=tk.W)
+
+ # Test Connection button (always visible)
+ test_button = tb.Button(endpoints_frame, text="Test Connection",
+ command=self.test_api_connections,
+ bootstyle="info")
+ test_button.pack(pady=10)
+
+ # Gemini OpenAI-Compatible Endpoint (inside additional_endpoints_frame)
+ gemini_frame = tb.Frame(self.additional_endpoints_frame)
+ gemini_frame.pack(fill=tk.X, padx=5, pady=5)
+
+ # Checkbox for enabling Gemini endpoint
+ self.gemini_checkbox = tb.Checkbutton(
+ gemini_frame,
+ text="Enable Gemini OpenAI-Compatible Endpoint",
+ variable=self.use_gemini_openai_endpoint_var,
+ command=self.toggle_gemini_endpoint, # Add the command
+ bootstyle="primary"
+ )
+ self.gemini_checkbox.pack(anchor=tk.W, pady=(5, 5))
+
+ # Gemini endpoint URL input
+ gemini_url_frame = tb.Frame(self.additional_endpoints_frame)
+ gemini_url_frame.pack(fill=tk.X, padx=5, pady=5)
+
+ tb.Label(gemini_url_frame, text="Gemini OpenAI Endpoint:").pack(side=tk.LEFT, padx=(0, 5))
+ self.gemini_endpoint_entry = tb.Entry(gemini_url_frame, textvariable=self.gemini_openai_endpoint_var, width=50)
+ self.gemini_endpoint_entry.pack(side=tk.LEFT, fill=tk.X, expand=True, padx=(0, 5))
+ self.gemini_clear_button = tb.Button(gemini_url_frame, text="Clear",
+ command=lambda: self.gemini_openai_endpoint_var.set(""),
+ bootstyle="secondary", width=8)
+ self.gemini_clear_button.pack(side=tk.LEFT)
+
+ # Help text
+ gemini_help = tb.Label(self.additional_endpoints_frame,
+ text="For Gemini rate limit optimization with proxy services (e.g., OpenRouter, LiteLLM)",
+ font=('TkDefaultFont', 8), foreground='gray')
+ gemini_help.pack(anchor=tk.W, padx=5, pady=(0, 5))
+
+ # Set initial state based on checkbox
+ if not self.use_gemini_openai_endpoint_var.get():
+ self.gemini_endpoint_entry.configure(state='disabled')
+ self.gemini_clear_button.configure(state='disabled')
+
+ def _check_azure_endpoint(self, *args):
+ """Check if endpoint is Azure and update UI"""
+ if not self.use_custom_openai_endpoint_var.get():
+ if hasattr(self, 'azure_version_frame'):
+ self.azure_version_frame.pack_forget()
+ return
+
+ url = self.openai_base_url_var.get()
+ if '.azure.com' in url or '.cognitiveservices' in url:
+ self.api_key_label.config(text="Azure Key:")
+
+ # Show Azure version frame in settings dialog
+ if hasattr(self, 'azure_version_frame'):
+ self.azure_version_frame.pack(before=self.more_fields_button, pady=(0, 10))
+ else:
+ self.api_key_label.config(text="OpenAI/Gemini/... API Key:")
+
+ # Hide Azure version frame
+ if hasattr(self, 'azure_version_frame'):
+ self.azure_version_frame.pack_forget()
+
+ def _update_azure_api_version_env(self, *args):
+ """Update the AZURE_API_VERSION environment variable when the setting changes"""
+ try:
+ api_version = self.azure_api_version_var.get()
+ if api_version:
+ os.environ['AZURE_API_VERSION'] = api_version
+ #print(f"✅ Updated Azure API Version in environment: {api_version}")
+ except Exception as e:
+ print(f"❌ Error updating Azure API Version environment variable: {e}")
+
+ def toggle_gemini_endpoint(self):
+ """Enable/disable Gemini endpoint entry based on toggle"""
+ if self.use_gemini_openai_endpoint_var.get():
+ self.gemini_endpoint_entry.configure(state='normal')
+ self.gemini_clear_button.configure(state='normal')
+ else:
+ self.gemini_endpoint_entry.configure(state='disabled')
+ self.gemini_clear_button.configure(state='disabled')
+
+ def toggle_custom_endpoint_ui(self):
+ """Enable/disable the OpenAI base URL entry and detect Azure"""
+ if self.use_custom_openai_endpoint_var.get():
+ self.openai_base_url_entry.configure(state='normal')
+ self.openai_clear_button.configure(state='normal')
+
+ # Check if it's Azure
+ url = self.openai_base_url_var.get()
+ if '.azure.com' in url or '.cognitiveservices' in url:
+ self.api_key_label.config(text="Azure Key:")
+ else:
+ self.api_key_label.config(text="OpenAI/Gemini/... API Key:")
+
+ print("✅ Custom OpenAI endpoint enabled")
+ else:
+ self.openai_base_url_entry.configure(state='disabled')
+ self.openai_clear_button.configure(state='disabled')
+ self.api_key_label.config(text="OpenAI/Gemini/... API Key:")
+ print("❌ Custom OpenAI endpoint disabled - using default OpenAI API")
+
+ def toggle_more_endpoints(self):
+ """Toggle visibility of additional endpoint fields"""
+ self.show_more_endpoints = not self.show_more_endpoints
+
+ if self.show_more_endpoints:
+ self.additional_endpoints_frame.pack(fill=tk.BOTH, expand=True, after=self.more_fields_button)
+ self.more_fields_button.configure(text="▲ Show Fewer Fields")
+ else:
+ self.additional_endpoints_frame.pack_forget()
+ self.more_fields_button.configure(text="▼ Show More Fields")
+
+ # Update dialog scrolling if needed
+ if hasattr(self, 'current_dialog') and self.current_dialog:
+ self.current_dialog.update_idletasks()
+ self.current_dialog.canvas.configure(scrollregion=self.current_dialog.canvas.bbox("all"))
+
+ def test_api_connections(self):
+ """Test all configured API connections"""
+ # Show immediate feedback
+ progress_dialog = tk.Toplevel(self.current_dialog if hasattr(self, 'current_dialog') else self.master)
+ progress_dialog.title("Testing Connections...")
+
+ # Set icon
+ try:
+ progress_dialog.iconbitmap("halgakos.ico")
+ except Exception:
+ pass # Icon setting failed, continue without icon
+
+ # Center the dialog
+ progress_dialog.update_idletasks()
+ width = 300
+ height = 150
+ x = (progress_dialog.winfo_screenwidth() // 2) - (width // 2)
+ y = (progress_dialog.winfo_screenheight() // 2) - (height // 2)
+ progress_dialog.geometry(f"{width}x{height}+{x}+{y}")
+
+ # Add progress message
+ progress_label = tb.Label(progress_dialog, text="Testing API connections...\nPlease wait...",
+ font=('TkDefaultFont', 10))
+ progress_label.pack(pady=50)
+
+ # Force update to show dialog immediately
+ progress_dialog.update()
+
+ try:
+ # Ensure we have the openai module
+ import openai
+ except ImportError:
+ progress_dialog.destroy()
+ messagebox.showerror("Error", "OpenAI library not installed")
+ return
+
+ # Get API key from the main GUI
+ api_key = self.api_key_entry.get() if hasattr(self, 'api_key_entry') else self.config.get('api_key', '')
+ if not api_key:
+ api_key = "sk-dummy-key" # For local models
+
+ # Collect all configured endpoints
+ endpoints_to_test = []
+
+ # OpenAI endpoint - only test if checkbox is enabled
+ if self.use_custom_openai_endpoint_var.get():
+ openai_url = self.openai_base_url_var.get()
+ if openai_url:
+ # Check if it's Azure
+ if '.azure.com' in openai_url or '.cognitiveservices' in openai_url:
+ # Azure endpoint
+ deployment = self.model_var.get() if hasattr(self, 'model_var') else "gpt-35-turbo"
+ api_version = self.azure_api_version_var.get() if hasattr(self, 'azure_api_version_var') else "2024-08-01-preview"
+
+ # Format Azure URL
+ if '/openai/deployments/' not in openai_url:
+ azure_url = f"{openai_url.rstrip('/')}/openai/deployments/{deployment}/chat/completions?api-version={api_version}"
+ else:
+ azure_url = openai_url
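+
+ # Example with hypothetical values: a base of https://myres.openai.azure.com
+ # with deployment "gpt-4o" and api-version 2024-08-01-preview becomes:
+ # https://myres.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-08-01-preview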
+
+ endpoints_to_test.append(("Azure OpenAI", azure_url, deployment, "azure"))
+ else:
+ # Regular custom endpoint
+ endpoints_to_test.append(("OpenAI (Custom)", openai_url, self.model_var.get() if hasattr(self, 'model_var') else "gpt-3.5-turbo"))
+ else:
+ # Use default OpenAI endpoint if checkbox is on but no custom URL provided
+ endpoints_to_test.append(("OpenAI (Default)", "https://api.openai.com/v1", self.model_var.get() if hasattr(self, 'model_var') else "gpt-3.5-turbo"))
+
+ # Groq endpoint
+ if hasattr(self, 'groq_base_url_var'):
+ groq_url = self.groq_base_url_var.get()
+ if groq_url:
+ # The 'groq/' prefix is only a routing hint inside this app; strip it
+ # for a direct endpoint test (mirrors the Gemini handling below)
+ current_model = self.model_var.get() if hasattr(self, 'model_var') else "llama-3-70b"
+ groq_model = current_model.replace('groq/', '', 1) if current_model.startswith('groq/') else current_model
+ endpoints_to_test.append(("Groq/Local", groq_url, groq_model))
+
+ # Fireworks endpoint
+ if hasattr(self, 'fireworks_base_url_var'):
+ fireworks_url = self.fireworks_base_url_var.get()
+ if fireworks_url:
+ # For Fireworks, we need the accounts/ prefix
+ current_model = self.model_var.get() if hasattr(self, 'model_var') else "llama-v3-70b-instruct"
+ fw_model = current_model if current_model.startswith('accounts/') else f"accounts/fireworks/models/{current_model.replace('fireworks/', '')}"
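+ # e.g. "llama-v3-70b-instruct" -> "accounts/fireworks/models/llama-v3-70b-instruct"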
+ endpoints_to_test.append(("Fireworks", fireworks_url, fw_model))
+
+ # Gemini OpenAI-Compatible endpoint
+ if hasattr(self, 'use_gemini_openai_endpoint_var') and self.use_gemini_openai_endpoint_var.get():
+ gemini_url = self.gemini_openai_endpoint_var.get()
+ if gemini_url:
+ # Ensure the endpoint ends with /openai/ for compatibility
+ # (also tolerates inputs that already end in '/openai' or a trailing slash)
+ if not gemini_url.rstrip('/').endswith('/openai'):
+ gemini_url = gemini_url.rstrip('/') + '/openai/'
+ else:
+ gemini_url = gemini_url.rstrip('/') + '/'
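+ # e.g. (hypothetical) https://generativelanguage.googleapis.com/v1beta
+ # -> https://generativelanguage.googleapis.com/v1beta/openai/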
+
+ # For Gemini OpenAI-compatible endpoints, use the current model or a suitable default
+ current_model = self.model_var.get() if hasattr(self, 'model_var') else "gemini-2.0-flash-exp"
+ # Remove any 'gemini/' prefix for the OpenAI-compatible endpoint
+ gemini_model = current_model.replace('gemini/', '') if current_model.startswith('gemini/') else current_model
+ endpoints_to_test.append(("Gemini (OpenAI-Compatible)", gemini_url, gemini_model))
+
+ if not endpoints_to_test:
+ progress_dialog.destroy()
+ messagebox.showinfo("Info", "No custom endpoints configured. Using default API endpoints.")
+ return
+
+ # Test each endpoint
+ results = []
+ for endpoint_info in endpoints_to_test:
+ if len(endpoint_info) == 4 and endpoint_info[3] == "azure":
+ # Azure endpoint
+ name, base_url, model, endpoint_type = endpoint_info
+ try:
+ # Azure uses different headers
+ import requests
+ headers = {
+ "api-key": api_key,
+ "Content-Type": "application/json"
+ }
+
+ response = requests.post(
+ base_url,
+ headers=headers,
+ json={
+ "messages": [{"role": "user", "content": "Hi"}],
+ "max_tokens": 5
+ },
+ timeout=5.0
+ )
+
+ if response.status_code == 200:
+ results.append(f"✅ {name}: Connected successfully! (Deployment: {model})")
+ else:
+ results.append(f"❌ {name}: {response.status_code} - {response.text[:100]}")
+
+ except Exception as e:
+ error_msg = str(e)[:100]
+ results.append(f"❌ {name}: {error_msg}")
+ else:
+ # Regular OpenAI-compatible endpoint
+ name, base_url, model = endpoint_info[:3]
+ try:
+ # Create client for this endpoint
+ test_client = openai.OpenAI(
+ api_key=api_key,
+ base_url=base_url,
+ timeout=5.0 # Short timeout for testing
+ )
+
+ # Try a minimal completion
+ response = test_client.chat.completions.create(
+ model=model,
+ messages=[{"role": "user", "content": "Hi"}],
+ max_tokens=5
+ )
+
+ results.append(f"✅ {name}: Connected successfully! (Model: {model})")
+ except Exception as e:
+ error_msg = str(e)
+ # Simplify common error messages
+ if "404" in error_msg:
+ error_msg = "404 - Endpoint not found. Check URL and model name."
+ elif "401" in error_msg or "403" in error_msg:
+ error_msg = "Authentication failed. Check API key."
+ elif "model" in error_msg.lower() and "not found" in error_msg.lower():
+ error_msg = f"Model '{model}' not found at this endpoint."
+
+ results.append(f"❌ {name}: {error_msg}")
+
+ # Show results
+ result_message = "Connection Test Results:\n\n" + "\n\n".join(results)
+
+ # Close progress dialog
+ progress_dialog.destroy()
+
+ # Determine if all succeeded
+ all_success = all("✅" in r for r in results)
+
+ if all_success:
+ messagebox.showinfo("Success", result_message)
+ else:
+ messagebox.showwarning("Test Results", result_message)
+
+ def _create_settings_buttons(self, parent, dialog, canvas):
+ """Create save and close buttons for settings dialog"""
+ button_frame = tk.Frame(parent)
+ button_frame.grid(row=3, column=0, columnspan=2, pady=(10, 10))
+
+ button_container = tk.Frame(button_frame)
+ button_container.pack(expand=True)
+
+ def save_and_close():
+ try:
+ def safe_int(value, default):
+ try: return int(value)
+ except (ValueError, TypeError): return default
+
+ def safe_float(value, default):
+ try: return float(value)
+ except (ValueError, TypeError): return default
+
+ # Save all settings
+ self.config.update({
+ 'use_rolling_summary': self.rolling_summary_var.get(),
+ 'summary_role': self.summary_role_var.get(),
+ 'attach_css_to_chapters': self.attach_css_to_chapters_var.get(),
+ 'retain_source_extension': self.retain_source_extension_var.get(),
+ 'rolling_summary_exchanges': safe_int(self.rolling_summary_exchanges_var.get(), 5),
+ 'rolling_summary_mode': self.rolling_summary_mode_var.get(),
+ 'rolling_summary_max_entries': safe_int(self.rolling_summary_max_entries_var.get(), 10),
+ 'retry_truncated': self.retry_truncated_var.get(),
+ 'max_retry_tokens': safe_int(self.max_retry_tokens_var.get(), 16384),
+ 'retry_duplicate_bodies': self.retry_duplicate_var.get(),
+ 'duplicate_lookback_chapters': safe_int(self.duplicate_lookback_var.get(), 5),
+ 'retry_timeout': self.retry_timeout_var.get(),
+ # New QA-related config
+ 'qa_auto_search_output': bool(self.qa_auto_search_output_var.get()) if hasattr(self, 'qa_auto_search_output_var') else self.config.get('qa_auto_search_output', True),
+ 'scan_phase_enabled': bool(self.scan_phase_enabled_var.get()) if hasattr(self, 'scan_phase_enabled_var') else self.config.get('scan_phase_enabled', False),
+ 'scan_phase_mode': self.scan_phase_mode_var.get() if hasattr(self, 'scan_phase_mode_var') else self.config.get('scan_phase_mode', 'quick-scan'),
+ 'chunk_timeout': safe_int(self.chunk_timeout_var.get(), 900),
+ 'enable_http_tuning': bool(self.enable_http_tuning_var.get()) if hasattr(self, 'enable_http_tuning_var') else False,
+ # New network/HTTP controls
+ 'connect_timeout': safe_float(self.connect_timeout_var.get() if hasattr(self, 'connect_timeout_var') else os.environ.get('CONNECT_TIMEOUT', 10), 10.0),
+ 'read_timeout': safe_float(self.read_timeout_var.get() if hasattr(self, 'read_timeout_var') else os.environ.get('READ_TIMEOUT', os.environ.get('CHUNK_TIMEOUT', 180)), 180.0),
+ 'http_pool_connections': safe_int(self.http_pool_connections_var.get() if hasattr(self, 'http_pool_connections_var') else os.environ.get('HTTP_POOL_CONNECTIONS', 20), 20),
+ 'http_pool_maxsize': safe_int(self.http_pool_maxsize_var.get() if hasattr(self, 'http_pool_maxsize_var') else os.environ.get('HTTP_POOL_MAXSIZE', 50), 50),
+ 'ignore_retry_after': bool(self.ignore_retry_after_var.get()) if hasattr(self, 'ignore_retry_after_var') else (str(os.environ.get('IGNORE_RETRY_AFTER', '0')) == '1'),
+ 'max_retries': safe_int(self.max_retries_var.get() if hasattr(self, 'max_retries_var') else os.environ.get('MAX_RETRIES', 7), 7),
+ 'indefinite_rate_limit_retry': self.indefinite_rate_limit_retry_var.get(),
+
+ 'reinforcement_frequency': safe_int(self.reinforcement_freq_var.get(), 10),
+ 'translate_book_title': self.translate_book_title_var.get(),
+ 'book_title_prompt': getattr(self, 'book_title_prompt',
+ "Translate this book title to English while retaining any acronyms:"),
+ 'emergency_paragraph_restore': self.emergency_restore_var.get(),
+ 'disable_chapter_merging': self.disable_chapter_merging_var.get(),
+ 'disable_epub_gallery': self.disable_epub_gallery_var.get(),
+ 'disable_automatic_cover_creation': self.disable_automatic_cover_creation_var.get(),
+ 'translate_cover_html': self.translate_cover_html_var.get(),
+ 'disable_zero_detection': self.disable_zero_detection_var.get(),
+ 'enable_image_translation': self.enable_image_translation_var.get(),
+ 'process_webnovel_images': self.process_webnovel_images_var.get(),
+ 'hide_image_translation_label': self.hide_image_translation_label_var.get(),
+ 'duplicate_detection_mode': self.duplicate_detection_mode_var.get(),
+ 'chapter_number_offset': safe_int(self.chapter_number_offset_var.get(), 0),
+ 'enable_decimal_chapters': self.enable_decimal_chapters_var.get(),
+ 'use_header_as_output': self.use_header_as_output_var.get(),
+ 'disable_gemini_safety': self.disable_gemini_safety_var.get(),
+ 'openrouter_use_http_only': self.openrouter_http_only_var.get(),
+ 'openrouter_accept_identity': self.openrouter_accept_identity_var.get(),
+ 'auto_update_check': self.auto_update_check_var.get(),
+ 'force_ncx_only': self.force_ncx_only_var.get(),
+ 'single_api_image_chunks': self.single_api_image_chunks_var.get(),
+ 'enable_gemini_thinking': self.enable_gemini_thinking_var.get(),
+ 'thinking_budget': int(self.thinking_budget_var.get()) if self.thinking_budget_var.get().lstrip('-').isdigit() else 0,
+ 'enable_gpt_thinking': self.enable_gpt_thinking_var.get(),
+ 'gpt_reasoning_tokens': safe_int(self.gpt_reasoning_tokens_var.get(), 0),
+ 'gpt_effort': self.gpt_effort_var.get(),
+ 'openai_base_url': self.openai_base_url_var.get(),
+ 'groq_base_url': self.groq_base_url_var.get() if hasattr(self, 'groq_base_url_var') else '',
+ 'fireworks_base_url': self.fireworks_base_url_var.get() if hasattr(self, 'fireworks_base_url_var') else '',
+ 'use_custom_openai_endpoint': self.use_custom_openai_endpoint_var.get(),
+ 'text_extraction_method': self.text_extraction_method_var.get() if hasattr(self, 'text_extraction_method_var') else 'standard',
+ 'file_filtering_level': self.file_filtering_level_var.get() if hasattr(self, 'file_filtering_level_var') else 'smart',
+ 'extraction_mode': 'enhanced' if self.text_extraction_method_var.get() == 'enhanced' else self.file_filtering_level_var.get(),
+ 'enhanced_filtering': self.file_filtering_level_var.get() if self.text_extraction_method_var.get() == 'enhanced' else 'smart',
+ 'use_gemini_openai_endpoint': self.use_gemini_openai_endpoint_var.get(),
+ 'gemini_openai_endpoint': self.gemini_openai_endpoint_var.get(),
+ 'image_chunk_overlap': safe_float(self.image_chunk_overlap_var.get(), 1.0),
+ 'azure_api_version': self.azure_api_version_var.get() if hasattr(self, 'azure_api_version_var') else '2024-08-01-preview',
+ # Preserve use_fallback_keys from config if it was set by multi API key manager
+ 'use_fallback_keys': self.config.get('use_fallback_keys', self.use_fallback_keys_var.get()),
+
+
+ # ALL Anti-duplicate parameters (moved below other settings)
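+ # getattr(self, ..., type('', (), {'get': lambda: False})) builds a throwaway
+ # class whose get() returns the default, so missing widgets fall back safely.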
+ 'enable_anti_duplicate': getattr(self, 'enable_anti_duplicate_var', type('', (), {'get': lambda: False})).get(),
+ 'top_p': float(getattr(self, 'top_p_var', type('', (), {'get': lambda: 1.0})).get()),
+ 'top_k': safe_int(getattr(self, 'top_k_var', type('', (), {'get': lambda: 0})).get(), 0),
+ 'frequency_penalty': float(getattr(self, 'frequency_penalty_var', type('', (), {'get': lambda: 0.0})).get()),
+ 'presence_penalty': float(getattr(self, 'presence_penalty_var', type('', (), {'get': lambda: 0.0})).get()),
+ 'repetition_penalty': float(getattr(self, 'repetition_penalty_var', type('', (), {'get': lambda: 1.0})).get()),
+ 'candidate_count': safe_int(getattr(self, 'candidate_count_var', type('', (), {'get': lambda: 1})).get(), 1),
+ 'custom_stop_sequences': getattr(self, 'custom_stop_sequences_var', type('', (), {'get': lambda: ''})).get(),
+ 'logit_bias_enabled': getattr(self, 'logit_bias_enabled_var', type('', (), {'get': lambda: False})).get(),
+ 'logit_bias_strength': float(getattr(self, 'logit_bias_strength_var', type('', (), {'get': lambda: -0.5})).get()),
+ 'bias_common_words': getattr(self, 'bias_common_words_var', type('', (), {'get': lambda: False})).get(),
+ 'bias_repetitive_phrases': getattr(self, 'bias_repetitive_phrases_var', type('', (), {'get': lambda: False})).get(),
+ 'enable_parallel_extraction': self.enable_parallel_extraction_var.get(),
+ 'extraction_workers': safe_int(self.extraction_workers_var.get(), 2),
+
+ # Batch header translation settings
+ 'batch_translate_headers': self.batch_translate_headers_var.get(),
+ 'headers_per_batch': safe_int(self.headers_per_batch_var.get(), 500),
+ 'update_html_headers': self.update_html_headers_var.get(),
+ 'save_header_translations': self.save_header_translations_var.get(),
+ 'ignore_header': self.ignore_header_var.get(),
+ 'ignore_title': self.ignore_title_var.get(),
+
+ })
+
+ # Validate numeric fields
+ numeric_fields = [
+ ('webnovel_min_height', self.webnovel_min_height_var, 1000),
+ ('max_images_per_chapter', self.max_images_per_chapter_var, 1),
+ ('image_chunk_height', self.image_chunk_height_var, 1500)
+ ]
+
+ for field_name, var, default in numeric_fields:
+ value = var.get().strip()
+ if value and not value.isdigit():
+ messagebox.showerror("Invalid Input",
+ f"Please enter a valid number for {field_name.replace('_', ' ').title()}")
+ return
+
+ for field_name, var, default in numeric_fields:
+ self.config[field_name] = safe_int(var.get(), default)
+
+ # Update environment variables
+ env_updates = {
+ "USE_ROLLING_SUMMARY": "1" if self.rolling_summary_var.get() else "0",
+ "SUMMARY_ROLE": self.summary_role_var.get(),
+ "ATTACH_CSS_TO_CHAPTERS": "1" if self.attach_css_to_chapters_var.get() else "0",
+ "RETAIN_SOURCE_EXTENSION": "1" if self.retain_source_extension_var.get() else "0",
+ "ROLLING_SUMMARY_EXCHANGES": str(self.config['rolling_summary_exchanges']),
+ "ROLLING_SUMMARY_MODE": self.rolling_summary_mode_var.get(),
+ "ROLLING_SUMMARY_SYSTEM_PROMPT": self.rolling_summary_system_prompt,
+ "ROLLING_SUMMARY_USER_PROMPT": self.rolling_summary_user_prompt,
+ "ROLLING_SUMMARY_MAX_ENTRIES": str(self.config.get('rolling_summary_max_entries', 10)),
+ "RETRY_TRUNCATED": "1" if self.retry_truncated_var.get() else "0",
+ "MAX_RETRY_TOKENS": str(self.config['max_retry_tokens']),
+ "RETRY_DUPLICATE_BODIES": "1" if self.retry_duplicate_var.get() else "0",
+ "DUPLICATE_LOOKBACK_CHAPTERS": str(self.config['duplicate_lookback_chapters']),
+ "RETRY_TIMEOUT": "1" if self.retry_timeout_var.get() else "0",
+ "CHUNK_TIMEOUT": str(self.config['chunk_timeout']),
+ # New network/HTTP controls
+ "ENABLE_HTTP_TUNING": '1' if self.config.get('enable_http_tuning', False) else '0',
+ "CONNECT_TIMEOUT": str(self.config['connect_timeout']),
+ "READ_TIMEOUT": str(self.config['read_timeout']),
+ "HTTP_POOL_CONNECTIONS": str(self.config['http_pool_connections']),
+ "HTTP_POOL_MAXSIZE": str(self.config['http_pool_maxsize']),
+ "IGNORE_RETRY_AFTER": '1' if self.config.get('ignore_retry_after', False) else '0',
+ "MAX_RETRIES": str(self.config['max_retries']),
+ # Persist auto-search preference for child dialogs
+ "QA_AUTO_SEARCH_OUTPUT": '1' if self.config.get('qa_auto_search_output', True) else '0',
+ "INDEFINITE_RATE_LIMIT_RETRY": "1" if self.indefinite_rate_limit_retry_var.get() else "0",
+ "REINFORCEMENT_FREQUENCY": str(self.config['reinforcement_frequency']),
+ "TRANSLATE_BOOK_TITLE": "1" if self.translate_book_title_var.get() else "0",
+ "BOOK_TITLE_PROMPT": self.book_title_prompt,
+ "EMERGENCY_PARAGRAPH_RESTORE": "1" if self.emergency_restore_var.get() else "0",
+ 'DISABLE_CHAPTER_MERGING': '1' if self.disable_chapter_merging_var.get() else '0',
+ "ENABLE_IMAGE_TRANSLATION": "1" if self.enable_image_translation_var.get() else "0",
+ "PROCESS_WEBNOVEL_IMAGES": "1" if self.process_webnovel_images_var.get() else "0",
+ "WEBNOVEL_MIN_HEIGHT": str(self.config['webnovel_min_height']),
+ "MAX_IMAGES_PER_CHAPTER": str(self.config['max_images_per_chapter']),
+ "IMAGE_CHUNK_HEIGHT": str(self.config['image_chunk_height']),
+ "HIDE_IMAGE_TRANSLATION_LABEL": "1" if self.hide_image_translation_label_var.get() else "0",
+ "DISABLE_EPUB_GALLERY": "1" if self.disable_epub_gallery_var.get() else "0",
+ "DISABLE_AUTOMATIC_COVER_CREATION": "1" if getattr(self, 'disable_automatic_cover_creation_var', tk.BooleanVar(value=False)).get() else "0",
+ "TRANSLATE_COVER_HTML": "1" if getattr(self, 'translate_cover_html_var', tk.BooleanVar(value=False)).get() else "0",
+ "DISABLE_ZERO_DETECTION": "1" if self.disable_zero_detection_var.get() else "0",
+ "DUPLICATE_DETECTION_MODE": self.duplicate_detection_mode_var.get(),
+ "ENABLE_DECIMAL_CHAPTERS": "1" if self.enable_decimal_chapters_var.get() else "0",
+ 'ENABLE_WATERMARK_REMOVAL': "1" if self.enable_watermark_removal_var.get() else "0",
+ 'SAVE_CLEANED_IMAGES': "1" if self.save_cleaned_images_var.get() else "0",
+ 'TRANSLATION_CHUNK_PROMPT': str(getattr(self, 'translation_chunk_prompt', '')), # FIXED: Convert to string
+ 'IMAGE_CHUNK_PROMPT': str(getattr(self, 'image_chunk_prompt', '')), # FIXED: Convert to string
+ "DISABLE_GEMINI_SAFETY": str(self.config.get('disable_gemini_safety', False)).lower(),
+ "OPENROUTER_USE_HTTP_ONLY": '1' if self.openrouter_http_only_var.get() else '0',
+ "OPENROUTER_ACCEPT_IDENTITY": '1' if self.openrouter_accept_identity_var.get() else '0',
+ 'AUTO_UPDATE_CHECK': '1' if self.auto_update_check_var.get() else '0',
+ 'FORCE_NCX_ONLY': '1' if self.force_ncx_only_var.get() else '0',
+ 'SINGLE_API_IMAGE_CHUNKS': "1" if self.single_api_image_chunks_var.get() else "0",
+ 'ENABLE_GEMINI_THINKING': "1" if self.enable_gemini_thinking_var.get() else "0",
+ 'THINKING_BUDGET': self.thinking_budget_var.get() if self.enable_gemini_thinking_var.get() else '0',
+ 'ENABLE_GPT_THINKING': '1' if self.enable_gpt_thinking_var.get() else '0',
+ 'GPT_REASONING_TOKENS': self.gpt_reasoning_tokens_var.get() if self.enable_gpt_thinking_var.get() else '',
+ 'GPT_EFFORT': self.gpt_effort_var.get(),
+ 'OPENROUTER_EXCLUDE': '1',
+ # Custom API endpoints
+ 'OPENAI_CUSTOM_BASE_URL': self.openai_base_url_var.get() if self.openai_base_url_var.get() else '',
+ 'GROQ_API_URL': self.groq_base_url_var.get() if hasattr(self, 'groq_base_url_var') and self.groq_base_url_var.get() else '',
+ 'FIREWORKS_API_URL': self.fireworks_base_url_var.get() if hasattr(self, 'fireworks_base_url_var') and self.fireworks_base_url_var.get() else '',
+ 'USE_CUSTOM_OPENAI_ENDPOINT': '1' if self.use_custom_openai_endpoint_var.get() else '0',
+ 'USE_GEMINI_OPENAI_ENDPOINT': '1' if self.use_gemini_openai_endpoint_var.get() else '0',
+ 'GEMINI_OPENAI_ENDPOINT': self.gemini_openai_endpoint_var.get() if self.gemini_openai_endpoint_var.get() else '',
+ # Image compression settings
+ 'ENABLE_IMAGE_COMPRESSION': "1" if self.config.get('enable_image_compression', False) else "0",
+ 'AUTO_COMPRESS_ENABLED': "1" if self.config.get('auto_compress_enabled', True) else "0",
+ 'TARGET_IMAGE_TOKENS': str(self.config.get('target_image_tokens', 1000)),
+ 'IMAGE_COMPRESSION_FORMAT': self.config.get('image_compression_format', 'auto'),
+ 'WEBP_QUALITY': str(self.config.get('webp_quality', 85)),
+ 'JPEG_QUALITY': str(self.config.get('jpeg_quality', 85)),
+ 'PNG_COMPRESSION': str(self.config.get('png_compression', 6)),
+ 'MAX_IMAGE_DIMENSION': str(self.config.get('max_image_dimension', 2048)),
+ 'MAX_IMAGE_SIZE_MB': str(self.config.get('max_image_size_mb', 10)),
+ 'PRESERVE_TRANSPARENCY': "1" if self.config.get('preserve_transparency', True) else "0",
+ 'OPTIMIZE_FOR_OCR': "1" if self.config.get('optimize_for_ocr', True) else "0",
+ 'PROGRESSIVE_ENCODING': "1" if self.config.get('progressive_encoding', True) else "0",
+ 'SAVE_COMPRESSED_IMAGES': "1" if self.config.get('save_compressed_images', False) else "0",
+ 'USE_FALLBACK_KEYS': '1' if self.config.get('use_fallback_keys', False) else '0',
+ 'FALLBACK_KEYS': json.dumps(self.config.get('fallback_keys', [])),
+ 'IMAGE_CHUNK_OVERLAP_PERCENT': self.image_chunk_overlap_var.get(),
+
+ # Metadata and batch header settings
+ 'TRANSLATE_METADATA_FIELDS': json.dumps(self.translate_metadata_fields),
+ 'METADATA_TRANSLATION_MODE': self.config.get('metadata_translation_mode', 'together'),
+ 'BATCH_TRANSLATE_HEADERS': "1" if self.batch_translate_headers_var.get() else "0",
+ 'HEADERS_PER_BATCH': str(self.config.get('headers_per_batch', 500)),
+ 'UPDATE_HTML_HEADERS': "1" if self.update_html_headers_var.get() else "0",
+ 'SAVE_HEADER_TRANSLATIONS': "1" if self.save_header_translations_var.get() else "0",
+ 'IGNORE_HEADER': "1" if self.ignore_header_var.get() else "0",
+ 'IGNORE_TITLE': "1" if self.ignore_title_var.get() else "0",
+ # EXTRACTION_MODE:
+ "TEXT_EXTRACTION_METHOD": self.text_extraction_method_var.get() if hasattr(self, 'text_extraction_method_var') else 'standard',
+ "FILE_FILTERING_LEVEL": self.file_filtering_level_var.get() if hasattr(self, 'file_filtering_level_var') else 'smart',
+ "EXTRACTION_MODE": 'enhanced' if self.text_extraction_method_var.get() == 'enhanced' else self.file_filtering_level_var.get(),
+ "ENHANCED_FILTERING": self.file_filtering_level_var.get() if self.text_extraction_method_var.get() == 'enhanced' else 'smart',
+
+ # ALL Anti-duplicate environment variables (moved below other settings)
+ 'ENABLE_ANTI_DUPLICATE': '1' if hasattr(self, 'enable_anti_duplicate_var') and self.enable_anti_duplicate_var.get() else '0',
+ 'TOP_P': str(self.top_p_var.get()) if hasattr(self, 'top_p_var') else '1.0',
+ 'TOP_K': str(self.top_k_var.get()) if hasattr(self, 'top_k_var') else '0',
+ 'FREQUENCY_PENALTY': str(self.frequency_penalty_var.get()) if hasattr(self, 'frequency_penalty_var') else '0.0',
+ 'PRESENCE_PENALTY': str(self.presence_penalty_var.get()) if hasattr(self, 'presence_penalty_var') else '0.0',
+ 'REPETITION_PENALTY': str(self.repetition_penalty_var.get()) if hasattr(self, 'repetition_penalty_var') else '1.0',
+ 'CANDIDATE_COUNT': str(self.candidate_count_var.get()) if hasattr(self, 'candidate_count_var') else '1',
+ 'CUSTOM_STOP_SEQUENCES': self.custom_stop_sequences_var.get() if hasattr(self, 'custom_stop_sequences_var') else '',
+ 'LOGIT_BIAS_ENABLED': '1' if hasattr(self, 'logit_bias_enabled_var') and self.logit_bias_enabled_var.get() else '0',
+ 'LOGIT_BIAS_STRENGTH': str(self.logit_bias_strength_var.get()) if hasattr(self, 'logit_bias_strength_var') else '-0.5',
+ 'BIAS_COMMON_WORDS': '1' if hasattr(self, 'bias_common_words_var') and self.bias_common_words_var.get() else '0',
+ 'BIAS_REPETITIVE_PHRASES': '1' if hasattr(self, 'bias_repetitive_phrases_var') and self.bias_repetitive_phrases_var.get() else '0',
+ 'EXTRACTION_WORKERS': str(self.extraction_workers_var.get()) if self.enable_parallel_extraction_var.get() else '1',
+ 'AZURE_API_VERSION': str(self.config.get('azure_api_version', '2024-08-01-preview')),
+ }
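+ # NOTE: os.environ only accepts string values, so every entry above is
+ # built with str() or a string literal.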
+ os.environ.update(env_updates)
+
+ with open(CONFIG_FILE, 'w', encoding='utf-8') as f:
+ json.dump(self.config, f, ensure_ascii=False, indent=2)
+
+ self.append_log("✅ Other Settings saved successfully")
+ dialog.destroy()
+
+ except Exception as e:
+ print(f"❌ Failed to save Other Settings: {e}")
+ messagebox.showerror("Error", f"Failed to save settings: {e}")
+
+ tb.Button(button_container, text="💾 Save Settings", command=save_and_close,
+ bootstyle="success", width=20).pack(side=tk.LEFT, padx=5)
+
+ tb.Button(button_container, text="❌ Cancel", command=lambda: [dialog._cleanup_scrolling(), dialog.destroy()],
+ bootstyle="secondary", width=20).pack(side=tk.LEFT, padx=5)
+
+ def delete_translated_headers_file(self):
+ """Delete the translated_headers.txt file from the output directory for all selected EPUBs"""
+ try:
+ # Get all selected EPUB files using the same logic as QA scanner
+ epub_files_to_process = []
+
+ # First check if current selection actually contains EPUBs
+ if hasattr(self, 'selected_files') and self.selected_files:
+ current_epub_files = [f for f in self.selected_files if f.lower().endswith('.epub')]
+ if current_epub_files:
+ epub_files_to_process = current_epub_files
+ self.append_log(f"📚 Found {len(epub_files_to_process)} EPUB files in current selection")
+
+ # If no EPUBs in selection, try single EPUB methods
+ if not epub_files_to_process:
+ epub_path = self.get_current_epub_path()
+ if not epub_path:
+ entry_path = self.entry_epub.get().strip()
+ if entry_path and entry_path != "No file selected" and os.path.exists(entry_path):
+ epub_path = entry_path
+
+ if epub_path:
+ epub_files_to_process = [epub_path]
+
+ if not epub_files_to_process:
+ messagebox.showerror("Error", "No EPUB file(s) selected. Please select EPUB file(s) first.")
+ return
+
+ # Process each EPUB file to find and delete translated_headers.txt
+ files_found = []
+ files_not_found = []
+ files_deleted = []
+ errors = []
+
+ current_dir = os.getcwd()
+ script_dir = os.path.dirname(os.path.abspath(__file__))
+
+ # First pass: scan for files
+ for epub_path in epub_files_to_process:
+ try:
+ epub_base = os.path.splitext(os.path.basename(epub_path))[0]
+ self.append_log(f"🔍 Processing EPUB: {epub_base}")
+
+ # Check the most common locations in order of priority (same as QA scanner)
+ candidates = [
+ os.path.join(current_dir, epub_base), # current working directory
+ os.path.join(script_dir, epub_base), # src directory (where output typically goes)
+ os.path.join(current_dir, 'src', epub_base), # src subdirectory from current dir
+ ]
+
+ output_dir = None
+ for candidate in candidates:
+ if os.path.isdir(candidate):
+ # Verify the folder actually contains HTML/XHTML files
+ try:
+ files = os.listdir(candidate)
+ html_files = [f for f in files if f.lower().endswith(('.html', '.xhtml', '.htm'))]
+ if html_files:
+ output_dir = candidate
+ break
+ except Exception:
+ continue
+
+ if not output_dir:
+ self.append_log(f" ⚠️ No output directory found for {epub_base}")
+ files_not_found.append((epub_base, "No output directory found"))
+ continue
+
+ # Look for translated_headers.txt in the output directory
+ headers_file = os.path.join(output_dir, "translated_headers.txt")
+
+ if os.path.exists(headers_file):
+ files_found.append((epub_base, headers_file))
+ self.append_log(f" ✓ Found translated_headers.txt in {os.path.basename(output_dir)}")
+ else:
+ files_not_found.append((epub_base, "translated_headers.txt not found"))
+ self.append_log(f" ⚠️ No translated_headers.txt in {os.path.basename(output_dir)}")
+
+ except Exception as e:
+ epub_base = os.path.splitext(os.path.basename(epub_path))[0]
+ errors.append((epub_base, str(e)))
+ self.append_log(f" ❌ Error processing {epub_base}: {e}")
+
+ # Show summary and get user confirmation
+ if not files_found and not files_not_found and not errors:
+ messagebox.showinfo("No Files", "No EPUB files were processed.")
+ return
+
+ summary_text = f"Summary for {len(epub_files_to_process)} EPUB file(s):\n\n"
+
+ if files_found:
+ summary_text += f"✅ Files to delete ({len(files_found)}):\n"
+ for epub_base, file_path in files_found:
+ summary_text += f" • {epub_base}\n"
+ summary_text += "\n"
+
+ if files_not_found:
+ summary_text += f"⚠️ Files not found ({len(files_not_found)}):\n"
+ for epub_base, reason in files_not_found:
+ summary_text += f" • {epub_base}: {reason}\n"
+ summary_text += "\n"
+
+ if errors:
+ summary_text += f"❌ Errors ({len(errors)}):\n"
+ for epub_base, error in errors:
+ summary_text += f" • {epub_base}: {error}\n"
+ summary_text += "\n"
+
+ if files_found:
+ summary_text += "This will allow headers to be re-translated on the next run."
+
+ # Confirm deletion
+ result = messagebox.askyesno("Confirm Deletion", summary_text)
+
+ if result:
+ # Delete the files
+ for epub_base, headers_file in files_found:
+ try:
+ os.remove(headers_file)
+ files_deleted.append(epub_base)
+ self.append_log(f"✅ Deleted translated_headers.txt from {epub_base}")
+ except Exception as e:
+ errors.append((epub_base, f"Delete failed: {e}"))
+ self.append_log(f"❌ Failed to delete translated_headers.txt from {epub_base}: {e}")
+
+ # Show final results
+ if files_deleted:
+ success_msg = f"Successfully deleted {len(files_deleted)} file(s):\n"
+ success_msg += "\n".join([f"• {epub_base}" for epub_base in files_deleted])
+ if errors:
+ success_msg += f"\n\nErrors: {len(errors)} file(s) failed to delete."
+ messagebox.showinfo("Success", success_msg)
+ else:
+ messagebox.showerror("Error", "No files were successfully deleted.")
+ else:
+ # No files to delete
+ messagebox.showinfo("No Files to Delete", summary_text)
+
+ except Exception as e:
+ self.append_log(f"❌ Error deleting translated_headers.txt: {e}")
+ messagebox.showerror("Error", f"Failed to delete file: {e}")
+
+ def validate_epub_structure_gui(self):
+ """GUI wrapper for EPUB structure validation"""
+ input_path = self.entry_epub.get()
+ if not input_path:
+ messagebox.showerror("Error", "Please select a file first.")
+ return
+
+ if input_path.lower().endswith('.txt'):
+ messagebox.showinfo("Info", "Structure validation is only available for EPUB files.")
+ return
+
+ epub_base = os.path.splitext(os.path.basename(input_path))[0]
+ output_dir = epub_base
+
+ if not os.path.exists(output_dir):
+ messagebox.showinfo("Info", f"No output directory found: {output_dir}")
+ return
+
+ self.append_log("🔍 Validating EPUB structure...")
+
+ try:
+ from TransateKRtoEN import validate_epub_structure, check_epub_readiness
+
+ structure_ok = validate_epub_structure(output_dir)
+ readiness_ok = check_epub_readiness(output_dir)
+
+ if structure_ok and readiness_ok:
+ self.append_log("✅ EPUB validation PASSED - Ready for compilation!")
+ messagebox.showinfo("Validation Passed",
+ "✅ All EPUB structure files are present!\n\n"
+ "Your translation is ready for EPUB compilation.")
+ elif structure_ok:
+ self.append_log("⚠️ EPUB structure OK, but some issues found")
+ messagebox.showwarning("Validation Warning",
+ "⚠️ EPUB structure is mostly OK, but some issues were found.\n\n"
+ "Check the log for details.")
+ else:
+ self.append_log("❌ EPUB validation FAILED - Missing critical files")
+ messagebox.showerror("Validation Failed",
+ "❌ Missing critical EPUB files!\n\n"
+ "container.xml and/or OPF files are missing.\n"
+ "Try re-running the translation to extract them.")
+
+ except ImportError as e:
+ self.append_log(f"❌ Could not import validation functions: {e}")
+ messagebox.showerror("Error", "Validation functions not available.")
+ except Exception as e:
+ self.append_log(f"❌ Validation error: {e}")
+ messagebox.showerror("Error", f"Validation failed: {e}")
+
+ def on_profile_select(self, event=None):
+ """Load the selected profile's prompt into the text area."""
+ name = self.profile_var.get()
+ prompt = self.prompt_profiles.get(name, "")
+ self.prompt_text.delete("1.0", tk.END)
+ self.prompt_text.insert("1.0", prompt)
+ self.config['active_profile'] = name
+
+ def save_profile(self):
+ """Save current prompt under selected profile and persist."""
+ name = self.profile_var.get().strip()
+ if not name:
+ messagebox.showerror("Error", "Profile cannot be empty.")
+ return
+ content = self.prompt_text.get('1.0', tk.END).strip()
+ self.prompt_profiles[name] = content
+ self.config['prompt_profiles'] = self.prompt_profiles
+ self.config['active_profile'] = name
+ self.profile_menu['values'] = list(self.prompt_profiles.keys())
+ messagebox.showinfo("Saved", f"Profile '{name}' saved.")
+ self.save_profiles()
+
+ def delete_profile(self):
+ """Delete the selected profile."""
+ name = self.profile_var.get()
+ if name not in self.prompt_profiles:
+ messagebox.showerror("Error", f"Profile '{name}' not found.")
+ return
+ if messagebox.askyesno("Delete", f"Are you sure you want to delete language '{name}'?"):
+ del self.prompt_profiles[name]
+ self.config['prompt_profiles'] = self.prompt_profiles
+ if self.prompt_profiles:
+ new = next(iter(self.prompt_profiles))
+ self.profile_var.set(new)
+ self.on_profile_select()
+ else:
+ self.profile_var.set("")
+ self.prompt_text.delete('1.0', tk.END)
+ self.profile_menu['values'] = list(self.prompt_profiles.keys())
+ self.save_profiles()
+
+ def save_profiles(self):
+ """Persist only the prompt profiles and active profile."""
+ try:
+ data = {}
+ if os.path.exists(CONFIG_FILE):
+ with open(CONFIG_FILE, 'r', encoding='utf-8') as f:
+ data = json.load(f)
+ data['prompt_profiles'] = self.prompt_profiles
+ data['active_profile'] = self.profile_var.get()
+ with open(CONFIG_FILE, 'w', encoding='utf-8') as f:
+ json.dump(data, f, ensure_ascii=False, indent=2)
+ except Exception as e:
+ messagebox.showerror("Error", f"Failed to save profiles: {e}")
+
+ def import_profiles(self):
+ """Import profiles from a JSON file, merging into existing ones."""
+ path = filedialog.askopenfilename(title="Import Profiles", filetypes=[("JSON files","*.json")])
+ if not path:
+ return
+ try:
+ with open(path, 'r', encoding='utf-8') as f:
+ data = json.load(f)
+ self.prompt_profiles.update(data)
+ self.config['prompt_profiles'] = self.prompt_profiles
+ self.profile_menu['values'] = list(self.prompt_profiles.keys())
+ messagebox.showinfo("Imported", f"Imported {len(data)} profiles.")
+ except Exception as e:
+ messagebox.showerror("Error", f"Failed to import profiles: {e}")
+
+ def export_profiles(self):
+ """Export all profiles to a JSON file."""
+ path = filedialog.asksaveasfilename(title="Export Profiles", defaultextension=".json",
+ filetypes=[("JSON files","*.json")])
+ if not path:
+ return
+ try:
+ with open(path, 'w', encoding='utf-8') as f:
+ json.dump(self.prompt_profiles, f, ensure_ascii=False, indent=2)
+ messagebox.showinfo("Exported", f"Profiles exported to {path}.")
+ except Exception as e:
+ messagebox.showerror("Error", f"Failed to export profiles: {e}")
+
+ def __setattr__(self, name, value):
+ """Debug method to track when manual_glossary_path gets cleared"""
+ if name == 'manual_glossary_path':
+ import traceback
+ if value is None and hasattr(self, 'manual_glossary_path') and self.manual_glossary_path is not None:
+ if hasattr(self, 'append_log'):
+ self.append_log(f"[DEBUG] CLEARING manual_glossary_path from {self.manual_glossary_path} to None")
+ self.append_log(f"[DEBUG] Stack trace: {''.join(traceback.format_stack()[-3:-1])}")
+ else:
+ print(f"[DEBUG] CLEARING manual_glossary_path from {getattr(self, 'manual_glossary_path', 'unknown')} to None")
+ print(f"[DEBUG] Stack trace: {''.join(traceback.format_stack()[-3:-1])}")
+ super().__setattr__(name, value)
+
+ def load_glossary(self):
+ """Let the user pick a glossary file (JSON or CSV) and remember its path."""
+ import json
+ import shutil
+ from tkinter import filedialog, messagebox
+
+ path = filedialog.askopenfilename(
+ title="Select glossary file",
+ filetypes=[
+ ("Supported files", "*.json;*.csv;*.txt"),
+ ("JSON files", "*.json"),
+ ("CSV files", "*.csv"),
+ ("Text files", "*.txt"),
+ ("All files", "*.*")
+ ]
+ )
+ if not path:
+ return
+
+ # Determine file type
+ file_extension = os.path.splitext(path)[1].lower()
+
+ if file_extension == '.csv':
+ # Handle CSV file - just pass it through as-is
+ # The translation system will handle the CSV file format
+ pass
+
+ elif file_extension == '.txt':
+ # Handle TXT file - just pass it through as-is
+ # The translation system will handle the text file format
+ pass
+
+ elif file_extension == '.json':
+ # Original JSON handling code
+ try:
+ with open(path, 'r', encoding='utf-8') as f:
+ content = f.read()
+
+ # Store original content for comparison
+ original_content = content
+
+ # Try normal JSON load first
+ try:
+ json.loads(content)
+ except json.JSONDecodeError as e:
+ self.append_log(f"⚠️ JSON error detected: {str(e)}")
+ self.append_log("🔧 Attempting comprehensive auto-fix...")
+
+ # Apply comprehensive auto-fixes
+ fixed_content = self._comprehensive_json_fix(content)
+
+ # Try to parse the fixed content
+ try:
+ json.loads(fixed_content)
+
+ # If successful, ask user if they want to save the fixed version
+ response = messagebox.askyesno(
+ "JSON Auto-Fix Successful",
+ f"The JSON file had errors that were automatically fixed.\n\n"
+ f"Original error: {str(e)}\n\n"
+ f"Do you want to save the fixed version?\n"
+ f"(A backup of the original will be created)"
+ )
+
+ if response:
+ # Save the fixed version
+ backup_path = path.replace('.json', '_backup.json')
+ shutil.copy2(path, backup_path)
+
+ with open(path, 'w', encoding='utf-8') as f:
+ f.write(fixed_content)
+
+ self.append_log(f"✅ Auto-fixed JSON and saved. Backup created: {os.path.basename(backup_path)}")
+ content = fixed_content
+ else:
+ self.append_log("⚠️ Using original JSON with errors (may cause issues)")
+
+ except json.JSONDecodeError as e2:
+ # Auto-fix failed, show error and options
+ self.append_log(f"❌ Auto-fix failed: {str(e2)}")
+
+ # Build detailed error message
+ error_details = self._analyze_json_errors(content, fixed_content, e, e2)
+
+ response = messagebox.askyesnocancel(
+ "JSON Fix Failed",
+ f"The JSON file has errors that couldn't be automatically fixed.\n\n"
+ f"Original error: {str(e)}\n"
+ f"After auto-fix attempt: {str(e2)}\n\n"
+ f"{error_details}\n\n"
+ f"Options:\n"
+ f"• YES: Open the file in your default editor to fix manually\n"
+ f"• NO: Try to use the file anyway (may fail)\n"
+ f"• CANCEL: Cancel loading this glossary"
+ )
+
+ if response is True: # YES - open in editor
+ try:
+ # Open file in default editor
+ import subprocess
+ import sys
+
+ if sys.platform.startswith('win'):
+ os.startfile(path)
+ elif sys.platform.startswith('darwin'):
+ subprocess.run(['open', path])
+ else: # linux
+ subprocess.run(['xdg-open', path])
+
+ messagebox.showinfo(
+ "Manual Edit",
+ "Please fix the JSON errors in your editor and save the file.\n"
+ "Then click OK to retry loading the glossary."
+ )
+
+ # Recursively call load_glossary to retry
+ self.load_glossary()
+ return
+
+ except Exception as editor_error:
+ messagebox.showerror(
+ "Error",
+ f"Failed to open file in editor: {str(editor_error)}\n\n"
+ f"Please manually edit the file:\n{path}"
+ )
+ return
+
+ elif response is False: # NO - try to use anyway
+ self.append_log("⚠️ Attempting to use JSON with errors (may cause issues)")
+ # Continue with the original content
+
+ else: # CANCEL
+ self.append_log("❌ Glossary loading cancelled")
+ return
+
+ except Exception as e:
+ messagebox.showerror("Error", f"Failed to read glossary file: {str(e)}")
+ return
+
+ else:
+ messagebox.showerror(
+ "Error",
+ f"Unsupported file type: {file_extension}\n"
+ "Please select a JSON, CSV, or TXT file."
+ )
+ return
+
+ # Clear auto-loaded tracking when manually loading
+ self.auto_loaded_glossary_path = None
+ self.auto_loaded_glossary_for_file = None
+
+ self.manual_glossary_path = path
+ self.manual_glossary_manually_loaded = True
+ self.append_log(f"📑 Loaded manual glossary: {path}")
+
+ # Save the file extension for later reference
+ self.manual_glossary_file_extension = file_extension
+
+ self.append_glossary_var.set(True)
+ self.append_log("✅ Automatically enabled 'Append Glossary to System Prompt'")
+
+ def _comprehensive_json_fix(self, content):
+ """Apply comprehensive JSON fixes."""
+ import re
+
+ # Work on a copy; `content` keeps the original for later comparison
+ fixed = content
+
+ # 1. Remove BOM if present
+ if fixed.startswith('\ufeff'):
+ fixed = fixed[1:]
+
+ # 2. Fix common Unicode issues first
+ replacements = {
+ '"': '"', # Left smart quote
+ '"': '"', # Right smart quote
+ ''': "'", # Left smart apostrophe
+ ''': "'", # Right smart apostrophe
+ '–': '-', # En dash
+ '—': '-', # Em dash
+ '…': '...', # Ellipsis
+ '\u200b': '', # Zero-width space
+ '\u00a0': ' ', # Non-breaking space
+ }
+ for old, new in replacements.items():
+ fixed = fixed.replace(old, new)
+
+ # 3. Fix trailing commas in objects and arrays
+ fixed = re.sub(r',\s*}', '}', fixed)
+ fixed = re.sub(r',\s*]', ']', fixed)
+
+ # 4. Fix multiple commas
+ fixed = re.sub(r',\s*,+', ',', fixed)
+
+ # 5. Fix missing commas between array/object elements
+ # Between closing and opening braces/brackets
+ fixed = re.sub(r'}\s*{', '},{', fixed)
+ fixed = re.sub(r']\s*\[', '],[', fixed)
+ fixed = re.sub(r'}\s*\[', '},[', fixed)
+ fixed = re.sub(r']\s*{', '],{', fixed)
+
+ # Between string values (but not inside strings)
+ # This is tricky, so we'll be conservative
+ fixed = re.sub(r'"\s+"(?=[^:]*":)', '","', fixed)
+
+ # 6. Fix unquoted keys (simple cases)
+ # Match unquoted keys that are followed by a colon
+ fixed = re.sub(r'([{,]\s*)([a-zA-Z_][a-zA-Z0-9_]*)\s*:', r'\1"\2":', fixed)
+
+ # 7. Fix single quotes to double quotes for keys and simple string values
+ # Keys
+ fixed = re.sub(r"([{,]\s*)'([^']+)'(\s*:)", r'\1"\2"\3', fixed)
+ # Simple string values (be conservative)
+ fixed = re.sub(r"(:\s*)'([^'\"]*)'(\s*[,}])", r'\1"\2"\3', fixed)
+
+ # 8. Fix common escape issues
+ # Replace single backslashes with double backslashes (except for valid escapes)
+ # This is complex, so we'll only fix obvious cases
+ fixed = re.sub(r'\\(?!["\\/bfnrtu])', r'\\\\', fixed)
+
+ # 9. Ensure proper brackets/braces balance
+ # Count opening and closing brackets
+ open_braces = fixed.count('{')
+ close_braces = fixed.count('}')
+ open_brackets = fixed.count('[')
+ close_brackets = fixed.count(']')
+
+ # Add missing closing braces/brackets at the end
+ if open_braces > close_braces:
+ fixed += '}' * (open_braces - close_braces)
+ if open_brackets > close_brackets:
+ fixed += ']' * (open_brackets - close_brackets)
+
+ # 10. Remove trailing comma before EOF
+ fixed = re.sub(r',\s*$', '', fixed.strip())
+
+ # 11. Fix unescaped newlines in strings (conservative approach)
+ # This is very tricky to do with regex without a proper parser
+ # We'll skip this for safety
+
+ # 12. Remove comments (JSON doesn't support comments)
+ # Remove // style comments
+ fixed = re.sub(r'//.*$', '', fixed, flags=re.MULTILINE)
+ # Remove /* */ style comments
+ fixed = re.sub(r'/\*.*?\*/', '', fixed, flags=re.DOTALL)
+
+ return fixed
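+
+ # Illustrative (hypothetical input, not from the codebase): feeding this
+ # method "{'name': 'Kim', }" should come out as '{"name": "Kim"}', since
+ # step 3 removes the trailing comma and step 7 rewrites the single-quoted
+ # key and value, after which json.loads() parses it cleanly.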
+
+ def _analyze_json_errors(self, original, fixed, original_error, fixed_error):
+ """Analyze JSON errors and provide helpful information."""
+ import re
+
+ analysis = []
+
+ # Check for common issues
+ if '{' in original and original.count('{') != original.count('}'):
+ analysis.append(f"• Mismatched braces: {original.count('{')} opening, {original.count('}')} closing")
+
+ if '[' in original and original.count('[') != original.count(']'):
+ analysis.append(f"• Mismatched brackets: {original.count('[')} opening, {original.count(']')} closing")
+
+ if original.count('"') % 2 != 0:
+ analysis.append("• Odd number of quotes (possible unclosed string)")
+
+ # Check for BOM
+ if original.startswith('\ufeff'):
+ analysis.append("• File starts with BOM (Byte Order Mark)")
+
+ # Check for common problematic patterns
+ if re.search(r'[\u2018\u2019\u201c\u201d\u2026]', original):
+ analysis.append("• Contains smart quotes or special Unicode characters")
+
+ if re.search(r':\s*[a-zA-Z_][a-zA-Z0-9_]*\s*[,}]', original):
+ analysis.append("• Possible unquoted string values")
+
+ if re.search(r'[{,]\s*[a-zA-Z_][a-zA-Z0-9_]*\s*:', original):
+ analysis.append("• Possible unquoted keys")
+
+ if '//' in original or '/*' in original:
+ analysis.append("• Contains comments (not valid in JSON)")
+
+ # Try to find the approximate error location
+ if hasattr(original_error, 'lineno'):
+ lines = original.split('\n')
+ if 0 < original_error.lineno <= len(lines):
+ error_line = lines[original_error.lineno - 1]
+ analysis.append(f"\nError near line {original_error.lineno}:")
+ analysis.append(f" {error_line.strip()}")
+
+ return "\n".join(analysis) if analysis else "Unable to determine specific issues."
+
+ def save_config(self, show_message=True):
+ """Persist all settings to config.json."""
+ try:
+ # Create backup of existing config before saving
+ self._backup_config_file()
+ def safe_int(value, default):
+ try: return int(value)
+ except (ValueError, TypeError): return default
+
+ def safe_float(value, default):
+ try: return float(value)
+ except (ValueError, TypeError): return default
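+ # e.g. safe_int("7", 2) -> 7; safe_int("", 2) -> 2; safe_float("0.5", 1.0) -> 0.5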
+
+ # Basic settings
+ self.config['model'] = self.model_var.get()
+ self.config['active_profile'] = self.profile_var.get()
+ self.config['prompt_profiles'] = self.prompt_profiles
+ self.config['contextual'] = self.contextual_var.get()
+
+ # Validate numeric fields
+ delay_val = self.delay_entry.get().strip()
+ if delay_val and not delay_val.replace('.', '', 1).isdigit():
+ messagebox.showerror("Invalid Input", "Please enter a valid number for API call delay")
+ return
+ self.config['delay'] = safe_float(delay_val, 2)
+
+ thread_delay_val = self.thread_delay_var.get().strip()
+ if not thread_delay_val.replace('.', '', 1).isdigit():
+ messagebox.showerror("Invalid Input", "Please enter a valid number for Threading Delay")
+ return
+ self.config['thread_submission_delay'] = safe_float(thread_delay_val, 0.5)
+
+ trans_temp_val = self.trans_temp.get().strip()
+ if trans_temp_val:
+ try: float(trans_temp_val)
+ except ValueError:
+ messagebox.showerror("Invalid Input", "Please enter a valid number for Temperature")
+ return
+ self.config['translation_temperature'] = safe_float(trans_temp_val, 0.3)
+
+ trans_history_val = self.trans_history.get().strip()
+ if trans_history_val and not trans_history_val.isdigit():
+ messagebox.showerror("Invalid Input", "Please enter a valid number for Translation History Limit")
+ return
+ self.config['translation_history_limit'] = safe_int(trans_history_val, 2)
+
+ # Add fuzzy matching threshold
+ if hasattr(self, 'fuzzy_threshold_var'):
+ fuzzy_val = self.fuzzy_threshold_var.get()
+ if 0.5 <= fuzzy_val <= 1.0:
+ self.config['glossary_fuzzy_threshold'] = fuzzy_val
+ else:
+ self.config['glossary_fuzzy_threshold'] = 0.90 # default
+
+ # Add glossary format preference
+ if hasattr(self, 'use_legacy_csv_var'):
+ self.config['glossary_use_legacy_csv'] = self.use_legacy_csv_var.get()
+
+ # Glossary format instructions (saved alongside translation_prompt_text)
+ if hasattr(self, 'format_instructions_text'):
+ try:
+ self.config['glossary_format_instructions'] = self.format_instructions_text.get('1.0', tk.END).strip()
+ except:
+ pass
+
+ if hasattr(self, 'azure_api_version_var'):
+ self.config['azure_api_version'] = self.azure_api_version_var.get()
+
+ # Save all other settings
+ self.config['api_key'] = self.api_key_entry.get()
+ self.config['REMOVE_AI_ARTIFACTS'] = self.REMOVE_AI_ARTIFACTS_var.get()
+ self.config['attach_css_to_chapters'] = self.attach_css_to_chapters_var.get()
+ self.config['chapter_range'] = self.chapter_range_entry.get().strip()
+ self.config['use_rolling_summary'] = self.rolling_summary_var.get()
+ self.config['summary_role'] = self.summary_role_var.get()
+ self.config['max_output_tokens'] = self.max_output_tokens
+ self.config['translate_book_title'] = self.translate_book_title_var.get()
+ self.config['book_title_prompt'] = self.book_title_prompt
+ self.config['append_glossary'] = self.append_glossary_var.get()
+ self.config['emergency_paragraph_restore'] = self.emergency_restore_var.get()
+ self.config['reinforcement_frequency'] = safe_int(self.reinforcement_freq_var.get(), 10)
+ self.config['retry_duplicate_bodies'] = self.retry_duplicate_var.get()
+ self.config['duplicate_lookback_chapters'] = safe_int(self.duplicate_lookback_var.get(), 5)
+ self.config['token_limit_disabled'] = self.token_limit_disabled
+ self.config['glossary_min_frequency'] = safe_int(self.glossary_min_frequency_var.get(), 2)
+ self.config['glossary_max_names'] = safe_int(self.glossary_max_names_var.get(), 50)
+ self.config['glossary_max_titles'] = safe_int(self.glossary_max_titles_var.get(), 30)
+ self.config['glossary_batch_size'] = safe_int(self.glossary_batch_size_var.get(), 50)
+ self.config['enable_image_translation'] = self.enable_image_translation_var.get()
+ self.config['process_webnovel_images'] = self.process_webnovel_images_var.get()
+ self.config['webnovel_min_height'] = safe_int(self.webnovel_min_height_var.get(), 1000)
+ self.config['max_images_per_chapter'] = safe_int(self.max_images_per_chapter_var.get(), 1)
+ self.config['batch_translation'] = self.batch_translation_var.get()
+ self.config['batch_size'] = safe_int(self.batch_size_var.get(), 3)
+ self.config['conservative_batching'] = self.conservative_batching_var.get()
+ self.config['translation_history_rolling'] = self.translation_history_rolling_var.get()
+
+ # OpenRouter transport/compression toggles (ensure persisted even when dialog not open)
+ if hasattr(self, 'openrouter_http_only_var'):
+ self.config['openrouter_use_http_only'] = bool(self.openrouter_http_only_var.get())
+ os.environ['OPENROUTER_USE_HTTP_ONLY'] = '1' if self.openrouter_http_only_var.get() else '0'
+ if hasattr(self, 'openrouter_accept_identity_var'):
+ self.config['openrouter_accept_identity'] = bool(self.openrouter_accept_identity_var.get())
+ os.environ['OPENROUTER_ACCEPT_IDENTITY'] = '1' if self.openrouter_accept_identity_var.get() else '0'
+ self.config['glossary_history_rolling'] = self.glossary_history_rolling_var.get()
+ self.config['disable_epub_gallery'] = self.disable_epub_gallery_var.get()
+ self.config['disable_automatic_cover_creation'] = self.disable_automatic_cover_creation_var.get()
+ self.config['translate_cover_html'] = self.translate_cover_html_var.get()
+ self.config['enable_auto_glossary'] = self.enable_auto_glossary_var.get()
+ self.config['duplicate_detection_mode'] = self.duplicate_detection_mode_var.get()
+ self.config['chapter_number_offset'] = safe_int(self.chapter_number_offset_var.get(), 0)
+ self.config['use_header_as_output'] = self.use_header_as_output_var.get()
+ self.config['enable_decimal_chapters'] = self.enable_decimal_chapters_var.get()
+ self.config['enable_watermark_removal'] = self.enable_watermark_removal_var.get()
+ self.config['save_cleaned_images'] = self.save_cleaned_images_var.get()
+ self.config['advanced_watermark_removal'] = self.advanced_watermark_removal_var.get()
+ self.config['compression_factor'] = self.compression_factor_var.get()
+ self.config['translation_chunk_prompt'] = self.translation_chunk_prompt
+ self.config['image_chunk_prompt'] = self.image_chunk_prompt
+ self.config['force_ncx_only'] = self.force_ncx_only_var.get()
+ self.config['vertex_ai_location'] = self.vertex_location_var.get()
+ self.config['batch_translate_headers'] = self.batch_translate_headers_var.get()
+ self.config['headers_per_batch'] = self.headers_per_batch_var.get()
+ self.config['update_html_headers'] = self.update_html_headers_var.get()
+ self.config['save_header_translations'] = self.save_header_translations_var.get()
+ self.config['single_api_image_chunks'] = self.single_api_image_chunks_var.get()
+ self.config['enable_gemini_thinking'] = self.enable_gemini_thinking_var.get()
+ self.config['thinking_budget'] = int(self.thinking_budget_var.get()) if self.thinking_budget_var.get().lstrip('-').isdigit() else 0
+ self.config['enable_gpt_thinking'] = self.enable_gpt_thinking_var.get()
+ self.config['gpt_reasoning_tokens'] = int(self.gpt_reasoning_tokens_var.get()) if self.gpt_reasoning_tokens_var.get().lstrip('-').isdigit() else 0
+ self.config['gpt_effort'] = self.gpt_effort_var.get()
+ self.config['openai_base_url'] = self.openai_base_url_var.get()
+ self.config['groq_base_url'] = self.groq_base_url_var.get() # This was missing!
+ self.config['fireworks_base_url'] = self.fireworks_base_url_var.get()
+ self.config['use_custom_openai_endpoint'] = self.use_custom_openai_endpoint_var.get()
+
+ # Save additional important missing settings
+ if hasattr(self, 'retain_source_extension_var'):
+ self.config['retain_source_extension'] = self.retain_source_extension_var.get()
+ # Update environment variable
+ os.environ['RETAIN_SOURCE_EXTENSION'] = '1' if self.retain_source_extension_var.get() else '0'
+
+ if hasattr(self, 'use_fallback_keys_var'):
+ self.config['use_fallback_keys'] = self.use_fallback_keys_var.get()
+
+ if hasattr(self, 'auto_update_check_var'):
+ self.config['auto_update_check'] = self.auto_update_check_var.get()
+
+ # Preserve last update check time if it exists
+ if hasattr(self, 'update_manager') and self.update_manager:
+ self.config['last_update_check_time'] = self.update_manager._last_check_time
+
+ # Save window manager safe ratios setting
+ if hasattr(self, 'wm') and hasattr(self.wm, '_force_safe_ratios'):
+ self.config['force_safe_ratios'] = self.wm._force_safe_ratios
+
+ # Save metadata-related ignore settings
+ if hasattr(self, 'ignore_header_var'):
+ self.config['ignore_header'] = self.ignore_header_var.get()
+
+ if hasattr(self, 'ignore_title_var'):
+ self.config['ignore_title'] = self.ignore_title_var.get()
+ self.config['disable_chapter_merging'] = self.disable_chapter_merging_var.get()
+ self.config['use_gemini_openai_endpoint'] = self.use_gemini_openai_endpoint_var.get()
+ self.config['gemini_openai_endpoint'] = self.gemini_openai_endpoint_var.get()
+ # Save extraction worker settings
+ self.config['enable_parallel_extraction'] = self.enable_parallel_extraction_var.get()
+ self.config['extraction_workers'] = self.extraction_workers_var.get()
+ self.config['glossary_max_text_size'] = self.glossary_max_text_size_var.get()
+ self.config['glossary_chapter_split_threshold'] = self.glossary_chapter_split_threshold_var.get()
+ self.config['glossary_filter_mode'] = self.glossary_filter_mode_var.get()
+ self.config['image_chunk_overlap'] = safe_float(self.image_chunk_overlap_var.get(), 1.0)
+
+
+
+ # NEW: Save strip honorifics setting
+ self.config['strip_honorifics'] = self.strip_honorifics_var.get()
+
+ # Save glossary backup settings: these may already be set from the glossary
+ # backup dialog; setdefault fills in defaults without overwriting them
+ self.config.setdefault('glossary_auto_backup', True)
+ self.config.setdefault('glossary_max_backups', 50)
+
+ # Save QA Scanner settings if they exist
+ if hasattr(self, 'config') and 'qa_scanner_settings' in self.config:
+ # QA scanner settings already exist in config, keep them
+ pass
+ else:
+ # Initialize default QA scanner settings if not present
+ default_qa_settings = {
+ 'foreign_char_threshold': 10,
+ 'excluded_characters': '',
+ 'check_encoding_issues': False,
+ 'check_repetition': True,
+ 'check_translation_artifacts': False,
+ 'check_glossary_leakage': True,
+ 'min_file_length': 0,
+ 'report_format': 'detailed',
+ 'auto_save_report': True,
+ 'check_word_count_ratio': False,
+ 'check_multiple_headers': True,
+ 'warn_name_mismatch': False,
+ 'check_missing_html_tag': True,
+ 'check_paragraph_structure': True,
+ 'check_invalid_nesting': False,
+ 'paragraph_threshold': 0.3,
+ 'cache_enabled': True,
+ 'cache_auto_size': False,
+ 'cache_show_stats': False
+ }
+ self.config.setdefault('qa_scanner_settings', default_qa_settings)
+
+ # Save AI Hunter config settings if they exist
+ if 'ai_hunter_config' not in self.config:
+ self.config['ai_hunter_config'] = {}
+ # Ensure ai_hunter_max_workers has a default value
+ self.config['ai_hunter_config'].setdefault('ai_hunter_max_workers', 1)
+
+ # NEW: Save prompts from text widgets if they exist
+ if hasattr(self, 'auto_prompt_text'):
+ try:
+ self.config['auto_glossary_prompt'] = self.auto_prompt_text.get('1.0', tk.END).strip()
+ except:
+ pass
+
+ if hasattr(self, 'append_prompt_text'):
+ try:
+ self.config['append_glossary_prompt'] = self.append_prompt_text.get('1.0', tk.END).strip()
+ except:
+ pass
+
+ if hasattr(self, 'translation_prompt_text'):
+ try:
+ self.config['glossary_translation_prompt'] = self.translation_prompt_text.get('1.0', tk.END).strip()
+ except:
+ pass
+
+ # Update environment variable when saving
+ if self.enable_parallel_extraction_var.get():
+ os.environ["EXTRACTION_WORKERS"] = str(self.extraction_workers_var.get())
+ else:
+ os.environ["EXTRACTION_WORKERS"] = "1"
+
+ # Chapter Extraction Settings - Save all extraction-related settings
+ # These are the critical settings shown in the screenshot
+
+ # Save Text Extraction Method (Standard/Enhanced)
+ if hasattr(self, 'text_extraction_method_var'):
+ self.config['text_extraction_method'] = self.text_extraction_method_var.get()
+
+ # Save File Filtering Level (Smart/Comprehensive/Full)
+ if hasattr(self, 'file_filtering_level_var'):
+ self.config['file_filtering_level'] = self.file_filtering_level_var.get()
+
+ # Save Preserve Markdown Structure setting
+ if hasattr(self, 'enhanced_preserve_structure_var'):
+ self.config['enhanced_preserve_structure'] = self.enhanced_preserve_structure_var.get()
+
+ # Save Enhanced Filtering setting (for backwards compatibility)
+ if hasattr(self, 'enhanced_filtering_var'):
+ self.config['enhanced_filtering'] = self.enhanced_filtering_var.get()
+
+ # Save force BeautifulSoup for traditional APIs
+ if hasattr(self, 'force_bs_for_traditional_var'):
+ self.config['force_bs_for_traditional'] = self.force_bs_for_traditional_var.get()
+
+ # Update extraction_mode for backwards compatibility with older versions
+ if hasattr(self, 'text_extraction_method_var') and hasattr(self, 'file_filtering_level_var'):
+ if self.text_extraction_method_var.get() == 'enhanced':
+ self.config['extraction_mode'] = 'enhanced'
+ # When enhanced mode is selected, the filtering level applies to enhanced mode
+ self.config['enhanced_filtering'] = self.file_filtering_level_var.get()
+ else:
+ # When standard mode is selected, use the filtering level directly
+ self.config['extraction_mode'] = self.file_filtering_level_var.get()
+ elif hasattr(self, 'extraction_mode_var'):
+ # Fallback for older UI
+ self.config['extraction_mode'] = self.extraction_mode_var.get()
+
+ # Save image compression settings if they exist
+ # These are saved from the compression dialog, but we ensure defaults here
+ compression_defaults = {
+ 'enable_image_compression': False,
+ 'auto_compress_enabled': True,
+ 'target_image_tokens': 1000,
+ 'image_compression_format': 'auto',
+ 'webp_quality': 85,
+ 'jpeg_quality': 85,
+ 'png_compression': 6,
+ 'max_image_dimension': 2048,
+ 'max_image_size_mb': 10,
+ 'preserve_transparency': False,
+ 'preserve_original_format': False,
+ 'optimize_for_ocr': True,
+ 'progressive_encoding': True,
+ 'save_compressed_images': False,
+ }
+ for key, default in compression_defaults.items():
+ self.config.setdefault(key, default)
+
+
+ # Add anti-duplicate parameters
+ if hasattr(self, 'enable_anti_duplicate_var'):
+ self.config['enable_anti_duplicate'] = self.enable_anti_duplicate_var.get()
+ self.config['top_p'] = self.top_p_var.get()
+ self.config['top_k'] = self.top_k_var.get()
+ self.config['frequency_penalty'] = self.frequency_penalty_var.get()
+ self.config['presence_penalty'] = self.presence_penalty_var.get()
+ self.config['repetition_penalty'] = self.repetition_penalty_var.get()
+ self.config['candidate_count'] = self.candidate_count_var.get()
+ self.config['custom_stop_sequences'] = self.custom_stop_sequences_var.get()
+ self.config['logit_bias_enabled'] = self.logit_bias_enabled_var.get()
+ self.config['logit_bias_strength'] = self.logit_bias_strength_var.get()
+ self.config['bias_common_words'] = self.bias_common_words_var.get()
+ self.config['bias_repetitive_phrases'] = self.bias_repetitive_phrases_var.get()
+
+ # Save scanning phase settings
+ if hasattr(self, 'scan_phase_enabled_var'):
+ self.config['scan_phase_enabled'] = self.scan_phase_enabled_var.get()
+ if hasattr(self, 'scan_phase_mode_var'):
+ self.config['scan_phase_mode'] = self.scan_phase_mode_var.get()
+
+ _tl = self.token_limit_entry.get().strip()
+ if _tl.isdigit():
+ self.config['token_limit'] = int(_tl)
+ else:
+ self.config['token_limit'] = None
+
+ # Store Google Cloud credentials path BEFORE encryption
+ # This should NOT be encrypted since it's just a file path
+ google_creds_path = self.config.get('google_cloud_credentials')
+
+ # Encrypt the config
+ encrypted_config = encrypt_config(self.config)
+
+ # Re-add the Google Cloud credentials path after encryption
+ # This ensures the path is stored unencrypted for easy access
+ if google_creds_path:
+ encrypted_config['google_cloud_credentials'] = google_creds_path
+
+ # Validate config can be serialized to JSON before writing
+ try:
+ json_test = json.dumps(encrypted_config, ensure_ascii=False, indent=2)
+ except Exception as e:
+ raise Exception(f"Config validation failed - invalid JSON: {e}")
+
+ # Write to file
+ with open(CONFIG_FILE, 'w', encoding='utf-8') as f:
+ json.dump(encrypted_config, f, ensure_ascii=False, indent=2)
+
+ # Only show message if requested
+ if show_message:
+ messagebox.showinfo("Saved", "Configuration saved.")
+
+ except Exception as e:
+ # Always show error messages regardless of show_message
+ messagebox.showerror("Error", f"Failed to save configuration: {e}")
+ # Try to restore from backup if save failed
+ self._restore_config_from_backup()
+
+ def _backup_config_file(self):
+ """Create backup of the existing config file before saving."""
+ try:
+ # Skip if config file doesn't exist yet
+ if not os.path.exists(CONFIG_FILE):
+ return
+
+ # Resolve config file path for backup directory
+ if os.path.isabs(CONFIG_FILE):
+ config_dir = os.path.dirname(CONFIG_FILE)
+ else:
+ config_dir = os.path.dirname(os.path.abspath(CONFIG_FILE))
+
+ # Create backup directory
+ backup_dir = os.path.join(config_dir, "config_backups")
+ os.makedirs(backup_dir, exist_ok=True)
+
+ # Create timestamped backup name
+ backup_name = f"config_{time.strftime('%Y%m%d_%H%M%S')}.json.bak"
+ backup_path = os.path.join(backup_dir, backup_name)
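+ # e.g. config_backups/config_20250101_120000.json.bak (illustrative name)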
+
+ # Copy the file
+ shutil.copy2(CONFIG_FILE, backup_path)
+
+ # Maintain only the last 10 backups
+ backups = [os.path.join(backup_dir, f) for f in os.listdir(backup_dir)
+ if f.startswith("config_") and f.endswith(".json.bak")]
+ backups.sort(key=lambda x: os.path.getmtime(x), reverse=True)
+
+ # Remove oldest backups if more than 10
+ for old_backup in backups[10:]:
+ try:
+ os.remove(old_backup)
+ except Exception:
+ pass # Ignore errors when cleaning old backups
+
+ except Exception as e:
+ # Silent exception - don't interrupt normal operation if backup fails
+ print(f"Warning: Could not create config backup: {e}")
+
+ def _restore_config_from_backup(self):
+ """Attempt to restore config from the most recent backup."""
+ try:
+ # Locate backups directory
+ if os.path.isabs(CONFIG_FILE):
+ config_dir = os.path.dirname(CONFIG_FILE)
+ else:
+ config_dir = os.path.dirname(os.path.abspath(CONFIG_FILE))
+ backup_dir = os.path.join(config_dir, "config_backups")
+
+ if not os.path.exists(backup_dir):
+ return
+
+ # Find most recent backup
+ backups = [os.path.join(backup_dir, f) for f in os.listdir(backup_dir)
+ if f.startswith("config_") and f.endswith(".json.bak")]
+
+ if not backups:
+ return
+
+ backups.sort(key=lambda x: os.path.getmtime(x), reverse=True)
+ latest_backup = backups[0]
+
+ # Copy backup to config file
+ shutil.copy2(latest_backup, CONFIG_FILE)
+ messagebox.showinfo("Config Restored",
+ f"Configuration was restored from backup: {os.path.basename(latest_backup)}")
+
+ # Reload config
+ try:
+ with open(CONFIG_FILE, 'r', encoding='utf-8') as f:
+ self.config = json.load(f)
+ self.config = decrypt_config(self.config)
+ except Exception as e:
+ messagebox.showerror("Error", f"Failed to reload configuration: {e}")
+
+ except Exception as e:
+ messagebox.showerror("Restore Failed", f"Could not restore config from backup: {e}")
+
+ def _create_manual_config_backup(self):
+ """Create a manual config backup."""
+ try:
+ # Force create backup even if config file doesn't exist
+ self._backup_config_file()
+ messagebox.showinfo("Backup Created", "Configuration backup created successfully!")
+ except Exception as e:
+ messagebox.showerror("Backup Failed", f"Failed to create backup: {e}")
+
+ def _open_backup_folder(self):
+ """Open the config backups folder in file explorer."""
+ try:
+ if os.path.isabs(CONFIG_FILE):
+ config_dir = os.path.dirname(CONFIG_FILE)
+ else:
+ config_dir = os.path.dirname(os.path.abspath(CONFIG_FILE))
+ backup_dir = os.path.join(config_dir, "config_backups")
+
+ if not os.path.exists(backup_dir):
+ os.makedirs(backup_dir, exist_ok=True)
+ messagebox.showinfo("Backup Folder", f"Created backup folder: {backup_dir}")
+
+ # Open folder in explorer (cross-platform)
+ import subprocess
+ import platform
+
+ if platform.system() == "Windows":
+ os.startfile(backup_dir)
+ elif platform.system() == "Darwin": # macOS
+ subprocess.run(["open", backup_dir])
+ else: # Linux
+ subprocess.run(["xdg-open", backup_dir])
+
+ except Exception as e:
+ messagebox.showerror("Error", f"Could not open backup folder: {e}")
+
+ def _manual_restore_config(self):
+ """Show dialog to manually select and restore a config backup."""
+ try:
+ if os.path.isabs(CONFIG_FILE):
+ config_dir = os.path.dirname(CONFIG_FILE)
+ else:
+ config_dir = os.path.dirname(os.path.abspath(CONFIG_FILE))
+ backup_dir = os.path.join(config_dir, "config_backups")
+
+ if not os.path.exists(backup_dir):
+ messagebox.showinfo("No Backups", "No backup folder found. No backups have been created yet.")
+ return
+
+ # Get list of available backups
+ backups = [f for f in os.listdir(backup_dir)
+ if f.startswith("config_") and f.endswith(".json.bak")]
+
+ if not backups:
+ messagebox.showinfo("No Backups", "No config backups found.")
+ return
+
+ # Sort by creation time (newest first)
+ backups.sort(key=lambda x: os.path.getmtime(os.path.join(backup_dir, x)), reverse=True)
+
+ # Use WindowManager to create scrollable dialog
+ dialog, scrollable_frame, canvas = self.wm.setup_scrollable(
+ self.master,
+ "Config Backup Manager",
+ width=0,
+ height=None,
+ max_width_ratio=0.6,
+ max_height_ratio=0.8
+ )
+
+ # Main content
+ header_frame = tk.Frame(scrollable_frame)
+ header_frame.pack(fill=tk.X, padx=20, pady=(20, 10))
+
+ tk.Label(header_frame, text="Configuration Backup Manager",
+ font=('TkDefaultFont', 14, 'bold')).pack(anchor=tk.W)
+
+ tk.Label(header_frame,
+ text="Select a backup to restore or manage your configuration backups.",
+ font=('TkDefaultFont', 10), fg='gray').pack(anchor=tk.W, pady=(5, 0))
+
+ # Info section
+ info_frame = tk.LabelFrame(scrollable_frame, text="Backup Information", padx=10, pady=10)
+ info_frame.pack(fill=tk.X, padx=20, pady=(0, 10))
+
+ info_text = f"📁 Backup Location: {backup_dir}\n📊 Total Backups: {len(backups)}"
+ tk.Label(info_frame, text=info_text, font=('TkDefaultFont', 10),
+ fg='#333', justify=tk.LEFT).pack(anchor=tk.W)
+
+ # Backup list section
+ list_frame = tk.LabelFrame(scrollable_frame, text="Available Backups (Newest First)", padx=10, pady=10)
+ list_frame.pack(fill=tk.BOTH, expand=True, padx=20, pady=(0, 10))
+
+ # Create treeview for better display
+ columns = ('timestamp', 'filename', 'size')
+ tree = ttk.Treeview(list_frame, columns=columns, show='headings', height=8)
+
+ # Define headings
+ tree.heading('timestamp', text='Date & Time')
+ tree.heading('filename', text='Backup File')
+ tree.heading('size', text='Size')
+
+ # Configure column widths
+ tree.column('timestamp', width=150, anchor='center')
+ tree.column('filename', width=200)
+ tree.column('size', width=80, anchor='center')
+
+ # Add scrollbars for treeview
+ v_scrollbar = ttk.Scrollbar(list_frame, orient='vertical', command=tree.yview)
+ h_scrollbar = ttk.Scrollbar(list_frame, orient='horizontal', command=tree.xview)
+ tree.configure(yscrollcommand=v_scrollbar.set, xscrollcommand=h_scrollbar.set)
+
+ # Pack treeview and scrollbars
+ tree.pack(side='left', fill='both', expand=True)
+ v_scrollbar.pack(side='right', fill='y')
+ h_scrollbar.pack(side='bottom', fill='x')
+
+ # Populate treeview with backup information
+ backup_items = []
+ for backup in backups:
+ backup_path = os.path.join(backup_dir, backup)
+
+ # Extract timestamp from filename
+ try:
+ timestamp_part = backup.replace("config_", "").replace(".json.bak", "")
+ formatted_time = time.strftime("%Y-%m-%d %H:%M:%S",
+ time.strptime(timestamp_part, "%Y%m%d_%H%M%S"))
+ except:
+ formatted_time = "Unknown"
+
+ # Get file size
+ try:
+ size_bytes = os.path.getsize(backup_path)
+ if size_bytes < 1024:
+ size_str = f"{size_bytes} B"
+ elif size_bytes < 1024 * 1024:
+ size_str = f"{size_bytes // 1024} KB"
+ else:
+ size_str = f"{size_bytes // (1024 * 1024)} MB"
+ except:
+ size_str = "Unknown"
+
+ # Insert into treeview
+ item_id = tree.insert('', 'end', values=(formatted_time, backup, size_str))
+ backup_items.append((item_id, backup, formatted_time))
+
+ # Select first item by default
+ if backup_items:
+ tree.selection_set(backup_items[0][0])
+ tree.focus(backup_items[0][0])
+
+ # Action buttons frame
+ button_frame = tk.LabelFrame(scrollable_frame, text="Actions", padx=10, pady=10)
+ button_frame.pack(fill=tk.X, padx=20, pady=(0, 10))
+
+ # Create button layout
+ button_row1 = tk.Frame(button_frame)
+ button_row1.pack(fill=tk.X, pady=(0, 5))
+
+ button_row2 = tk.Frame(button_frame)
+ button_row2.pack(fill=tk.X)
+
+ def get_selected_backup():
+ """Get currently selected backup from treeview"""
+ selection = tree.selection()
+ if not selection:
+ return None
+
+ selected_item = selection[0]
+ for item_id, backup_filename, formatted_time in backup_items:
+ if item_id == selected_item:
+ return backup_filename, formatted_time
+ return None
+
+ def restore_selected():
+ selected = get_selected_backup()
+ if not selected:
+ messagebox.showwarning("No Selection", "Please select a backup to restore.")
+ return
+
+ selected_backup, formatted_time = selected
+ backup_path = os.path.join(backup_dir, selected_backup)
+
+ # Confirm restore
+ if messagebox.askyesno("Confirm Restore",
+ f"This will replace your current configuration with the backup from:\n\n"
+ f"{formatted_time}\n{selected_backup}\n\n"
+ f"A backup of your current config will be created first.\n\n"
+ f"Are you sure you want to continue?"):
+
+ try:
+ # Create backup of current config before restore
+ self._backup_config_file()
+
+ # Copy backup to config file
+ shutil.copy2(backup_path, CONFIG_FILE)
+
+ messagebox.showinfo("Restore Complete",
+ f"Configuration restored from: {selected_backup}\n\n"
+ f"Please restart the application for changes to take effect.")
+ dialog._cleanup_scrolling()
+ dialog.destroy()
+
+ except Exception as e:
+ messagebox.showerror("Restore Failed", f"Failed to restore backup: {e}")
+
+ def delete_selected():
+ selected = get_selected_backup()
+ if not selected:
+ messagebox.showwarning("No Selection", "Please select a backup to delete.")
+ return
+
+ selected_backup, formatted_time = selected
+
+ if messagebox.askyesno("Confirm Delete",
+ f"Delete backup from {formatted_time}?\n\n{selected_backup}\n\n"
+ f"This action cannot be undone."):
+ try:
+ os.remove(os.path.join(backup_dir, selected_backup))
+
+ # Remove from treeview
+ selection = tree.selection()
+ if selection:
+ tree.delete(selection[0])
+
+ # Update backup items list
+ backup_items[:] = [(item_id, backup, time_str)
+ for item_id, backup, time_str in backup_items
+ if backup != selected_backup]
+
+ messagebox.showinfo("Deleted", "Backup deleted successfully.")
+ except Exception as e:
+ messagebox.showerror("Delete Failed", f"Failed to delete backup: {e}")
+
+ def create_new_backup():
+ """Create a new manual backup"""
+ try:
+ self._backup_config_file()
+ messagebox.showinfo("Backup Created", "New configuration backup created successfully!")
+ # Refresh the dialog
+ dialog._cleanup_scrolling()
+ dialog.destroy()
+ self._manual_restore_config() # Reopen with updated list
+ except Exception as e:
+ messagebox.showerror("Backup Failed", f"Failed to create backup: {e}")
+
+ def open_backup_folder():
+ """Open backup folder in file explorer"""
+ self._open_backup_folder()
+
+ # Primary action buttons (Row 1)
+ tb.Button(button_row1, text="✅ Restore Selected",
+ command=restore_selected, bootstyle="success",
+ width=20).pack(side=tk.LEFT, padx=(0, 10))
+
+ tb.Button(button_row1, text="💾 Create New Backup",
+ command=create_new_backup, bootstyle="primary-outline",
+ width=20).pack(side=tk.LEFT, padx=(0, 10))
+
+ tb.Button(button_row1, text="📁 Open Folder",
+ command=open_backup_folder, bootstyle="info-outline",
+ width=20).pack(side=tk.LEFT)
+
+ # Secondary action buttons (Row 2)
+ tb.Button(button_row2, text="🗑️ Delete Selected",
+ command=delete_selected, bootstyle="danger-outline",
+ width=20).pack(side=tk.LEFT, padx=(0, 10))
+
+ tb.Button(button_row2, text="❌ Close",
+ command=lambda: [dialog._cleanup_scrolling(), dialog.destroy()],
+ bootstyle="secondary",
+ width=20).pack(side=tk.RIGHT)
+
+ # Auto-resize and show dialog
+ self.wm.auto_resize_dialog(dialog, canvas, max_width_ratio=0.7, max_height_ratio=0.9)
+
+ # Handle window close
+ dialog.protocol("WM_DELETE_WINDOW", lambda: [dialog._cleanup_scrolling(), dialog.destroy()])
+
+ except Exception as e:
+ messagebox.showerror("Error", f"Failed to open backup restore dialog: {e}")
+
+ def _ensure_executor(self):
+ """Ensure a ThreadPoolExecutor exists and matches configured worker count.
+ Also updates EXTRACTION_WORKERS environment variable.
+ """
+ try:
+ workers = 1
+ try:
+ workers = int(self.extraction_workers_var.get()) if self.enable_parallel_extraction_var.get() else 1
+ except Exception:
+ workers = 1
+ if workers < 1:
+ workers = 1
+ os.environ["EXTRACTION_WORKERS"] = str(workers)
+
+ # If executor exists with same worker count, keep it
+ if getattr(self, 'executor', None) and getattr(self, '_executor_workers', None) == workers:
+ return
+
+ # If executor exists but tasks are running, don't recreate to avoid disruption
+ active = any([
+ getattr(self, 'translation_future', None) and not self.translation_future.done(),
+ getattr(self, 'glossary_future', None) and not self.glossary_future.done(),
+ getattr(self, 'epub_future', None) and not self.epub_future.done(),
+ getattr(self, 'qa_future', None) and not self.qa_future.done(),
+ ])
+ if getattr(self, 'executor', None) and active:
+ self._executor_workers = workers # Remember desired workers for later
+ return
+
+ # Safe to (re)create
+ if getattr(self, 'executor', None):
+ try:
+ self.executor.shutdown(wait=False)
+ except Exception:
+ pass
+ self.executor = None
+
+ self.executor = concurrent.futures.ThreadPoolExecutor(max_workers=workers, thread_name_prefix="Glossarion")
+ self._executor_workers = workers
+ except Exception as e:
+ try:
+ print(f"Executor setup failed: {e}")
+ except Exception:
+ pass
+
+ def log_debug(self, message):
+ self.append_log(f"[DEBUG] {message}")
+
+if __name__ == "__main__":
+ import time
+ # Ensure console encoding can handle emojis/Unicode in frozen exe environments
+ try:
+ import io, sys as _sys
+ if hasattr(_sys.stdout, 'reconfigure'):
+ try:
+ _sys.stdout.reconfigure(encoding='utf-8', errors='ignore')
+ _sys.stderr.reconfigure(encoding='utf-8', errors='ignore')
+ except Exception:
+ pass
+ else:
+ try:
+ _sys.stdout = io.TextIOWrapper(_sys.stdout.buffer, encoding='utf-8', errors='ignore')
+ _sys.stderr = io.TextIOWrapper(_sys.stderr.buffer, encoding='utf-8', errors='ignore')
+ except Exception:
+ pass
+ except Exception:
+ pass
+
+ print("🚀 Starting Glossarion v4.8.5...")
+
+ # Initialize splash screen
+ splash_manager = None
+ try:
+ from splash_utils import SplashManager
+ splash_manager = SplashManager()
+ splash_started = splash_manager.start_splash()
+
+ if splash_started:
+ splash_manager.update_status("Loading theme framework...")
+ time.sleep(0.1)
+ except Exception as e:
+ print(f"⚠️ Splash screen failed: {e}")
+ splash_manager = None
+
+ try:
+ if splash_manager:
+ splash_manager.update_status("Loading UI framework...")
+ time.sleep(0.08)
+
+ # Import ttkbootstrap while splash is visible
+ import ttkbootstrap as tb
+ from ttkbootstrap.constants import *
+
+ # REAL module loading during splash screen with gradual progression
+ if splash_manager:
+ # Create a custom callback function for splash updates
+ def splash_callback(message):
+ if splash_manager and splash_manager.splash_window:
+ splash_manager.update_status(message)
+ splash_manager.splash_window.update()
+ time.sleep(0.09)
+
+ # Actually load modules during splash with real feedback
+ splash_callback("Loading translation modules...")
+
+ # Import and test each module for real
+ translation_main = translation_stop_flag = translation_stop_check = None
+ glossary_main = glossary_stop_flag = glossary_stop_check = None
+ fallback_compile_epub = scan_html_folder = None
+
+ modules_loaded = 0
+ total_modules = 4
+
+ # Load the TransateKRtoEN translation module
+ splash_callback("Loading translation engine...")
+ try:
+ splash_callback("Validating translation engine...")
+ import TransateKRtoEN
+ if hasattr(TransateKRtoEN, 'main') and hasattr(TransateKRtoEN, 'set_stop_flag'):
+ from TransateKRtoEN import main as translation_main, set_stop_flag as translation_stop_flag, is_stop_requested as translation_stop_check
+ modules_loaded += 1
+ splash_callback("✅ translation engine loaded")
+ else:
+ splash_callback("⚠️ translation engine incomplete")
+ except Exception as e:
+ splash_callback("❌ translation engine failed")
+ print(f"Warning: Could not import TransateKRtoEN: {e}")
+
+ # Load extract_glossary_from_epub
+ splash_callback("Loading glossary extractor...")
+ try:
+ splash_callback("Validating glossary extractor...")
+ import extract_glossary_from_epub
+ if hasattr(extract_glossary_from_epub, 'main') and hasattr(extract_glossary_from_epub, 'set_stop_flag'):
+ from extract_glossary_from_epub import main as glossary_main, set_stop_flag as glossary_stop_flag, is_stop_requested as glossary_stop_check
+ modules_loaded += 1
+ splash_callback("✅ glossary extractor loaded")
+ else:
+ splash_callback("⚠️ glossary extractor incomplete")
+ except Exception as e:
+ splash_callback("❌ glossary extractor failed")
+ print(f"Warning: Could not import extract_glossary_from_epub: {e}")
+
+ # Load epub_converter
+ splash_callback("Loading EPUB converter...")
+ try:
+ import epub_converter
+ if hasattr(epub_converter, 'fallback_compile_epub'):
+ from epub_converter import fallback_compile_epub
+ modules_loaded += 1
+ splash_callback("✅ EPUB converter loaded")
+ else:
+ splash_callback("⚠️ EPUB converter incomplete")
+ except Exception as e:
+ splash_callback("❌ EPUB converter failed")
+ print(f"Warning: Could not import epub_converter: {e}")
+
+ # Load scan_html_folder
+ splash_callback("Loading QA scanner...")
+ try:
+ import scan_html_folder
+ if hasattr(scan_html_folder, 'scan_html_folder'):
+ from scan_html_folder import scan_html_folder
+ modules_loaded += 1
+ splash_callback("✅ QA scanner loaded")
+ else:
+ splash_callback("⚠️ QA scanner incomplete")
+ except Exception as e:
+ splash_callback("❌ QA scanner failed")
+ print(f"Warning: Could not import scan_html_folder: {e}")
+
+ # Final status with pause for visibility
+ splash_callback("Finalizing module initialization...")
+ if modules_loaded == total_modules:
+ splash_callback("✅ All modules loaded successfully")
+ else:
+ splash_callback(f"⚠️ {modules_loaded}/{total_modules} modules loaded")
+
+ # Store loaded modules globally for GUI access
+ import translator_gui
+ translator_gui.translation_main = translation_main
+ translator_gui.translation_stop_flag = translation_stop_flag
+ translator_gui.translation_stop_check = translation_stop_check
+ translator_gui.glossary_main = glossary_main
+ translator_gui.glossary_stop_flag = glossary_stop_flag
+ translator_gui.glossary_stop_check = glossary_stop_check
+ translator_gui.fallback_compile_epub = fallback_compile_epub
+ translator_gui.scan_html_folder = scan_html_folder
+
+ if splash_manager:
+ splash_manager.update_status("Creating main window...")
+ time.sleep(0.07)
+
+ # Extra pause to show "Ready!" before closing
+ splash_manager.update_status("Ready!")
+ time.sleep(0.1)
+ splash_manager.close_splash()
+
+ # Create main window (modules already loaded)
+ root = tb.Window(themename="darkly")
+
+ # CRITICAL: Hide window immediately to prevent white flash
+ root.withdraw()
+
+ # Initialize the app (modules already available)
+ app = TranslatorGUI(root)
+
+ # Mark modules as already loaded to skip lazy loading
+ app._modules_loaded = True
+ app._modules_loading = False
+
+ # CRITICAL: Let all widgets and theme fully initialize
+ root.update_idletasks()
+
+ # CRITICAL: Now show the window after everything is ready
+ root.deiconify()
+
+ print("✅ Ready to use!")
+
+ # Start main loop
+ root.mainloop()
+
+ except Exception as e:
+ print(f"❌ Failed to start application: {e}")
+ if splash_manager:
+ splash_manager.close_splash()
+ import traceback
+ traceback.print_exc()
+ sys.exit(1)
+
+ finally:
+ if splash_manager:
+ try:
+ splash_manager.close_splash()
+ except:
+ pass
\ No newline at end of file
diff --git a/txt_processor.py b/txt_processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..cecc483ed59d5ed158a9ff5640423f52fb225817
--- /dev/null
+++ b/txt_processor.py
@@ -0,0 +1,304 @@
+# txt_processor.py
+import os
+import re
+import json
+from typing import List, Tuple, Dict
+from bs4 import BeautifulSoup
+from chapter_splitter import ChapterSplitter
+from decimal import Decimal
+import hashlib
+
+class TextFileProcessor:
+ """Process plain text files for translation"""
+
+ def __init__(self, file_path: str, output_dir: str):
+ self.file_path = file_path
+ self.output_dir = output_dir
+ self.file_base = os.path.splitext(os.path.basename(file_path))[0]
+
+ # Initialize chapter splitter
+ model_name = os.getenv("MODEL", "gpt-3.5-turbo")
+ self.chapter_splitter = ChapterSplitter(model_name=model_name)
+
+ def extract_chapters(self) -> List[Dict]:
+ """Extract chapters from text file"""
+ with open(self.file_path, 'r', encoding='utf-8') as f:
+ content = f.read()
+
+ # First, detect chapters in the content
+ raw_chapters = self._detect_chapters(content)
+
+ # Then, process each chapter for splitting if needed
+ final_chapters = self._process_chapters_for_splitting(raw_chapters)
+
+ print(f"📚 Extracted {len(final_chapters)} total chunks from {len(raw_chapters)} detected chapters")
+ return final_chapters
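+
+ # Minimal usage sketch (hypothetical paths):
+ # proc = TextFileProcessor("novel.txt", "output")
+ # for ch in proc.extract_chapters():
+ # print(ch['num'], ch['title'], len(ch['body']))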
+
+ def _detect_chapters(self, content: str) -> List[Dict]:
+ """Detect chapter boundaries in the text"""
+ chapters = []
+
+ # Chapter detection patterns
+ chapter_patterns = [
+ # English patterns
+ (r'^Chapter\s+(\d+).*$', 'chapter'),
+ (r'^CHAPTER\s+(\d+).*$', 'chapter'),
+ (r'^Ch\.\s*(\d+).*$', 'chapter'),
+ # Numbered sections
+ (r'^(\d+)\.\s+(.*)$', 'numbered'),
+ (r'^Part\s+(\d+).*$', 'part'),
+ # Scene breaks (these don't have numbers)
+ (r'^\*\s*\*\s*\*.*$', 'break'),
+ (r'^---+.*$', 'break'),
+ (r'^===+.*$', 'break'),
+ ]
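+
+ # Illustrative matches (hypothetical sample lines): "Chapter 12: The Gate"
+ # is detected as type 'chapter' with number 12; "3. Homecoming" as type
+ # 'numbered' with number 3; "* * *" as a numberless 'break'.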
+
+ # Find all chapter markers and their positions
+ chapter_breaks = []
+ lines = content.split('\n')
+
+ for line_num, line in enumerate(lines):
+ for pattern, pattern_type in chapter_patterns:
+ match = re.match(pattern, line.strip())
+ if match:
+ chapter_breaks.append({
+ 'line_num': line_num,
+ 'line': line,
+ 'type': pattern_type,
+ 'match': match
+ })
+ break
+
+ if not chapter_breaks:
+ # No chapter markers found, treat as single chapter
+ print(f"No chapter markers found in {self.file_base}, treating as single document")
+ # FIX: Use "Section 1" instead of filename to avoid number extraction issues
+ chapters = [{
+ 'num': 1,
+ 'title': 'Section 1', # Changed from self.file_base
+ 'content': content
+ }]
+ else:
+ # Split content by chapter markers
+ print(f"Found {len(chapter_breaks)} chapter markers in {self.file_base}")
+
+ for i, chapter_break in enumerate(chapter_breaks):
+ # Determine chapter number and title
+ chapter_num, chapter_title = self._extract_chapter_info(chapter_break, i)
+
+ # Get content for this chapter
+ start_line = chapter_break['line_num'] + 1 # Start after the chapter marker
+
+ # Find where this chapter ends
+ if i < len(chapter_breaks) - 1:
+ end_line = chapter_breaks[i + 1]['line_num']
+ else:
+ end_line = len(lines)
+
+ # Extract chapter content
+ chapter_lines = lines[start_line:end_line]
+ chapter_content = '\n'.join(chapter_lines).strip()
+
+ if chapter_content: # Only add if there's actual content
+ chapters.append({
+ 'num': chapter_num,
+ 'title': chapter_title,
+ 'content': chapter_content
+ })
+
+ return chapters
+
+ def _extract_chapter_info(self, chapter_break: Dict, index: int) -> Tuple[int, str]:
+ """Extract chapter number and title from a chapter break"""
+ if chapter_break['type'] == 'break':
+ # Scene breaks don't have numbers
+ chapter_num = index + 1
+ chapter_title = f"Section {chapter_num}"
+ else:
+ # Try to extract number from match
+ match_groups = chapter_break['match'].groups()
+ if match_groups and match_groups[0]: # Check if group exists AND is not empty
+ try:
+ # Strip whitespace and check if it's a valid number
+ num_str = match_groups[0].strip()
+ if num_str: # Only try to convert if not empty
+ chapter_num = int(num_str)
+ chapter_title = chapter_break['line'].strip()
+ else:
+ # Empty match group, use index
+ chapter_num = index + 1
+ chapter_title = chapter_break['line'].strip()
+ except (ValueError, IndexError):
+ # Failed to convert to int, use index
+ chapter_num = index + 1
+ chapter_title = chapter_break['line'].strip()
+ else:
+ # No match groups or empty match
+ chapter_num = index + 1
+ chapter_title = chapter_break['line'].strip()
+
+ return chapter_num, chapter_title
+
+ def _process_chapters_for_splitting(self, raw_chapters: List[Dict]) -> List[Dict]:
+ """Process chapters and split them if they exceed token limits"""
+ final_chapters = []
+
+ # Calculate based on OUTPUT token limits
+ max_output_tokens = int(os.getenv("MAX_OUTPUT_TOKENS", "8192"))
+ compression_factor = float(os.getenv("COMPRESSION_FACTOR", "0.8"))
+ safety_margin_output = 500
+
+ # Calculate chunk size based on output limit
+ available_tokens = int((max_output_tokens - safety_margin_output) / compression_factor)
+ available_tokens = max(available_tokens, 1000)
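+ # Worked example with the defaults above: (8192 - 500) / 0.8 = 9615 tokens,
+ # so any chapter whose HTML exceeds that budget is split below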
+
+ print(f"📊 Text file chunk size: {available_tokens:,} tokens (based on {max_output_tokens:,} output limit, compression: {compression_factor})")
+
+ for chapter_data in raw_chapters:
+ # Convert chapter content to HTML format
+ chapter_html = self._text_to_html(chapter_data['content'])
+ chapter_tokens = self.chapter_splitter.count_tokens(chapter_html)
+
+ if chapter_tokens > available_tokens:
+ # Chapter needs splitting
+ print(f"Chapter {chapter_data['num']} ({chapter_data['title']}) has {chapter_tokens} tokens, splitting...")
+
+ chunks = self.chapter_splitter.split_chapter(chapter_html, available_tokens)
+
+ # Add each chunk as a separate chapter
+ for chunk_html, chunk_idx, total_chunks in chunks:
+ chunk_title = chapter_data['title']
+ if total_chunks > 1:
+ chunk_title = f"{chapter_data['title']} (Part {chunk_idx}/{total_chunks})"
+
+ # Create float chapter numbers for chunks: 1.0, 1.1, 1.2, etc.
+ # (e.g. chapter 2 split in three becomes 2.0, 2.1, 2.2; this assumes fewer
+ # than 11 chunks per chapter, since an 11th would collide with the next
+ # chapter's number)
+ chunk_num = round(chapter_data['num'] + (chunk_idx - 1) * 0.1, 1)
+
+ final_chapters.append({
+ 'num': chunk_num,
+ 'title': chunk_title,
+ 'body': chunk_html,
+ 'filename': f"section_{int(chapter_data['num'])}_part{chunk_idx}.txt", # Changed to avoid using file_base
+ 'content_hash': self._generate_hash(chunk_html),
+ 'file_size': len(chunk_html),
+ 'has_images': False,
+ 'is_chunk': True,
+ 'chunk_info': {
+ 'chunk_idx': chunk_idx,
+ 'total_chunks': total_chunks,
+ 'original_chapter': chapter_data['num']
+ }
+ })
+ else:
+ # Chapter is small enough, add as-is
+ final_chapters.append({
+ 'num': chapter_data['num'], # Keep as integer for non-split chapters
+ 'title': chapter_data['title'],
+ 'body': chapter_html,
+ 'filename': f"section_{chapter_data['num']}.txt", # Changed to avoid using file_base
+ 'content_hash': self._generate_hash(chapter_html),
+ 'file_size': len(chapter_html),
+ 'has_images': False,
+ 'is_chunk': False
+ })
+
+ # Ensure we have at least one chapter
+ if not final_chapters:
+ # Fallback: create a single chapter with all content
+ all_content = '\n\n'.join(ch['content'] for ch in raw_chapters if ch.get('content'))
+ if not all_content and raw_chapters:
+ all_content = raw_chapters[0].get('content', '')
+
+ final_chapters.append({
+ 'num': 1,
+ 'title': 'Section 1', # Changed from self.file_base
+ 'body': self._text_to_html(all_content or 'Empty file'),
+ 'filename': 'section_1.txt', # Changed to avoid using file_base
+ 'content_hash': self._generate_hash(all_content or ''),
+ 'file_size': len(all_content or ''),
+ 'has_images': False,
+ 'is_chunk': False
+ })
+
+ return final_chapters
+
+ def _text_to_html(self, text: str) -> str:
+ """Convert plain text to HTML format"""
+ # Escape HTML characters
+ text = text.replace('&', '&amp;')
+ text = text.replace('<', '&lt;')
+ text = text.replace('>', '&gt;')
+
+ # Split into paragraphs
+ paragraphs = text.split('\n\n')
+
+ # Wrap each paragraph in <p> tags
+ html_parts = []
+ for para in paragraphs:
+ para = para.strip()
+ if para:
+ # Check if it's a chapter heading
+ if re.match(r'^(Chapter|CHAPTER|Ch\.|Part)\s+\d+', para):
+ html_parts.append(f'<h2>{para}</h2>')
+ else:
+ # Replace single newlines with <br/> within paragraphs
+ para = para.replace('\n', '<br/>\n')
+ html_parts.append(f'<p>{para}</p>')
+
+ # Create a simple HTML structure
+ html = f"""
+
+ {self.file_base}
+
+
+
+ {''.join(html_parts)}
+
+"""
+
+ return html
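+
+ # Example (hypothetical input): "Chapter 1\n\nIt was dawn.\nBirds sang."
+ # becomes '<h2>Chapter 1</h2>' followed by '<p>It was dawn.<br/>\nBirds sang.</p>'
+ # inside the skeleton above.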
+
+ def _generate_hash(self, content: str) -> str:
+ """Generate hash for content"""
+ return hashlib.md5(content.encode('utf-8')).hexdigest()
+
+ def save_original_structure(self):
+ """Save original text file structure info"""
+ metadata = {
+ 'source_file': os.path.basename(self.file_path),
+ 'type': 'text',
+ 'encoding': 'utf-8'
+ }
+
+ metadata_path = os.path.join(self.output_dir, 'metadata.json')
+ with open(metadata_path, 'w', encoding='utf-8') as f:
+ json.dump(metadata, f, ensure_ascii=False, indent=2)
+
+ def create_output_structure(self, translated_chapters: List[Tuple[str, str]]) -> str:
+ """Create output text file from translated chapters"""
+ # Sort chapters by filename so output order matches chapter order;
+ # compare the embedded numbers numerically so "section_10" sorts
+ # after "section_2" (a plain string sort would misorder them)
+ def natural_key(item):
+ return [int(tok) if tok.isdigit() else tok for tok in re.split(r'(\d+)', item[0])]
+ sorted_chapters = sorted(translated_chapters, key=natural_key)
+
+ # Combine all content
+ all_content = []
+ for filename, content in sorted_chapters:
+ # Extract text from HTML
+ soup = BeautifulSoup(content, 'html.parser')
+ text_content = soup.get_text()
+
+ # Add chapter separator if needed
+ if len(all_content) > 0:
+ all_content.append('\n\n' + '='*50 + '\n\n')
+
+ all_content.append(text_content)
+
+ # Create output filename
+ output_filename = f"{self.file_base}_translated.txt"
+ output_path = os.path.join(self.output_dir, output_filename)
+
+ # Write the translated text
+ with open(output_path, 'w', encoding='utf-8') as f:
+ f.write(''.join(all_content))
+
+ print(f"✅ Created translated text file: {output_filename}")
+ return output_path
diff --git a/unified_api_client.py b/unified_api_client.py
new file mode 100644
index 0000000000000000000000000000000000000000..63531b2fd424e12e75103d74f1bca1b06e91cbf0
--- /dev/null
+++ b/unified_api_client.py
@@ -0,0 +1,10210 @@
+# unified_api_client.py - REFACTORED with Enhanced Error Handling and Extended AI Model Support
+"""
+Key Design Principles:
+- The client handles API communication and returns accurate status
+- The client must save responses properly for duplicate detection
+- The client must return accurate finish_reason for truncation detection
+- The client must support cancellation for timeout handling
+- Enhanced Multi-Key Mode: Rotates API keys during exponential backoff on server errors (500, 502, 503, 504)
+ to avoid waiting on potentially problematic keys before trying alternatives
+
+Supported models and their prefixes (Updated July 2025):
+- OpenAI: gpt*, o1*, o3*, o4*, codex* (e.g., gpt-4, gpt-4o, gpt-4o-mini, gpt-4.5, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o3, o3-mini, o3-pro, o4-mini)
+- Google: gemini*, palm*, bard* (e.g., gemini-2.0-flash-exp, gemini-2.5-pro, gemini-2.5-flash)
+- Anthropic: claude*, sonnet*, opus*, haiku* (e.g., claude-3.5-sonnet, claude-3.7-sonnet, claude-4-opus, claude-4-sonnet, claude-opus-4-20250514, claude-sonnet-4-20250514)
+- DeepSeek: deepseek* (e.g., deepseek-chat, deepseek-vl, deepseek-r1)
+- Mistral: mistral*, mixtral*, codestral*
+- Cohere: command*, cohere*, aya* (e.g., aya-vision, command-r7b)
+- AI21: j2*, jurassic*, jamba*
+- Together AI: llama*, together*, alpaca*, vicuna*, wizardlm*, openchat*
+- Perplexity: perplexity*, pplx*, sonar*
+- Replicate: replicate*
+- Yi (01.AI): yi* (e.g., yi-34b-chat-200k, yi-vl)
+- Qwen (Alibaba): qwen* (e.g., qwen2.5-vl)
+- Baichuan: baichuan*
+- Zhipu AI: glm*, chatglm*
+- Moonshot: moonshot*, kimi*
+- Groq: groq*, llama-groq*, mixtral-groq*
+- Baidu: ernie*
+- Tencent: hunyuan*
+- iFLYTEK: spark*
+- ByteDance: doubao*
+- MiniMax: minimax*, abab*
+- SenseNova: sensenova*, nova*
+- InternLM: intern*, internlm*
+- TII: falcon* (e.g., falcon-2-11b)
+- Microsoft: phi*, orca*
+- Azure: azure* (for Azure OpenAI deployments)
+- Aleph Alpha: luminous*
+- Databricks: dolly*
+- HuggingFace: starcoder*
+- Salesforce: codegen*
+- BigScience: bloom*
+- Meta: opt*, galactica*, llama2*, llama3*, llama4*, codellama*
+- xAI: grok* (e.g., grok-3, grok-vision)
+- Poe: poe/* (e.g., poe/claude-4-opus, poe/gpt-4.5, poe/Assistant)
+- OpenRouter: or/*, openrouter/* (e.g., or/anthropic/claude-4-opus, or/openai/gpt-4.5)
+- Fireworks AI: fireworks/* (e.g., fireworks/llama-v3-70b)
+
+ELECTRONHUB SUPPORT:
+ElectronHub is an API aggregator that provides access to multiple models.
+To use ElectronHub, prefix your model name with one of these:
+- eh/ (e.g., eh/yi-34b-chat-200k)
+- electronhub/ (e.g., electronhub/gpt-4.5)
+- electron/ (e.g., electron/claude-4-opus)
+
+ElectronHub allows you to access models from multiple providers using a single API key.
+
+POE SUPPORT:
+Poe by Quora provides access to multiple AI models through their platform.
+To use Poe, prefix your model name with 'poe/':
+- poe/claude-4-opus
+- poe/claude-4-sonnet
+- poe/gpt-4.5
+- poe/gpt-4.1
+- poe/Assistant
+- poe/gemini-2.5-pro
+
+OPENROUTER SUPPORT:
+OpenRouter is a unified interface for 300+ models from various providers.
+To use OpenRouter, prefix your model name with 'or/' or 'openrouter/':
+- or/anthropic/claude-4-opus
+- openrouter/openai/gpt-4.5
+- or/google/gemini-2.5-pro
+- or/meta-llama/llama-4-70b
+
+Environment Variables:
+- SEND_INTERVAL_SECONDS: Delay between API calls (respects GUI settings)
+- YI_API_BASE_URL: Custom endpoint for Yi models (optional)
+- ELECTRONHUB_API_URL: Custom ElectronHub endpoint (default: https://api.electronhub.ai/v1)
+- AZURE_OPENAI_ENDPOINT: Azure OpenAI endpoint (for azure* models)
+- AZURE_API_VERSION: Azure API version (default: 2024-02-01)
+- DATABRICKS_API_URL: Databricks workspace URL
+- SALESFORCE_API_URL: Salesforce API endpoint
+- OPENROUTER_REFERER: HTTP referer for OpenRouter (default: https://github.com/Shirochi-stack/Glossarion)
+- OPENROUTER_APP_NAME: App name for OpenRouter (default: Glossarion Translation)
+- POE_API_KEY: API key for Poe platform
+- GROQ_API_URL: Custom Groq endpoint (default: https://api.groq.com/openai/v1)
+- FIREWORKS_API_URL: Custom Fireworks AI endpoint (default: https://api.fireworks.ai/inference/v1)
+- DISABLE_GEMINI_SAFETY: Set to "true" to disable Gemini safety filters (respects GUI toggle)
+- XAI_API_URL: Custom xAI endpoint (default: https://api.x.ai/v1)
+- DEEPSEEK_API_URL: Custom DeepSeek endpoint (default: https://api.deepseek.com/v1)
+
+SAFETY SETTINGS:
+The client respects the GUI's "Disable Gemini API Safety Filters" toggle via the
+DISABLE_GEMINI_SAFETY environment variable. When enabled, it applies API-level safety
+settings where available:
+
+- Gemini: Sets all harm categories to BLOCK_NONE (most permissive)
+- OpenRouter: Disables safe mode via X-Safe-Mode header
+- Poe: Disables safe mode via safe_mode parameter
+- Other OpenAI-compatible providers: Sets moderation=false where supported
+
+Note: Not all providers support API-level safety toggles. OpenAI and Anthropic APIs
+do not have direct safety filter controls. The client only applies settings that are
+officially supported by each provider's API.
+
+Note: Many Chinese model providers (Yi, Qwen, Baichuan, etc.) may require
+API keys from their respective platforms. Some endpoints might need adjustment
+based on your region or deployment.
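+
+Usage sketch (illustrative; only the UnifiedClient constructor signature
+defined in this module is assumed):
+
+ client = UnifiedClient(api_key="sk-...", model="gpt-4.1", output_dir="Output")
+ # Requests are then routed based on the model-prefix rules above.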
+"""
+import os
+import json
+import requests
+from requests.adapters import HTTPAdapter
+try:
+ from urllib3.util.retry import Retry
+except Exception:
+ Retry = None
+from dataclasses import dataclass
+from typing import Optional, Dict, Any, Tuple, List
+import logging
+import re
+import base64
+import contextlib
+from PIL import Image
+import io
+import time
+import random
+import csv
+from datetime import datetime
+import traceback
+import hashlib
+import html
+try:
+ from multi_api_key_manager import APIKeyPool, APIKeyEntry, RateLimitCache
+except ImportError:
+ try:
+ from .multi_api_key_manager import APIKeyPool, APIKeyEntry, RateLimitCache
+ except ImportError:
+ # Fallback classes if module not available
+ class APIKeyPool:
+ def __init__(self): pass
+ class APIKeyEntry:
+ def __init__(self): pass
+ class RateLimitCache:
+ def __init__(self): pass
+import threading
+import uuid
+from threading import RLock
+from collections import defaultdict
+
+# IMPORTANT: This client respects GUI settings via environment variables:
+# - SEND_INTERVAL_SECONDS: Delay between API calls (set by GUI)
+# All API providers INCLUDING ElectronHub respect this setting for proper GUI integration
+
+# Note: For Yi models through ElectronHub, use eh/yi-34b-chat-200k format
+# For direct Yi API access, use yi-34b-chat-200k format
+
+# Set up logging
+logger = logging.getLogger(__name__)
+
+# Enable HTTP request logging for debugging
+def setup_http_logging():
+ """Enable detailed HTTP request/response logging for debugging"""
+ import logging
+
+ # Enable httpx logging (used by OpenAI SDK)
+ httpx_logger = logging.getLogger("httpx")
+ httpx_logger.setLevel(logging.INFO)
+
+ # Enable requests logging (fallback HTTP calls)
+ requests_logger = logging.getLogger("requests.packages.urllib3")
+ requests_logger.setLevel(logging.INFO)
+
+ # Enable OpenAI SDK logging
+ openai_logger = logging.getLogger("openai")
+ openai_logger.setLevel(logging.DEBUG)
+
+ # Create console handler if not exists
+ if not any(isinstance(h, logging.StreamHandler) for h in logging.root.handlers):
+ console_handler = logging.StreamHandler()
+ console_handler.setLevel(logging.INFO)
+ formatter = logging.Formatter('%(levelname)s:%(name)s:%(message)s')
+ console_handler.setFormatter(formatter)
+
+ httpx_logger.addHandler(console_handler)
+ requests_logger.addHandler(console_handler)
+ openai_logger.addHandler(console_handler)
+
+ # Prevent duplicate logs
+ httpx_logger.propagate = False
+ requests_logger.propagate = False
+ openai_logger.propagate = False
+
+# Enable HTTP logging on module import
+setup_http_logging()
+
+# OpenAI SDK
+try:
+ import openai
+ from openai import OpenAIError
+except ImportError:
+ openai = None
+ class OpenAIError(Exception): pass
+try:
+ import httpx
+ from httpx import HTTPStatusError
+except ImportError:
+ httpx = None
+ class HTTPStatusError(Exception): pass
+
+# Gemini SDK
+try:
+ from google import genai
+ from google.genai import types
+ GENAI_AVAILABLE = True
+except ImportError:
+ genai = None
+ types = None
+ GENAI_AVAILABLE = False
+
+# Anthropic SDK (optional - can use requests if not installed)
+try:
+ import anthropic
+except ImportError:
+ anthropic = None
+
+# Cohere SDK (optional)
+try:
+ import cohere
+except ImportError:
+ cohere = None
+
+# Mistral SDK (optional)
+try:
+ from mistralai.client import MistralClient
+ from mistralai.models.chat_completion import ChatMessage
+except ImportError:
+ MistralClient = None
+
+# Google Vertex AI (Google Cloud) SDK
+try:
+ from google.cloud import aiplatform
+ from google.oauth2 import service_account
+ import vertexai
+ VERTEX_AI_AVAILABLE = True
+except ImportError:
+ VERTEX_AI_AVAILABLE = False
+ print("Vertex AI SDK not installed. Install with: pip install google-cloud-aiplatform")
+
+try:
+ import deepl
+ DEEPL_AVAILABLE = True
+except ImportError:
+ deepl = None
+ DEEPL_AVAILABLE = False
+
+try:
+ from google.cloud import translate_v2 as google_translate
+ GOOGLE_TRANSLATE_AVAILABLE = True
+except ImportError:
+ google_translate = None
+ GOOGLE_TRANSLATE_AVAILABLE = False
+
+from functools import lru_cache
+from datetime import datetime, timedelta
+
+
+@dataclass
+class UnifiedResponse:
+ """Standardized response format for all API providers"""
+ content: str
+ finish_reason: Optional[str] = None
+ usage: Optional[Dict[str, int]] = None
+ raw_response: Optional[Any] = None
+ error_details: Optional[Dict[str, Any]] = None
+
+ @property
+ def is_truncated(self) -> bool:
+ """Check if the response was truncated
+
+ IMPORTANT: This is used by retry logic to detect when to retry with more tokens
+ """
+ return self.finish_reason in ['length', 'max_tokens', 'stop_sequence_limit', 'truncated', 'incomplete']
+
+ @property
+ def is_complete(self) -> bool:
+ """Check if the response completed normally"""
+ return self.finish_reason in ['stop', 'complete', 'end_turn', 'finished', None]
+
+ @property
+ def is_error(self) -> bool:
+ """Check if the response is an error"""
+ return self.finish_reason == 'error' or bool(self.error_details)
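+
+# Illustrative sketch (not part of this module's API): downstream retry logic
+# is expected to branch on these properties roughly like this, given some
+# `response` of type UnifiedResponse:
+#
+# if response.is_truncated:
+# ... # retry the request with a larger token budget
+# elif response.is_error:
+# ... # inspect response.error_details before giving up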
+
+class UnifiedClientError(Exception):
+ """Generic exception for UnifiedClient errors."""
+ def __init__(self, message, error_type=None, http_status=None, details=None):
+ super().__init__(message)
+ self.error_type = error_type
+ self.http_status = http_status
+ self.details = details
+
+class UnifiedClient:
+ """
+ Unified client with fixed thread-safe multi-key support
+
+ Key improvements:
+ 1. Thread-local storage for API clients
+ 2. Proper key rotation per request
+ 3. Thread-safe rate limit handling
+ 4. Cleaner error handling and retry logic
+ 5. INSTANCE-BASED multi-key mode (not class-based)
+ """
+
+ # ----- Helper methods to reduce duplication -----
+ @contextlib.contextmanager
+ def _model_lock(self):
+ """Context manager for thread-safe model access"""
+ if hasattr(self, '_instance_model_lock') and self._instance_model_lock is not None:
+ with self._instance_model_lock:
+ yield
+ else:
+ # Fallback - create a temporary lock if needed
+ if not hasattr(self, '_temp_model_lock'):
+ self._temp_model_lock = threading.RLock()
+ with self._temp_model_lock:
+ yield
+ def _get_send_interval(self) -> float:
+ try:
+ return float(os.getenv("SEND_INTERVAL_SECONDS", "2"))
+ except Exception:
+ return 2.0
+
+ def _debug_log(self, message: str) -> None:
+ """Print debug logs unless in cleanup/stop state or quiet mode.
+ Suppresses noisy logs when the operation is cancelled or in cleanup.
+ Honours QUIET_LOGS=1 environment toggle.
+ """
+ try:
+ if getattr(self, '_in_cleanup', False):
+ return
+ if getattr(self, '_cancelled', False):
+ return
+ # Some call sites expose a stop check
+ if hasattr(self, '_is_stop_requested') and callable(getattr(self, '_is_stop_requested')):
+ try:
+ if self._is_stop_requested():
+ return
+ except Exception:
+ pass
+ if os.getenv('QUIET_LOGS', '0') == '1':
+ return
+ print(message)
+ except Exception:
+ # Best-effort logging; swallow any print failures
+ try:
+ print(message)
+ except Exception:
+ pass
+
+ def _safe_len(self, obj, context="unknown"):
+ """Safely get length of an object with better error reporting"""
+ try:
+ if obj is None:
+ print(f"⚠️ Warning: Attempting to get length of None in context: {context}")
+ return 0
+ return len(obj)
+ except TypeError as e:
+ print(f"❌ TypeError in _safe_len for context '{context}': {e}")
+ print(f"❌ Object type: {type(obj)}, Object value: {obj}")
+ return 0
+ except Exception as e:
+ print(f"❌ Unexpected error in _safe_len for context '{context}': {e}")
+ return 0
+
+ def _extract_first_image_base64(self, messages) -> Optional[str]:
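+ # Scans an OpenAI-style message list for the first image_url content part;
+ # "data:" URLs are split at the comma so only the base64 payload is returned.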
+ if messages is None:
+ return None
+ for msg in messages:
+ if msg is None:
+ continue
+ content = msg.get('content')
+ if isinstance(content, list):
+ for part in content:
+ if isinstance(part, dict) and part.get('type') == 'image_url':
+ url = part.get('image_url', {}).get('url', '')
+ if isinstance(url, str):
+ if url.startswith('data:') and ',' in url:
+ return url.split(',', 1)[1]
+ return url
+ return None
+
+
+ def _get_timeout_config(self) -> Tuple[bool, int]:
+ enabled = os.getenv("RETRY_TIMEOUT", "0") == "1"
+ window = int(os.getenv("CHUNK_TIMEOUT", "180"))
+ return enabled, window
+ def _with_attempt_suffix(self, payload_name: str, response_name: str, request_id: str, attempt: int, is_image: bool) -> Tuple[str, str]:
+ base_payload, ext_payload = os.path.splitext(payload_name)
+ base_response, ext_response = os.path.splitext(response_name)
+ unique_suffix = f"_{request_id}_imgA{attempt}" if is_image else f"_{request_id}_A{attempt}"
+ return f"{base_payload}{unique_suffix}{ext_payload}", f"{base_response}{unique_suffix}{ext_response}"
+
+ def _maybe_retry_main_key_on_prohibited(self, messages, temperature, max_tokens, max_completion_tokens, context, request_id=None, image_data=None):
+ if not (self._multi_key_mode and getattr(self, 'original_api_key', None) and getattr(self, 'original_model', None)):
+ return None
+ try:
+ return self._retry_with_main_key(
+ messages, temperature, max_tokens, max_completion_tokens, context,
+ request_id=request_id, image_data=image_data
+ )
+ except Exception:
+ return None
+
+ def _detect_safety_filter(self, messages, extracted_content: str, finish_reason: Optional[str], response: Any, provider: str) -> bool:
+ # Heuristic patterns consolidated from previous branches
+ # 1) Suspicious finish reasons with empty content
+ if not extracted_content and finish_reason in ['length', 'stop', 'max_tokens', None]:
+ return True
+ # 2) Safety indicators in raw response/error details
+ response_str = ""
+ if response is not None:
+ if hasattr(response, 'raw_response') and response.raw_response is not None:
+ response_str = str(response.raw_response).lower()
+ elif hasattr(response, 'error_details') and response.error_details is not None:
+ response_str = str(response.error_details).lower()
+ else:
+ response_str = str(response).lower()
+ safety_indicators = [
+ 'safety', 'blocked', 'prohibited', 'harmful', 'inappropriate',
+ 'refused', 'content_filter', 'content policy', 'violation',
+ 'cannot assist', 'unable to process', 'against guidelines',
+ 'ethical', 'responsible ai', 'harm_category', 'nsfw',
+ 'adult content', 'explicit', 'violence', 'disturbing'
+ ]
+ if any(ind in response_str for ind in safety_indicators):
+ return True
+ # 3) Safety phrases in extracted content
+ if extracted_content:
+ content_lower = extracted_content.lower()
+ safety_phrases = [
+ 'blocked', 'safety', 'cannot', 'unable', 'prohibited',
+ 'content filter', 'refused', 'inappropriate', 'i cannot',
+ "i can't", "i'm not able", "not able to", "against my",
+ 'content policy', 'guidelines', 'ethical',
+ 'analyze this image', 'process this image', 'describe this image', 'nsfw'
+ ]
+ if any(p in content_lower for p in safety_phrases):
+ return True
+ # 4) Provider-specific empty behavior
+ if provider in ['openai', 'azure', 'electronhub', 'openrouter', 'poe', 'gemini']:
+ if not extracted_content and finish_reason != 'error':
+ return True
+ # 5) Suspiciously short output vs long input - FIX: Add None checks
+ if extracted_content and len(extracted_content) < 50:
+ # FIX: Add None check for messages
+ if messages is not None:
+ input_length = 0
+ for m in messages:
+ if m is not None and m.get('role') == 'user':
+ content = m.get('content', '')
+ # FIX: Add None check for content
+ if content is not None:
+ input_length += len(str(content))
+ if input_length > 200 and any(w in extracted_content.lower() for w in ['cannot', 'unable', 'sorry', 'assist']):
+ return True
+ return False
+
+ def _finalize_empty_response(self, messages, context, response, extracted_content: str, finish_reason: Optional[str], provider: str, request_type: str, start_time: float) -> Tuple[str, str]:
+ is_safety = self._detect_safety_filter(messages, extracted_content, finish_reason, response, provider)
+ # Always save failure snapshot and log truncation details
+ self._save_failed_request(messages, f"Empty {request_type} response from {getattr(self, 'client_type', 'unknown')}", context, response)
+ error_details = getattr(response, 'error_details', None)
+ if is_safety:
+ error_details = {
+ 'likely_safety_filter': True,
+ 'original_finish_reason': finish_reason,
+ 'provider': getattr(self, 'client_type', None),
+ 'model': self.model,
+ 'key_identifier': getattr(self, 'key_identifier', None),
+ 'request_type': request_type
+ }
+ self._log_truncation_failure(
+ messages=messages,
+ response_content=extracted_content or "",
+ finish_reason='content_filter' if is_safety else (finish_reason or 'error'),
+ context=context,
+ error_details=error_details
+ )
+ # Stats
+ self._track_stats(context, False, f"empty_{request_type}_response", time.time() - start_time)
+ # Fallback message
+ if is_safety:
+ fb_reason = f"image_safety_filter_{provider}" if request_type == 'image' else f"safety_filter_{provider}"
+ else:
+ fb_reason = "empty_image" if request_type == 'image' else "empty"
+ fallback = self._handle_empty_result(messages, context, getattr(response, 'error_details', fb_reason) if response else fb_reason)
+ return fallback, ('content_filter' if is_safety else 'error')
+
+ def _is_rate_limit_error(self, exc: Exception) -> bool:
+ s = str(exc).lower()
+ if hasattr(exc, 'error_type') and getattr(exc, 'error_type') == 'rate_limit':
+ return True
+ return ('429' in s) or ('rate limit' in s) or ('quota' in s)
+
+ def _compute_backoff(self, attempt: int, base: float, cap: float) -> float:
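+ # Exponential backoff with jitter: base * 2**attempt plus up to 1s of random
+ # jitter, capped at `cap` (e.g. base=1, cap=60 -> ~1s, 2s, 4s, 8s, ..., 60s).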
+ delay = (base * (2 ** attempt)) + random.uniform(0, 1)
+ return min(delay, cap)
+
+ def _normalize_token_params(self, max_tokens: Optional[int], max_completion_tokens: Optional[int]) -> Tuple[Optional[int], Optional[int]]:
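+ # o-series models only accept max_completion_tokens, so whichever budget the
+ # caller supplied is mapped to that parameter; other models use max_tokens.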
+ if self._is_o_series_model():
+ mct = max_completion_tokens if max_completion_tokens is not None else (max_tokens or getattr(self, 'default_max_tokens', 8192))
+ return None, mct
+ else:
+ mt = max_tokens if max_tokens is not None else (max_completion_tokens or getattr(self, 'default_max_tokens', 8192))
+ return mt, None
+ def _apply_api_delay(self) -> None:
+ if getattr(self, '_in_cleanup', False):
+ # Suppress log in cleanup mode
+ # self._debug_log("⚡ Skipping API delay (cleanup mode)")
+ return
+ try:
+ api_delay = float(os.getenv("SEND_INTERVAL_SECONDS", "2"))
+ except Exception:
+ api_delay = 2.0
+ if api_delay > 0:
+ self._debug_log(f"⏳ Waiting {api_delay}s before next API call...")
+ time.sleep(api_delay)
+
+ def _set_idempotency_context(self, request_id: str, attempt: int) -> None:
+ tls = self._get_thread_local_client()
+ tls.idem_request_id = request_id
+ tls.idem_attempt = attempt
+
+ def _get_extraction_kwargs(self) -> dict:
+ ct = getattr(self, 'client_type', None)
+ if ct == 'gemini':
+ return {
+ 'supports_thinking': self._supports_thinking(),
+ 'thinking_budget': int(os.getenv("THINKING_BUDGET", "-1")),
+ }
+ return {}
+
+ """
+ Unified client with fixed thread-safe multi-key support
+
+ Key improvements:
+ 1. Thread-local storage for API clients
+ 2. Proper key rotation per request
+ 3. Thread-safe rate limit handling
+ 4. Cleaner error handling and retry logic
+ 5. INSTANCE-BASED multi-key mode (not class-based)
+ """
+ # Thread safety for file operations
+ _file_write_lock = RLock()
+ _tracker_lock = RLock()
+ _model_lock = RLock()
+
+ # Class-level shared resources - properly initialized
+ _rate_limit_cache = None
+ _api_key_pool: Optional[APIKeyPool] = None
+ _pool_lock = threading.Lock()
+
+ # Request tracking
+ _global_request_counter = 0
+ _counter_lock = threading.Lock()
+
+ # Thread-local storage for clients and key assignments
+ _thread_local = threading.local()
+
+
+ # Legacy tracking (for compatibility)
+ _key_assignments = {} # thread_id -> (key_index, key_identifier)
+ _assignment_lock = threading.Lock()
+
+ # Track displayed log messages to avoid spam
+ _displayed_messages = set()
+ _message_lock = threading.Lock()
+
+ # MODEL_PROVIDERS and other class variables
+ MODEL_PROVIDERS = {
+ 'vertex/': 'vertex_model_garden',
+ '@': 'vertex_model_garden',
+ 'gpt': 'openai',
+ 'o1': 'openai',
+ 'o3': 'openai',
+ 'o4': 'openai',
+ 'gemini': 'gemini',
+ 'claude': 'anthropic',
+ 'chutes': 'chutes',
+ 'chutes/': 'chutes',
+ 'sonnet': 'anthropic',
+ 'opus': 'anthropic',
+ 'haiku': 'anthropic',
+ 'deepseek': 'deepseek',
+ 'mistral': 'mistral',
+ 'mixtral': 'mistral',
+ 'codestral': 'mistral',
+ 'command': 'cohere',
+ 'cohere': 'cohere',
+ 'aya': 'cohere',
+ 'j2': 'ai21',
+ 'jurassic': 'ai21',
+ 'llama': 'together',
+ 'together': 'together',
+ 'perplexity': 'perplexity',
+ 'pplx': 'perplexity',
+ 'sonar': 'perplexity',
+ 'replicate': 'replicate',
+ 'yi': 'yi',
+ 'qwen': 'qwen',
+ 'baichuan': 'baichuan',
+ 'glm': 'zhipu',
+ 'chatglm': 'zhipu',
+ 'moonshot': 'moonshot',
+ 'kimi': 'moonshot',
+ 'groq': 'groq',
+ 'llama-groq': 'groq',
+ 'mixtral-groq': 'groq',
+ 'ernie': 'baidu',
+ 'hunyuan': 'tencent',
+ 'spark': 'iflytek',
+ 'doubao': 'bytedance',
+ 'minimax': 'minimax',
+ 'abab': 'minimax',
+ 'sensenova': 'sensenova',
+ 'nova': 'sensenova',
+ 'intern': 'internlm',
+ 'internlm': 'internlm',
+ 'falcon': 'tii',
+ 'jamba': 'ai21',
+ 'phi': 'microsoft',
+ 'azure': 'azure',
+ 'palm': 'google',
+ 'bard': 'google',
+ 'codex': 'openai',
+ 'luminous': 'alephalpha',
+ 'alpaca': 'together',
+ 'vicuna': 'together',
+ 'wizardlm': 'together',
+ 'openchat': 'together',
+ 'orca': 'microsoft',
+ 'dolly': 'databricks',
+ 'starcoder': 'huggingface',
+ 'codegen': 'salesforce',
+ 'bloom': 'bigscience',
+ 'opt': 'meta',
+ 'galactica': 'meta',
+ 'llama2': 'meta',
+ 'llama3': 'meta',
+ 'llama4': 'meta',
+ 'codellama': 'meta',
+ 'grok': 'xai',
+ 'poe': 'poe',
+ 'or': 'openrouter',
+ 'openrouter': 'openrouter',
+ 'fireworks': 'fireworks',
+ 'eh/': 'electronhub',
+ 'electronhub/': 'electronhub',
+ 'electron/': 'electronhub',
+ 'deepl': 'deepl',
+ 'google-translate': 'google_translate',
+ }
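+
+ # Illustrative prefix resolution (assumed lookup behavior over the map above):
+ # 'gpt-4.1' -> 'openai', 'eh/gpt-4.5' -> 'electronhub',
+ # 'or/anthropic/claude-4-opus' -> 'openrouter'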
+
+ # Model-specific constraints
+ MODEL_CONSTRAINTS = {
+ 'temperature_fixed': ['o4-mini', 'o1-mini', 'o1-preview', 'o3-mini', 'o3', 'o3-pro', 'gpt-5-mini', 'gpt-5', 'gpt-5-nano'],
+ 'no_system_message': ['o1', 'o1-preview', 'o3', 'o3-pro'],
+ 'max_completion_tokens': ['o4', 'o1', 'o3', 'gpt-5-mini','gpt-5','gpt-5-nano'],
+ 'chinese_optimized': ['qwen', 'yi', 'glm', 'chatglm', 'baichuan', 'ernie', 'hunyuan'],
+ }
+
+ @classmethod
+ def _log_once(cls, message: str, is_debug: bool = False):
+ """Log a message only once per session to avoid spam"""
+ with cls._message_lock:
+ if message not in cls._displayed_messages:
+ cls._displayed_messages.add(message)
+ if is_debug:
+ print(f"[DEBUG] {message}")
+ else:
+ logger.info(message)
+ return True
+ return False
+
+ @classmethod
+ def setup_multi_key_pool(cls, keys_list, force_rotation=True, rotation_frequency=1):
+ """Setup the shared API key pool"""
+ with cls._pool_lock:
+ if cls._api_key_pool is None:
+ cls._api_key_pool = APIKeyPool()
+
+ # Initialize rate limit cache if needed
+ if cls._rate_limit_cache is None:
+ cls._rate_limit_cache = RateLimitCache()
+
+ # Validate and fix encrypted keys
+ validated_keys = []
+ encrypted_keys_fixed = 0
+
+ # FIX 1: Use keys_list parameter instead of undefined 'config'
+ for i, key_data in enumerate(keys_list):
+ if not isinstance(key_data, dict):
+ continue
+
+ api_key = key_data.get('api_key', '')
+ if not api_key:
+ continue
+
+ # Fix encrypted keys
+ if api_key.startswith('ENC:'):
+ try:
+ from api_key_encryption import get_handler
+ handler = get_handler()
+ decrypted_key = handler.decrypt_value(api_key)
+
+ if decrypted_key != api_key and not decrypted_key.startswith('ENC:'):
+ # Create a copy with decrypted key
+ fixed_key_data = key_data.copy()
+ fixed_key_data['api_key'] = decrypted_key
+ validated_keys.append(fixed_key_data)
+ encrypted_keys_fixed += 1
+ except Exception:
+ continue
+ else:
+ # Key is already decrypted
+ validated_keys.append(key_data)
+
+ if not validated_keys:
+ return False
+
+ # Load the validated keys
+ cls._api_key_pool.load_from_list(validated_keys)
+ #cls._main_fallback_key = validated_keys[0]['api_key']
+ #cls._main_fallback_model = validated_keys[0]['model']
+ #print(f"🔑 Using {validated_keys[0]['model']} as main fallback key")
+
+ # FIX 2: Store settings at class level (these affect all instances)
+ # These are class variables since the pool is shared
+ cls._force_rotation = force_rotation
+ cls._rotation_frequency = rotation_frequency
+
+ # Single debug message
+ if encrypted_keys_fixed > 0:
+ print(f"🔑 Multi-key pool: {len(validated_keys)} keys loaded ({encrypted_keys_fixed} required decryption fix)")
+ else:
+ print(f"🔑 Multi-key pool: {len(validated_keys)} keys loaded")
+
+ return True
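+
+ # Illustrative setup sketch (dict fields match the validation above; the
+ # key values are placeholders):
+ #
+ # UnifiedClient.setup_multi_key_pool(
+ # [{'api_key': 'sk-...', 'model': 'gpt-4.1'},
+ # {'api_key': 'AIza...', 'model': 'gemini-2.5-flash'}],
+ # force_rotation=True, rotation_frequency=1)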
+
+ @classmethod
+ def initialize_key_pool(cls, key_list: list):
+ """Initialize the shared API key pool (legacy compatibility)"""
+ with cls._pool_lock:
+ if cls._api_key_pool is None:
+ cls._api_key_pool = APIKeyPool()
+ cls._api_key_pool.load_from_list(key_list)
+
+ @classmethod
+ def get_key_pool(cls):
+ """Get the shared API key pool (legacy compatibility)"""
+ with cls._pool_lock:
+ if cls._api_key_pool is None:
+ cls._api_key_pool = APIKeyPool()
+ return cls._api_key_pool
+
+ def _get_max_retries(self) -> int:
+ """Get max retry count from environment variable, default to 7"""
+ return int(os.getenv('MAX_RETRIES', '7'))
+
+ # Class-level cancellation flag for all instances
+ _global_cancelled = False
+ _global_cancel_lock = threading.RLock()
+
+ @classmethod
+ def set_global_cancellation(cls, cancelled: bool):
+ """Set global cancellation flag for all client instances"""
+ with cls._global_cancel_lock:
+ cls._global_cancelled = cancelled
+
+ @classmethod
+ def is_globally_cancelled(cls) -> bool:
+ """Check if globally cancelled"""
+ with cls._global_cancel_lock:
+ return cls._global_cancelled
+
+ def __init__(self, api_key: str, model: str, output_dir: str = "Output"):
+ """Initialize the unified client with enhanced thread safety"""
+ # Store original values
+ self.original_api_key = api_key
+ self.original_model = model
+
+ self._sequential_send_lock = threading.Lock()
+
+ # Thread submission timing controls
+ self._thread_submission_lock = threading.Lock()
+ self._last_thread_submission_time = 0
+ self._thread_submission_count = 0
+
+ # Add unique session ID for this client instance
+ self.session_id = str(uuid.uuid4())[:8]
+
+ # INSTANCE-LEVEL multi-key configuration
+ self._multi_key_mode = False # INSTANCE variable, not class!
+ self._force_rotation = True
+ self._rotation_frequency = 1
+
+ # Instance variables
+ self.output_dir = output_dir
+ self._cancelled = False
+ self._in_cleanup = False
+ self.conversation_message_count = 0
+ self.context = None
+ self.current_session_context = None
+
+ # Request tracking (from first init)
+ self._request_count = 0
+ self._thread_request_count = 0
+
+ # Thread coordination for key assignment
+ self._key_assignment_lock = RLock()
+ self._thread_key_assignments = {} # {thread_id: (key_index, timestamp)}
+ self._instance_model_lock = threading.RLock()
+
+ # Thread-local storage for client instances
+ self._thread_local = threading.local()
+
+ # Thread-specific request counters
+ self._thread_request_counters = defaultdict(int)
+ self._counter_lock = RLock()
+
+ # File write coordination
+ self._file_write_locks = {} # {filepath: RLock}
+ self._file_write_locks_lock = RLock()
+ if not hasattr(self, '_instance_model_lock'):
+ self._instance_model_lock = threading.RLock()
+
+ # Stats tracking
+ self.stats = {
+ 'total_requests': 0,
+ 'successful_requests': 0,
+ 'failed_requests': 0,
+ 'errors': defaultdict(int),
+ 'response_times': [],
+ 'empty_results': 0 # Add this for completeness
+ }
+
+ # Pattern recognition attributes
+ self.pattern_counts = {} # Track pattern frequencies for reinforcement
+ self.last_pattern = None # Track last seen pattern
+
+ # Call reset_stats if it exists (from first init)
+ if hasattr(self, 'reset_stats'):
+ self.reset_stats()
+
+ # File tracking for duplicate prevention
+ self._active_files = set() # Track files being written
+ self._file_lock = RLock()
+
+ # Timeout configuration
+ enabled, window = self._get_timeout_config()
+ self.request_timeout = int(os.getenv("CHUNK_TIMEOUT", "900")) if enabled else 36000 # 10 hours
+
+ # Initialize client references
+ self.api_key = api_key
+ self.model = model
+ self.key_identifier = "Single Key"
+ self.current_key_index = None
+ self.openai_client = None
+ self.gemini_client = None
+ self.mistral_client = None
+ self.cohere_client = None
+ self._actual_output_filename = None
+ self._current_output_file = None
+ self._last_response_filename = None
+
+
+ # Store Google Cloud credentials path if available
+ self.google_creds_path = None
+ # Store current key's Google credentials, Azure endpoint, and Google region
+ self.current_key_google_creds = None
+ self.current_key_azure_endpoint = None
+ self.current_key_google_region = None
+
+ # Azure-specific flags
+ self.is_azure = False
+ self.azure_endpoint = None
+ self.azure_api_version = None
+
+ self.translator_config = {
+ 'use_fallback_keys': os.getenv('USE_FALLBACK_KEYS', '0') == '1',
+ 'fallback_keys': json.loads(os.getenv('FALLBACK_KEYS', '[]'))
+ }
+
+ # Debug print to verify
+ if self.translator_config['use_fallback_keys']:
+ num_fallbacks = len(self.translator_config['fallback_keys'])
+ print(f"🔑 Fallback keys loaded: {num_fallbacks} keys")
+
+ # Check if multi-key mode should be enabled FOR THIS INSTANCE
+ use_multi_keys_env = os.getenv('USE_MULTI_API_KEYS', '0') == '1'
+ print(f"[DEBUG] USE_MULTI_API_KEYS env var: {os.getenv('USE_MULTI_API_KEYS')}")
+ print(f"[DEBUG] Creating new instance - multi-key mode from env: {use_multi_keys_env}")
+
+ if use_multi_keys_env:
+ # Initialize from environment
+ multi_keys_json = os.getenv('MULTI_API_KEYS', '[]')
+ print(f"[DEBUG] Loading multi-keys config...")
+ force_rotation = os.getenv('FORCE_KEY_ROTATION', '1') == '1'
+ rotation_frequency = int(os.getenv('ROTATION_FREQUENCY', '1'))
+
+ try:
+ multi_keys = json.loads(multi_keys_json)
+ if multi_keys:
+ # Setup the shared pool
+ self.setup_multi_key_pool(multi_keys, force_rotation, rotation_frequency)
+
+ # Enable multi-key mode FOR THIS INSTANCE
+ self._multi_key_mode = True
+ self._force_rotation = force_rotation
+ self._rotation_frequency = rotation_frequency
+
+ print(f"[DEBUG] ✅ This instance has multi-key mode ENABLED")
+ else:
+ print(f"[DEBUG] ❌ No keys found in config, staying in single-key mode")
+ self._multi_key_mode = False
+ except Exception as e:
+ print(f"Failed to load multi-key config: {e}")
+ self._multi_key_mode = False
+ print(f"[DEBUG] ❌ Error loading config, falling back to single-key mode")
+ else:
+ #print(f"[DEBUG] ❌ Multi-key mode is DISABLED for this instance (env var = 0)")
+ self._multi_key_mode = False
+
+ # Initial setup based on THIS INSTANCE's mode
+ if not self._multi_key_mode:
+ self.api_key = api_key
+ self.model = model
+ self.key_identifier = "Single Key"
+ self._setup_client()
+
+ # Check for Vertex AI Model Garden models (contain @ symbol)
+ # NOTE: This happens AFTER the initial setup, as in the second version
+ if '@' in self.model or self.model.startswith('vertex/'):
+ # For Vertex AI, we need Google Cloud credentials, not API key
+ self.client_type = 'vertex_model_garden'
+
+ # Try to find Google Cloud credentials
+ # 1. Check environment variable
+ self.google_creds_path = os.environ.get('GOOGLE_APPLICATION_CREDENTIALS')
+
+ # 2. Check if passed as api_key (for compatibility)
+ if not self.google_creds_path and api_key and os.path.exists(api_key):
+ self.google_creds_path = api_key
+ # Use logger if available, otherwise print
+ if hasattr(self, 'logger'):
+ self.logger.info("Using API key parameter as Google Cloud credentials path")
+ else:
+ print("Using API key parameter as Google Cloud credentials path")
+
+ # 3. Will check GUI config later during send if needed
+
+ if self.google_creds_path:
+ msg = f"Vertex AI Model Garden: Using credentials from {self.google_creds_path}"
+ if hasattr(self, 'logger'):
+ self.logger.info(msg)
+ else:
+ print(msg)
+ else:
+ print("Vertex AI Model Garden: Google Cloud credentials not yet configured")
+ else:
+ # Only set up client if not in multi-key mode
+ # Multi-key mode will set up the client when a key is selected
+ if not self._multi_key_mode:
+ # NOTE: This is a SECOND call to _setup_client() in the else branch
+ # Determine client type from model name
+ self._setup_client()
+ print(f"[DEBUG] After setup - client_type: {getattr(self, 'client_type', None)}, openai_client: {self.openai_client}")
+
+ # FORCE OPENAI CLIENT IF CUSTOM BASE URL IS SET AND ENABLED
+ use_custom_endpoint = os.getenv('USE_CUSTOM_OPENAI_ENDPOINT', '0') == '1'
+ custom_base_url = os.getenv('OPENAI_CUSTOM_BASE_URL', '')
+
+ # Force OpenAI client when custom endpoint is enabled
+ if custom_base_url and use_custom_endpoint and self.openai_client is None:
+ original_client_type = self.client_type
+ print(f"[DEBUG] Custom base URL detected and enabled, overriding {original_client_type or 'unmatched'} model to use OpenAI client: {self.model}")
+ self.client_type = 'openai'
+
+ # Check if openai module is available
+ try:
+ import openai
+ except ImportError:
+ raise ImportError("OpenAI library not installed. Install with: pip install openai")
+
+ # Validate URL has protocol
+ if not custom_base_url.startswith(('http://', 'https://')):
+ print(f"[WARNING] Custom base URL missing protocol, adding https://")
+ custom_base_url = 'https://' + custom_base_url
+
+ self.openai_client = openai.OpenAI(
+ api_key=self.api_key,
+ base_url=custom_base_url
+ )
+ print(f"[DEBUG] OpenAI client created with custom base URL: {custom_base_url}")
+ elif custom_base_url and not use_custom_endpoint:
+ print(f"[DEBUG] Custom base URL detected but disabled via toggle, using standard client")
+
+ def _apply_thread_submission_delay(self):
+ # Get threading delay from environment (default 0.5)
+ thread_delay = float(os.getenv("THREAD_SUBMISSION_DELAY_SECONDS", "0.5"))
+
+ if thread_delay <= 0:
+ return
+
+ sleep_time = 0
+ should_log = False
+ log_message = ""
+
+ # HOLD LOCK ONLY BRIEFLY to check timing and update counter
+ with self._thread_submission_lock:
+ current_time = time.time()
+ time_since_last_submission = current_time - self._last_thread_submission_time
+
+ if time_since_last_submission < thread_delay:
+ sleep_time = thread_delay - time_since_last_submission
+ # Update the timestamp NOW while we have the lock
+ self._last_thread_submission_time = time.time()
+
+ # Determine if we should log (but don't log yet)
+ if self._thread_submission_count < 3:
+ should_log = True
+ log_message = f"🧵 [{threading.current_thread().name}] Thread delay: {sleep_time:.1f}s"
+ elif self._thread_submission_count == 3:
+ should_log = True
+ log_message = f"🧵 [Subsequent thread delays: {thread_delay}s each...]"
+
+ self._thread_submission_count += 1
+ # LOCK RELEASED HERE
+
+ # NOW do the sleep OUTSIDE the lock
+ if sleep_time > 0:
+ if should_log:
+ print(log_message)
+
+ # Interruptible sleep
+ elapsed = 0
+ check_interval = 0.1
+ while elapsed < sleep_time:
+ if self._cancelled:
+ print(f"🛑 Threading delay cancelled")
+ return # Exit early if cancelled
+
+ time.sleep(min(check_interval, sleep_time - elapsed))
+ elapsed += check_interval
+
+ def _get_thread_local_client(self):
+ """Get or create thread-local client"""
+ thread_id = threading.current_thread().ident
+
+ # Check if we need a new client for this thread
+ if not hasattr(self._thread_local, 'initialized'):
+ self._thread_local.initialized = False
+ self._thread_local.api_key = None
+ self._thread_local.model = None
+ self._thread_local.key_index = None
+ self._thread_local.key_identifier = None
+ self._thread_local.request_count = 0
+ self._thread_local.openai_client = None
+ self._thread_local.gemini_client = None
+ self._thread_local.mistral_client = None
+ self._thread_local.cohere_client = None
+ self._thread_local.client_type = None
+ self._thread_local.current_request_label = None
+
+ # THREAD-LOCAL CACHE
+ self._thread_local.request_cache = {} # Each thread gets its own cache!
+ self._thread_local.cache_hits = 0
+ self._thread_local.cache_misses = 0
+
+ return self._thread_local
+
+ def _ensure_thread_client(self):
+ """Ensure the current thread has a properly initialized client with thread safety"""
+ # Check if cancelled before proceeding
+ if self._cancelled:
+ raise UnifiedClientError("Operation cancelled", error_type="cancelled")
+
+ tls = self._get_thread_local_client()
+ thread_name = threading.current_thread().name
+ thread_id = threading.current_thread().ident
+
+ # Multi-key mode
+ if self._multi_key_mode:
+ # Check if we need to rotate
+ should_rotate = False
+
+ if not tls.initialized:
+ should_rotate = True
+ print(f"[Thread-{thread_name}] Initializing with multi-key mode")
+ elif self._force_rotation:
+ tls.request_count = getattr(tls, 'request_count', 0) + 1
+ if tls.request_count >= self._rotation_frequency:
+ should_rotate = True
+ tls.request_count = 0
+ print(f"[Thread-{thread_name}] Rotating key (reached {self._rotation_frequency} requests)")
+
+ if should_rotate:
+ # Release previous thread assignment to avoid stale usage tracking
+ if hasattr(self._api_key_pool, 'release_thread_assignment'):
+ try:
+ self._api_key_pool.release_thread_assignment(thread_id)
+ except Exception:
+ pass
+
+ # Get a key using thread-safe method with timeout
+ key_info = None
+
+ # Add timeout protection for key retrieval
+ start_time = time.time()
+ max_wait = 120 # 120 seconds max to get a key
+
+ # First try using the pool's method if available
+ if hasattr(self._api_key_pool, 'get_key_for_thread'):
+ try:
+ key_info = self._api_key_pool.get_key_for_thread(
+ force_rotation=should_rotate,
+ rotation_frequency=self._rotation_frequency
+ )
+ if key_info:
+ key, key_index, key_id = key_info
+ # Convert to tuple format expected below
+ key_info = (key, key_index)
+ except Exception as e:
+ print(f"[Thread-{thread_name}] Error getting key from pool: {e}")
+ key_info = None
+
+ # Fallback to our method with timeout check
+ if not key_info:
+ if time.time() - start_time > max_wait:
+ raise UnifiedClientError(f"Timeout getting key for thread after {max_wait}s", error_type="timeout")
+ key_info = self._get_next_available_key_for_thread()
+
+ if key_info:
+ key, key_index = key_info[:2] # Handle both tuple formats
+
+ # Generate key identifier
+ key_id = f"Key#{key_index+1} ({key.model})"
+ if hasattr(key, 'identifier') and key.identifier:
+ key_id = key.identifier
+
+ # Update thread-local state (no lock needed, thread-local is safe)
+ tls.api_key = key.api_key
+ tls.model = key.model
+ tls.key_index = key_index
+ tls.key_identifier = key_id
+ tls.google_credentials = getattr(key, 'google_credentials', None)
+ tls.azure_endpoint = getattr(key, 'azure_endpoint', None)
+ tls.azure_api_version = getattr(key, 'azure_api_version', None)
+ tls.google_region = getattr(key, 'google_region', None)
+ tls.use_individual_endpoint = getattr(key, 'use_individual_endpoint', False)
+ tls.initialized = True
+ tls.last_rotation = time.time()
+
+ # MICROSECOND LOCK: Only when copying to instance variables
+ with self._model_lock:
+ # Copy to instance for compatibility
+ self.api_key = tls.api_key
+ self.model = tls.model
+ self.key_identifier = tls.key_identifier
+ self.current_key_index = key_index
+ self.current_key_google_creds = tls.google_credentials
+ self.current_key_azure_endpoint = tls.azure_endpoint
+ self.current_key_azure_api_version = tls.azure_api_version
+ self.current_key_google_region = tls.google_region
+ self.current_key_use_individual_endpoint = tls.use_individual_endpoint
+
+ # Log key assignment - FIX: Add None check for api_key
+ if self.api_key and len(self.api_key) > 12:
+ masked_key = self.api_key[:4] + "..." + self.api_key[-4:]
+ elif self.api_key and len(self.api_key) > 5:
+ masked_key = self.api_key[:3] + "..." + self.api_key[-2:]
+ else:
+ masked_key = "***"
+
+ print(f"[Thread-{thread_name}] 🔑 Using {self.key_identifier} - {masked_key}")
+
+ # Setup client with new key (might need lock if it modifies instance state)
+ self._setup_client()
+
+ # CRITICAL FIX: Apply individual key's Azure endpoint like single-key mode does
+ self._apply_individual_key_endpoint_if_needed()
+ return
+ else:
+ # No keys available
+ raise UnifiedClientError("No available API keys for thread", error_type="no_keys")
+ else:
+ # Not rotating, ensure instance variables match thread-local
+ if tls.initialized:
+ # MICROSECOND LOCK: When syncing instance variables
+ with self._model_lock:
+ self.api_key = tls.api_key
+ self.model = tls.model
+ self.key_identifier = tls.key_identifier
+ self.current_key_index = getattr(tls, 'key_index', None)
+ self.current_key_google_creds = getattr(tls, 'google_credentials', None)
+ self.current_key_azure_endpoint = getattr(tls, 'azure_endpoint', None)
+ self.current_key_azure_api_version = getattr(tls, 'azure_api_version', None)
+ self.current_key_google_region = getattr(tls, 'google_region', None)
+ self.current_key_use_individual_endpoint = getattr(tls, 'use_individual_endpoint', False)
+
+ # Single key mode
+ elif not tls.initialized:
+ tls.api_key = self.original_api_key
+ tls.model = self.original_model
+ tls.key_identifier = "Single Key"
+ tls.initialized = True
+ tls.request_count = 0
+
+ # MICROSECOND LOCK: When setting instance variables
+ with self._model_lock:
+ self.api_key = tls.api_key
+ self.model = tls.model
+ self.key_identifier = tls.key_identifier
+
+ logger.debug(f"[Thread-{thread_name}] Single-key mode: Using {self.model}")
+ self._setup_client()
+
+ def _get_thread_key(self) -> Optional[Tuple[str, int]]:
+ """Get the API key assigned to current thread"""
+ thread_id = threading.current_thread().ident
+
+ with self._assignment_lock:
+ if thread_id in self._key_assignments:
+ return self._key_assignments[thread_id]
+
+ return None
+
+ def _assign_thread_key(self):
+ """Assign a key to the current thread"""
+ thread_id = threading.current_thread().ident
+ thread_name = threading.current_thread().name
+
+ # Check if cancelled at start
+ if self._cancelled:
+ raise UnifiedClientError("Operation cancelled", error_type="cancelled")
+
+ # Check if thread already has a key
+ existing = self._get_thread_key()
+ if existing and not self._should_rotate_thread_key():
+ # Thread already has a key and doesn't need rotation
+ key_index, key_identifier = existing
+ self.current_key_index = key_index
+ self.key_identifier = key_identifier
+
+ # Apply the key settings
+ if key_index < len(self._api_key_pool.keys):
+ key = self._api_key_pool.keys[key_index]
+ self.api_key = key.api_key
+ self.model = key.model
+ return
+
+ # Get next available key for this thread
+ max_retries = self._get_max_retries()
+ retry_count = 0
+
+ while retry_count <= max_retries:
+ with self._pool_lock:
+ key_info = self._get_next_available_key_for_thread()
+ if key_info:
+ key, key_index = key_info
+ self.api_key = key.api_key
+ self.model = key.model
+ self.current_key_index = key_index
+ self.key_identifier = f"Key#{key_index+1} ({self.model})"
+
+ # Store assignment
+ with self._assignment_lock:
+ self._key_assignments[thread_id] = (key_index, self.key_identifier)
+
+ # FIX: Add None check for api_key
+ if self.api_key and len(self.api_key) > 12:
+ masked_key = self.api_key[:8] + "..." + self.api_key[-4:]
+ else:
+ masked_key = self.api_key or "***"
+ print(f"[THREAD-{thread_name}] 🔑 Assigned {self.key_identifier} - {masked_key}")
+
+ # Setup client for this key
+ self._setup_client()
+ self._apply_custom_endpoint_if_needed()
+ print(f"[THREAD-{thread_name}] 🔄 Key assignment: Client setup completed, ready for requests...")
+ time.sleep(0.1) # Brief pause after key assignment for stability
+ return
+
+ # No key available - all are on cooldown
+ if retry_count < max_retries:
+ wait_time = self._get_shortest_cooldown_time()
+ print(f"[THREAD-{thread_name}] No keys available, waiting {wait_time}s (retry {retry_count + 1}/{max_retries})")
+
+ # Wait with cancellation check
+ for i in range(wait_time):
+ if hasattr(self, '_cancelled') and self._cancelled:
+ raise UnifiedClientError("Operation cancelled while waiting for key", error_type="cancelled")
+ time.sleep(1)
+ if i % 10 == 0 and i > 0:
+ print(f"[THREAD-{thread_name}] Still waiting... {wait_time - i}s remaining")
+
+ # Clear expired entries before next attempt
+ if hasattr(self, '_rate_limit_cache') and self._rate_limit_cache:
+ self._rate_limit_cache.clear_expired()
+ print(f"[THREAD-{thread_name}] 🔄 Cooldown wait: Cache cleared, attempting next key assignment...")
+ time.sleep(0.1) # Brief pause after cooldown wait for retry stability
+
+ retry_count += 1
+
+ # If we've exhausted all retries, raise error
+ raise UnifiedClientError(f"No available API keys for thread after {max_retries} retries", error_type="no_keys")
+
+ def _get_next_available_key_for_thread(self) -> Optional[Tuple]:
+ """Get next available key for thread assignment with proper thread safety"""
+ if not self._api_key_pool:
+ return None
+
+ thread_name = threading.current_thread().name
+
+ # Stop check
+ if self._cancelled:
+ raise UnifiedClientError("Operation cancelled", error_type="cancelled")
+
+ # Use the APIKeyPool's built-in thread-safe method
+ if hasattr(self._api_key_pool, 'get_key_for_thread'):
+ # Let the pool handle all the thread assignment logic
+ key_info = self._api_key_pool.get_key_for_thread(
+ force_rotation=getattr(self, '_force_rotation', True),
+ rotation_frequency=getattr(self, '_rotation_frequency', 1)
+ )
+
+ if key_info:
+ key, key_index, key_id = key_info
+ print(f"[{thread_name}] Got {key_id} from pool")
+ return (key, key_index)
+ else:
+ # Pool couldn't provide a key, all are on cooldown
+ print(f"[{thread_name}] No keys available from pool")
+ return None
+
+ # Fallback: If pool doesn't have the method, use simpler logic
+ print("APIKeyPool missing get_key_for_thread method, using fallback")
+
+ with self.__class__._pool_lock:
+ # Simple round-robin without complex thread tracking
+ for _ in range(len(self._api_key_pool.keys)):
+ current_idx = getattr(self._api_key_pool, 'current_index', 0)
+
+ # Ensure index is valid
+ if current_idx >= len(self._api_key_pool.keys):
+ current_idx = 0
+ self._api_key_pool.current_index = 0
+
+ key = self._api_key_pool.keys[current_idx]
+ key_id = f"Key#{current_idx+1} ({key.model})"
+
+ # Advance index for next call
+ self._api_key_pool.current_index = (current_idx + 1) % len(self._api_key_pool.keys)
+
+ # Check availability
+ if key.is_available() and not self._rate_limit_cache.is_rate_limited(key_id):
+ print(f"[{thread_name}] Assigned {key_id} (fallback)")
+ return (key, current_idx)
+
+ # No available keys
+ print(f"[{thread_name}] All keys unavailable in fallback")
+ return None
+
+ def _wait_for_available_key(self) -> Optional[Tuple]:
+ """Wait for a key to become available (called outside lock)"""
+ thread_name = threading.current_thread().name
+
+ # Check if cancelled first
+ if self._cancelled:
+ if not self._is_stop_requested():
+ logger.info(f"[Thread-{thread_name}] Operation cancelled, not waiting for key")
+ return None
+
+ # Get shortest cooldown time with timeout protection
+ wait_time = self._get_shortest_cooldown_time()
+
+ # Cap maximum wait time to prevent infinite waits
+ max_wait_time = 120 # 2 minutes max
+ if wait_time > max_wait_time:
+ print(f"[Thread-{thread_name}] Cooldown time {wait_time}s exceeds max {max_wait_time}s")
+ wait_time = max_wait_time
+
+ if wait_time <= 0:
+ # Keys should be available now
+ with self.__class__._pool_lock:
+ for i, key in enumerate(self._api_key_pool.keys):
+ key_id = f"Key#{i+1} ({key.model})"
+ if key.is_available() and not self._rate_limit_cache.is_rate_limited(key_id):
+ return (key, i)
+
+ print(f"[Thread-{thread_name}] All keys on cooldown. Waiting {wait_time}s...")
+
+ # Wait with cancellation check
+ wait_start = time.time()
+ while time.time() - wait_start < wait_time:
+ if self._cancelled:
+ print(f"[Thread-{thread_name}] Wait cancelled by user")
+ raise UnifiedClientError("Operation cancelled by user", error_type="cancelled")
+
+ # Check every second if a key became available early
+ with self.__class__._pool_lock:
+ for i, key in enumerate(self._api_key_pool.keys):
+ key_id = f"Key#{i+1} ({key.model})"
+ if key.is_available() and not self._rate_limit_cache.is_rate_limited(key_id):
+ print(f"[Thread-{thread_name}] Key became available early: {key_id}")
+ print(f"[Thread-{thread_name}] 🔄 Early key availability: Key ready for immediate use...")
+ time.sleep(0.1) # Brief pause after early detection for stability
+ return (key, i)
+
+ time.sleep(1)
+
+ # Progress indicator
+ elapsed = int(time.time() - wait_start)
+ if elapsed % 10 == 0 and elapsed > 0:
+ remaining = wait_time - elapsed
+ print(f"[Thread-{thread_name}] Still waiting... {remaining}s remaining")
+
+ # Clear expired entries from cache
+ self._rate_limit_cache.clear_expired()
+
+ # Final attempt after wait
+ with self.__class__._pool_lock:
+ # Try to find an available key
+ for i, key in enumerate(self._api_key_pool.keys):
+ key_id = f"Key#{i+1} ({key.model})"
+ if key.is_available() and not self._rate_limit_cache.is_rate_limited(key_id):
+ return (key, i)
+
+ # Still no keys? Return the first enabled one (last resort)
+ for i, key in enumerate(self._api_key_pool.keys):
+ if key.enabled:
+ print(f"[Thread-{thread_name}] WARNING: Using potentially rate-limited key as last resort")
+ return (key, i)
+
+ return None
+
+ def _should_rotate_thread_key(self) -> bool:
+ """Check if current thread should rotate its key"""
+ if not self._force_rotation:
+ return False
+
+ # Check thread-local request count
+ if not hasattr(self._thread_local, 'request_count'):
+ self._thread_local.request_count = 0
+
+ self._thread_local.request_count += 1
+
+ if self._thread_local.request_count >= self._rotation_frequency:
+ self._thread_local.request_count = 0
+ return True
+
+ return False
+
+ def _handle_rate_limit_for_thread(self):
+ """Handle rate limit by marking current thread's key and getting a new one (thread-safe)"""
+ if not self._multi_key_mode: # Check INSTANCE variable
+ return
+
+ thread_id = threading.current_thread().ident
+ thread_name = threading.current_thread().name
+
+ # Get thread-local state first (thread-safe by nature)
+ tls = self._get_thread_local_client()
+
+ # Store the current key info before we change anything
+ current_key_index = None
+ current_key_identifier = None
+
+ # Safely get current key information from thread-local storage
+ if hasattr(tls, 'key_index') and tls.key_index is not None:
+ current_key_index = tls.key_index
+ current_key_identifier = getattr(tls, 'key_identifier', f"Key#{current_key_index+1}")
+ elif hasattr(self, 'current_key_index') and self.current_key_index is not None:
+ # Fallback to instance variable if thread-local not set
+ current_key_index = self.current_key_index
+ current_key_identifier = self.key_identifier
+
+ # Mark the current key as rate limited (if we have one)
+ if current_key_index is not None and self._api_key_pool:
+ # Use the pool's thread-safe method to mark the error
+ self._api_key_pool.mark_key_error(current_key_index, 429)
+
+ # Get cooldown value safely
+ cooldown = 60 # Default
+ with self.__class__._pool_lock:
+ if current_key_index < len(self._api_key_pool.keys):
+ key = self._api_key_pool.keys[current_key_index]
+ cooldown = getattr(key, 'cooldown', 60)
+
+ print(f"[THREAD-{thread_name}] 🕐 Marking {current_key_identifier} for cooldown ({cooldown}s)")
+
+ # Add to rate limit cache (this is already thread-safe)
+ if hasattr(self.__class__, '_rate_limit_cache') and self.__class__._rate_limit_cache:
+ self.__class__._rate_limit_cache.add_rate_limit(current_key_identifier, cooldown)
+
+ # Clear thread-local state to force new key assignment
+ tls.initialized = False
+ tls.api_key = None
+ tls.model = None
+ tls.key_index = None
+ tls.key_identifier = None
+ tls.request_count = 0
+
+ # Remove any legacy assignments (thread-safe with lock)
+ if hasattr(self, '_assignment_lock') and hasattr(self, '_key_assignments'):
+ with self._assignment_lock:
+ if thread_id in self._key_assignments:
+ del self._key_assignments[thread_id]
+
+ # Release thread assignment in the pool (if pool supports it)
+ if hasattr(self._api_key_pool, 'release_thread_assignment'):
+ self._api_key_pool.release_thread_assignment(thread_id)
+
+ # Now force getting a new key
+ # This will call _ensure_thread_client which will get a new key
+ print(f"[THREAD-{thread_name}] 🔄 Requesting new key after rate limit...")
+
+ try:
+ # Ensure we get a new client with a new key
+ self._ensure_thread_client()
+
+ # Verify we got a different key
+ new_key_index = getattr(tls, 'key_index', None)
+ new_key_identifier = getattr(tls, 'key_identifier', 'Unknown')
+
+ if new_key_index != current_key_index:
+ print(f"[THREAD-{thread_name}] ✅ Successfully rotated from {current_key_identifier} to {new_key_identifier}")
+ else:
+ print(f"[THREAD-{thread_name}] ⚠️ Warning: Got same key back: {new_key_identifier}")
+
+ except Exception as e:
+ print(f"[THREAD-{thread_name}] ❌ Failed to get new key after rate limit: {e}")
+ raise UnifiedClientError(f"Failed to rotate key after rate limit: {e}", error_type="no_keys")
+
+ # Helper methods that need to check instance state
+ def _count_available_keys(self) -> int:
+ """Count how many keys are currently available"""
+ if not self._multi_key_mode or not self.__class__._api_key_pool:
+ return 0
+
+ count = 0
+ for i, key in enumerate(self.__class__._api_key_pool.keys):
+ if key.enabled:
+ key_id = f"Key#{i+1} ({key.model})"
+ # Check both rate limit cache AND key's own cooling status
+ is_rate_limited = self.__class__._rate_limit_cache.is_rate_limited(key_id)
+ is_cooling = key.is_cooling_down # Also check the key's own status
+
+ if not is_rate_limited and not is_cooling:
+ count += 1
+ return count
+
+ def _mark_key_success(self):
+ """Mark the current key as successful (thread-safe)"""
+ # Check both instance and class-level cancellation
+ if (hasattr(self, '_cancelled') and self._cancelled) or self.__class__._global_cancelled:
+ # Don't mark success if we're cancelled
+ return
+
+ if not self._multi_key_mode:
+ return
+
+ # Get thread-local state
+ tls = self._get_thread_local_client()
+ key_index = getattr(tls, 'key_index', None)
+
+ # Fallback to instance variable if thread-local not set
+ if key_index is None:
+ key_index = getattr(self, 'current_key_index', None)
+
+ if key_index is not None and self.__class__._api_key_pool:
+ # Use the pool's thread-safe method
+ self.__class__._api_key_pool.mark_key_success(key_index)
+
+ def _mark_key_error(self, error_code: int = None):
+ """Mark current key as having an error and apply cooldown if rate limited (thread-safe)"""
+ # Check both instance and class-level cancellation
+ if (hasattr(self, '_cancelled') and self._cancelled) or self.__class__._global_cancelled:
+ # Don't mark error if we're cancelled
+ return
+
+ if not self._multi_key_mode:
+ return
+
+ # Get thread-local state
+ tls = self._get_thread_local_client()
+ key_index = getattr(tls, 'key_index', None)
+
+ # Fallback to instance variable if thread-local not set
+ if key_index is None:
+ key_index = getattr(self, 'current_key_index', None)
+
+ if key_index is not None and self.__class__._api_key_pool:
+ # Use the pool's thread-safe method
+ self.__class__._api_key_pool.mark_key_error(key_index, error_code)
+
+ # If it's a rate limit error, also add to rate limit cache
+ if error_code == 429:
+ # Get key identifier safely
+ with self.__class__._pool_lock:
+ if key_index < len(self.__class__._api_key_pool.keys):
+ key = self.__class__._api_key_pool.keys[key_index]
+ key_id = f"Key#{key_index+1} ({key.model})"
+ cooldown = getattr(key, 'cooldown', 60)
+
+ # Add to rate limit cache (already thread-safe)
+ if hasattr(self.__class__, '_rate_limit_cache'):
+ self.__class__._rate_limit_cache.add_rate_limit(key_id, cooldown)
+
+ def _apply_custom_endpoint_if_needed(self):
+ """Apply custom endpoint configuration if needed"""
+ use_custom_endpoint = os.getenv('USE_CUSTOM_OPENAI_ENDPOINT', '0') == '1'
+ custom_base_url = os.getenv('OPENAI_CUSTOM_BASE_URL', '')
+
+ if custom_base_url and use_custom_endpoint:
+ if not custom_base_url.startswith(('http://', 'https://')):
+ custom_base_url = 'https://' + custom_base_url
+
+ # Don't override Gemini models - they have their own separate endpoint toggle
+ if self.client_type == 'gemini':
+ # Only log if Gemini OpenAI endpoint is not also enabled
+ use_gemini_endpoint = os.getenv("USE_GEMINI_OPENAI_ENDPOINT", "0") == "1"
+ if not use_gemini_endpoint:
+ self._log_once("Gemini model detected, not overriding with custom OpenAI endpoint (use USE_GEMINI_OPENAI_ENDPOINT instead)", is_debug=True)
+ return
+
+ # Override other model types to use OpenAI client when custom endpoint is enabled
+ original_client_type = self.client_type
+ self.client_type = 'openai'
+
+ try:
+ import openai
+ # MICROSECOND LOCK: Create custom endpoint client with thread safety
+ with self._model_lock:
+ self.openai_client = openai.OpenAI(
+ api_key=self.api_key,
+ base_url=custom_base_url
+ )
+ except ImportError:
+ print(f"[ERROR] OpenAI library not installed, cannot use custom endpoint")
+ self.client_type = original_client_type # Restore original type
+
+ def _apply_individual_key_endpoint_if_needed(self):
+ """Apply individual key endpoint if configured (multi-key mode) - works independently of global toggle"""
+ # Check if this key has an individual endpoint enabled AND configured
+ has_individual_endpoint = (hasattr(self, 'current_key_azure_endpoint') and
+ hasattr(self, 'current_key_use_individual_endpoint') and
+ self.current_key_use_individual_endpoint and
+ self.current_key_azure_endpoint)
+
+ if has_individual_endpoint:
+ # Use individual endpoint - works independently of global custom endpoint toggle
+ individual_endpoint = self.current_key_azure_endpoint
+
+ if not individual_endpoint.startswith(('http://', 'https://')):
+ individual_endpoint = 'https://' + individual_endpoint
+
+ # Don't override Gemini models - they have their own separate endpoint toggle
+ if self.client_type == 'gemini':
+ # Only log if Gemini OpenAI endpoint is not also enabled
+ use_gemini_endpoint = os.getenv("USE_GEMINI_OPENAI_ENDPOINT", "0") == "1"
+ if not use_gemini_endpoint:
+ self._log_once("Gemini model detected, not overriding with individual endpoint (use USE_GEMINI_OPENAI_ENDPOINT instead)", is_debug=True)
+ return
+
+ # Detect Azure endpoints and route via Azure handler instead of generic OpenAI base_url
+ url_l = individual_endpoint.lower()
+ is_azure = (".openai.azure.com" in url_l) or (".cognitiveservices" in url_l) or ("/openai/deployments/" in url_l)
+ if is_azure:
+ # Normalize to plain Azure base (strip any trailing /openai/... if present)
+ azure_base = individual_endpoint.split('/openai')[0] if '/openai' in individual_endpoint else individual_endpoint.rstrip('/')
+ with self._model_lock:
+ # Switch this instance to Azure mode for correct routing
+ self.client_type = 'azure'
+ self.azure_endpoint = azure_base
+ # Prefer per-key Azure API version if available
+ self.azure_api_version = getattr(self, 'current_key_azure_api_version', None) or os.getenv('AZURE_API_VERSION', '2024-02-01')
+ # Mark that we applied an individual (per-key) endpoint
+ self._individual_endpoint_applied = True
+ # Also update TLS so subsequent calls on this thread know it's Azure
+ try:
+ tls = self._get_thread_local_client()
+ with self._model_lock:
+ tls.azure_endpoint = azure_base
+ tls.azure_api_version = self.azure_api_version
+ tls.client_type = 'azure'
+ except Exception:
+ pass
+ print(f"[DEBUG] Individual Azure endpoint applied: {azure_base} (api-version={self.azure_api_version})")
+ return # Handled; do not fall through to custom endpoint logic
+
+ # Non-Azure: Override to use OpenAI-compatible client against the provided base URL
+ original_client_type = self.client_type
+ self.client_type = 'openai'
+
+ try:
+ import openai
+
+ # MICROSECOND LOCK: Create individual endpoint client with thread safety
+ with self._model_lock:
+ self.openai_client = openai.OpenAI(
+ api_key=self.api_key,
+ base_url=individual_endpoint
+ )
+
+ # Set flag to prevent _setup_client from overriding this client
+ self._individual_endpoint_applied = True
+
+ # CRITICAL: Update thread-local storage with our correct client
+ tls = self._get_thread_local_client()
+ with self._model_lock:
+ tls.openai_client = self.openai_client
+ tls.client_type = 'openai'
+
+ return # Individual endpoint applied - don't check global custom endpoint
+ except ImportError:
+ self.client_type = original_client_type # Restore original type
+ return
+ except Exception as e:
+ print(f"[ERROR] Failed to create individual endpoint client: {e}")
+ self.client_type = original_client_type # Restore original type
+ return
+
+ # If no individual endpoint, check global custom endpoint (but only if global toggle is enabled)
+ self._apply_custom_endpoint_if_needed()
+
+ # Properties for backward compatibility
+ @property
+ def use_multi_keys(self):
+ """Property for backward compatibility"""
+ return self._multi_key_mode
+
+ @use_multi_keys.setter
+ def use_multi_keys(self, value):
+ """Property setter for backward compatibility"""
+ self._multi_key_mode = value
+
+ def _ensure_key_rotation(self):
+ """Ensure we have a key selected and rotate if in multi-key mode"""
+ if not self.use_multi_keys:
+ return
+
+ # Force rotation to next key on every request
+ if self.current_key_index is not None:
+ # We already have a key, rotate to next
+ print(f"[DEBUG] Rotating from {self.key_identifier} to next key")
+ self._force_next_key()
+ else:
+ # First request, get initial key
+ print(f"[DEBUG] First request, selecting initial key")
+ key_info = self._get_next_available_key()
+ if key_info:
+ self._apply_key_change(key_info, "Initial")
+ else:
+ raise UnifiedClientError("No available API keys", error_type="no_keys")
+
+ def _force_next_key(self):
+ """Force rotation to the next key in the pool"""
+ if not self.use_multi_keys or not self._api_key_pool:
+ return
+
+ old_key_identifier = self.key_identifier
+
+ # Use force_rotate method to always get next key
+ key_info = self._api_key_pool.force_rotate_to_next_key()
+ if key_info:
+ # Check if it's available
+ if not key_info[0].is_available():
+ print(f"[WARNING] Next key in rotation is on cooldown, but using it anyway")
+
+ self._apply_key_change(key_info, old_key_identifier)
+ print(f"🔄 Force key rotation: Key change completed, system ready...")
+ time.sleep(0.5) # Brief pause after force rotation for system stability
+ else:
+ print(f"[ERROR] Failed to rotate to next key")
+
+ def _rotate_to_next_key(self) -> bool:
+ """Rotate to the next available key and reinitialize client - THREAD SAFE"""
+ if not self.use_multi_keys or not self._api_key_pool:
+ return False
+
+ old_key_identifier = self.key_identifier
+
+ key_info = self._get_next_available_key()
+ if key_info:
+ # MICROSECOND LOCK: Protect all instance variable modifications
+ with self._model_lock:
+ # Update key and model
+ self.api_key = key_info[0].api_key
+ self.model = key_info[0].model
+ self.current_key_index = key_info[1]
+
+ # Update key identifier
+ self.key_identifier = f"Key#{key_info[1]+1} ({self.model})"
+
+ # Reset clients (these are instance variables too!)
+ self.openai_client = None
+ self.gemini_client = None
+ self.mistral_client = None
+ self.cohere_client = None
+
+ # Logging (outside lock - just reading) - FIX: Add None check
+ if self.api_key and len(self.api_key) > 12:
+ masked_key = self.api_key[:8] + "..." + self.api_key[-4:]
+ else:
+ masked_key = self.api_key or "***"
+ print(f"[DEBUG] 🔄 Rotating from {old_key_identifier} to {self.key_identifier} - {masked_key}")
+
+ # Re-setup the client with new key
+ self._setup_client()
+
+ # Re-apply individual endpoint if needed (this takes priority over global custom endpoint)
+ self._apply_individual_key_endpoint_if_needed()
+
+ print(f"🔄 Key rotation: Endpoint setup completed, rotation successful...")
+ time.sleep(0.5) # Brief pause after rotation for system stability
+ return True
+
+ print(f"[WARNING] No available keys to rotate to")
+ return False
+
+ def get_stats(self) -> Dict[str, any]:
+ """Get statistics about API usage"""
+ stats = dict(self.stats)
+
+ # Add multi-key stats if in multi-key mode
+ if self._multi_key_mode: # Use instance variable
+ stats['multi_key_enabled'] = True
+ stats['force_rotation'] = self._force_rotation # Use instance variable
+ stats['rotation_frequency'] = self._rotation_frequency # Use instance variable
+
+ if hasattr(self, '_api_key_pool') and self._api_key_pool:
+ stats['total_keys'] = len(self._api_key_pool.keys)
+ stats['active_keys'] = sum(1 for k in self._api_key_pool.keys if k.enabled and k.is_available())
+ stats['keys_on_cooldown'] = sum(1 for k in self._api_key_pool.keys if k.is_cooling_down)
+
+ # Per-key stats
+ key_stats = []
+ for i, key in enumerate(self._api_key_pool.keys):
+ key_stat = {
+ 'index': i,
+ 'model': key.model,
+ 'enabled': key.enabled,
+ 'available': key.is_available(),
+ 'success_count': key.success_count,
+ 'error_count': key.error_count,
+ 'cooling_down': key.is_cooling_down
+ }
+ key_stats.append(key_stat)
+ stats['key_details'] = key_stats
+ else:
+ stats['multi_key_enabled'] = False
+
+ return stats
+
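+ # Illustrative usage sketch (not part of the class; constructor args are
+ # hypothetical): a caller inspecting pool health in multi-key mode might do:
+ #
+ # client = UnifiedClient(model="gpt-4o", api_key="sk-...")
+ # stats = client.get_stats()
+ # if stats.get('multi_key_enabled'):
+ # print(f"{stats['active_keys']}/{stats['total_keys']} keys available")
+ # for detail in stats.get('key_details', []):
+ # print(detail['model'], detail['success_count'], detail['error_count'])
+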
+ def diagnose_custom_endpoint(self) -> Dict[str, Any]:
+ """Diagnose custom endpoint configuration for troubleshooting"""
+ diagnosis = {
+ 'timestamp': datetime.now().isoformat(),
+ 'model': self.model,
+ 'client_type': getattr(self, 'client_type', None),
+ 'multi_key_mode': getattr(self, '_multi_key_mode', False),
+ 'environment_variables': {
+ 'USE_CUSTOM_OPENAI_ENDPOINT': os.getenv('USE_CUSTOM_OPENAI_ENDPOINT', 'not_set'),
+ 'OPENAI_CUSTOM_BASE_URL': os.getenv('OPENAI_CUSTOM_BASE_URL', 'not_set'),
+ 'OPENAI_API_BASE': os.getenv('OPENAI_API_BASE', 'not_set'),
+ },
+ 'client_status': {
+ 'openai_client_exists': hasattr(self, 'openai_client') and self.openai_client is not None,
+ 'gemini_client_exists': hasattr(self, 'gemini_client') and self.gemini_client is not None,
+ 'current_api_key_length': len(self.api_key) if hasattr(self, 'api_key') and self.api_key else 0,
+ }
+ }
+
+ # Check if custom endpoint should be applied
+ use_custom_endpoint = os.getenv('USE_CUSTOM_OPENAI_ENDPOINT', '0') == '1'
+ custom_base_url = os.getenv('OPENAI_CUSTOM_BASE_URL', '')
+
+ diagnosis['custom_endpoint_analysis'] = {
+ 'toggle_enabled': use_custom_endpoint,
+ 'custom_url_provided': bool(custom_base_url),
+ 'should_use_custom_endpoint': use_custom_endpoint and bool(custom_base_url),
+ 'would_override_model_type': True, # Custom endpoint always overrides the detected model type (Gemini excepted)
+ }
+
+ # Determine if there are any issues
+ issues = []
+ if use_custom_endpoint and not custom_base_url:
+ issues.append("Custom endpoint enabled but no URL provided in OPENAI_CUSTOM_BASE_URL")
+ if custom_base_url and not use_custom_endpoint:
+ issues.append("Custom URL provided but toggle USE_CUSTOM_OPENAI_ENDPOINT is disabled")
+ if not openai and use_custom_endpoint:
+ issues.append("OpenAI library not installed - cannot use custom endpoints")
+
+ diagnosis['issues'] = issues
+ diagnosis['status'] = 'OK' if not issues else 'ISSUES_FOUND'
+
+ return diagnosis
+
+ def print_custom_endpoint_diagnosis(self):
+ """Print a user-friendly diagnosis of custom endpoint configuration"""
+ diagnosis = self.diagnose_custom_endpoint()
+
+ print("\n🔍 Custom OpenAI Endpoint Diagnosis:")
+ print(f" Model: {diagnosis['model']}")
+ print(f" Client Type: {diagnosis['client_type']}")
+ print(f" Multi-Key Mode: {diagnosis['multi_key_mode']}")
+ print("\n📋 Environment Variables:")
+ for key, value in diagnosis['environment_variables'].items():
+ print(f" {key}: {value}")
+
+ print("\n🔧 Custom Endpoint Analysis:")
+ analysis = diagnosis['custom_endpoint_analysis']
+ print(f" Toggle Enabled: {analysis['toggle_enabled']}")
+ print(f" Custom URL Provided: {analysis['custom_url_provided']}")
+ print(f" Should Use Custom Endpoint: {analysis['should_use_custom_endpoint']}")
+
+ if diagnosis['issues']:
+ print("\n⚠️ Issues Found:")
+ for issue in diagnosis['issues']:
+ print(f" • {issue}")
+ else:
+ print("\n✅ No configuration issues detected")
+
+ print(f"\n📊 Status: {diagnosis['status']}\n")
+
+ return diagnosis
+
+ def reset_stats(self):
+ """Reset usage statistics and pattern tracking"""
+ self.stats = {
+ 'total_requests': 0,
+ 'successful_requests': 0,
+ 'failed_requests': 0,
+ 'errors': defaultdict(int),
+ 'response_times': [],
+ 'empty_results': 0
+ }
+
+ # Reset pattern tracking
+ self.pattern_counts = {}
+ self.last_pattern = None
+
+ # Reset conversation tracking if not already set
+ if not hasattr(self, 'conversation_message_count'):
+ self.conversation_message_count = 0
+
+ # Log if logger is available
+ if hasattr(self, 'logger'):
+ self.logger.info("Statistics and pattern tracking reset")
+ else:
+ print("Statistics and pattern tracking reset")
+
+ def _rotate_to_next_available_key(self, skip_current: bool = False) -> bool:
+ """
+ Rotate to the next available key that's not rate limited
+
+ Args:
+ skip_current: If True, skip the current key even if it becomes available
+ """
+ if not self._multi_key_mode or not self._api_key_pool: # Use instance variable
+ return False
+
+ old_key_identifier = self.key_identifier
+ max_attempts = len(self._api_key_pool.keys)
+ attempts = 0
+
+ while attempts < max_attempts:
+ # Get next key from pool
+ key_info = self._get_next_available_key()
+ if not key_info:
+ attempts += 1
+ continue
+
+ # Check if this is the same key we started with
+ potential_key_id = f"Key#{key_info[1]+1} ({key_info[0].model})"
+ if skip_current and potential_key_id == old_key_identifier:
+ attempts += 1
+ continue
+
+ # Check if this key is rate limited
+ if not self._rate_limit_cache.is_rate_limited(potential_key_id):
+ # This key is available, use it
+ self._apply_key_change(key_info, old_key_identifier)
+ return True
+ else:
+ print(f"[DEBUG] Skipping {potential_key_id} (in cooldown)")
+
+ attempts += 1
+
+ print(f"[DEBUG] No available keys found after checking all {max_attempts} keys")
+
+ # All keys are on cooldown - wait for shortest cooldown
+ wait_time = self._get_shortest_cooldown_time()
+ print(f"[DEBUG] All keys on cooldown. Waiting {wait_time}s...")
+
+ # Wait with cancellation check
+ for i in range(wait_time):
+ if hasattr(self, '_cancelled') and self._cancelled:
+ print(f"[DEBUG] Wait cancelled by user")
+ return False
+ time.sleep(1)
+ if i % 10 == 0 and i > 0:
+ print(f"[DEBUG] Still waiting... {wait_time - i}s remaining")
+
+ # Clear expired entries and try again
+ self._rate_limit_cache.clear_expired()
+
+ # Try one more time to find an available key
+ attempts = 0
+ while attempts < max_attempts:
+ key_info = self._get_next_available_key()
+ if key_info:
+ potential_key_id = f"Key#{key_info[1]+1} ({key_info[0].model})"
+ if not self._rate_limit_cache.is_rate_limited(potential_key_id):
+ self._apply_key_change(key_info, old_key_identifier)
+ return True
+ attempts += 1
+
+ return False
+
+ def _apply_key_change(self, key_info: tuple, old_key_identifier: str):
+ """Apply the key change and reinitialize clients"""
+ # MICROSECOND LOCK: Atomic update of all key-related variables
+ with self._model_lock:
+ self.api_key = key_info[0].api_key
+ self.model = key_info[0].model
+ self.current_key_index = key_info[1]
+ self.key_identifier = f"Key#{key_info[1]+1} ({key_info[0].model})"
+
+ # Reset clients atomically
+ self.openai_client = None
+ self.gemini_client = None
+ self.mistral_client = None
+ self.cohere_client = None
+
+ # Logging OUTSIDE the lock (with None check, matching _rotate_to_next_key)
+ if self.api_key and len(self.api_key) > 12:
+ masked_key = self.api_key[:8] + "..." + self.api_key[-4:]
+ else:
+ masked_key = self.api_key or "***"
+ print(f"[DEBUG] 🔄 Switched from {old_key_identifier} to {self.key_identifier} - {masked_key}")
+
+ # Re-setup the client with new key
+ self._setup_client()
+
+ # Re-apply custom endpoint if needed
+ use_custom_endpoint = os.getenv('USE_CUSTOM_OPENAI_ENDPOINT', '0') == '1'
+ custom_base_url = os.getenv('OPENAI_CUSTOM_BASE_URL', '')
+
+ if custom_base_url and use_custom_endpoint and self.client_type == 'openai':
+ if not custom_base_url.startswith(('http://', 'https://')):
+ custom_base_url = 'https://' + custom_base_url
+
+ self.openai_client = openai.OpenAI(
+ api_key=self.api_key,
+ base_url=custom_base_url
+ )
+ print("[DEBUG] Re-created OpenAI client with custom base URL")
+
+ def _force_rotate_to_untried_key(self, attempted_keys: set) -> bool:
+ """
+ Force rotation to any key that hasn't been tried yet, ignoring cooldown
+
+ Args:
+ attempted_keys: Set of key identifiers that have already been attempted
+ """
+ if not self._multi_key_mode or not self._api_key_pool: # Use instance variable
+ return False
+
+ old_key_identifier = self.key_identifier
+
+ # Try each key in the pool
+ for i in range(len(self._api_key_pool.keys)):
+ key = self._api_key_pool.keys[i]
+ potential_key_id = f"Key#{i+1} ({key.model})"
+
+ # Skip if already tried
+ if potential_key_id in attempted_keys:
+ continue
+
+ # Found an untried key - use it regardless of cooldown
+ key_info = (key, i)
+ self._apply_key_change(key_info, old_key_identifier)
+ print(f"[DEBUG] 🔄 Force-rotated to untried key: {self.key_identifier}")
+ return True
+
+ return False
+
+ def get_current_key_info(self) -> str:
+ """Get information about the currently active key"""
+ if self._multi_key_mode and self.current_key_index is not None: # Use instance variable
+ key = self._api_key_pool.keys[self.current_key_index]
+ status = "Active" if key.is_available() else "Cooling Down"
+ return f"{self.key_identifier} - Status: {status}, Success: {key.success_count}, Errors: {key.error_count}"
+ else:
+ return "Single Key Mode"
+
+ def _generate_unique_thread_dir(self, context: str) -> str:
+ """Generate a truly unique thread directory with session ID and timestamp"""
+ thread_name = threading.current_thread().name
+ thread_id = threading.current_thread().ident
+
+ # Include timestamp and session ID for uniqueness
+ timestamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")[:20]
+ unique_id = f"{thread_name}_{thread_id}_{self.session_id}_{timestamp}"
+
+ thread_dir = os.path.join("Payloads", context, unique_id)
+ os.makedirs(thread_dir, exist_ok=True)
+ return thread_dir
+
+ def _get_request_hash(self, messages) -> str:
+ """Generate a STABLE hash for request deduplication - THREAD-SAFE VERSION
+ WITH MICROSECOND LOCKING for thread safety."""
+
+ # MICROSECOND LOCK: Ensure atomic hash generation
+ with self._model_lock:
+ # Get thread-specific identifier to prevent cross-thread cache collisions
+ thread_id = threading.current_thread().ident
+ thread_name = threading.current_thread().name
+
+ # REMOVED: request_uuid, request_timestamp, request_timestamp_micro
+ # We want STABLE hashes for caching to work!
+
+ # Create normalized representation (can be done outside lock)
+ normalized_messages = []
+
+ for msg in messages:
+ normalized_msg = {
+ 'role': msg.get('role', ''),
+ 'content': msg.get('content', '')
+ }
+
+ # For image messages, include image size/hash instead of full data
+ if isinstance(normalized_msg['content'], list):
+ content_parts = []
+ for part in normalized_msg['content']:
+ if isinstance(part, dict) and 'image_url' in part:
+ # Hash the image data
+ image_data = part.get('image_url', {}).get('url', '')
+ if image_data.startswith('data:'):
+ # Extract just the data part
+ image_hash = hashlib.md5(image_data.encode()).hexdigest()
+ content_parts.append(f"image:{image_hash}")
+ else:
+ content_parts.append(f"image_url:{image_data}")
+ else:
+ content_parts.append(str(part))
+ normalized_msg['content'] = '|'.join(content_parts)
+
+ normalized_messages.append(normalized_msg)
+
+ # MICROSECOND LOCK: Ensure atomic hash generation
+ with self._model_lock:
+ # Include thread_id but NO request-specific IDs for stable caching
+ hash_data = {
+ 'thread_id': thread_id, # THREAD ISOLATION
+ 'thread_name': thread_name, # Additional context for debugging
+ # REMOVED: request_uuid, request_time, request_time_ns
+ 'messages': normalized_messages,
+ 'model': self.model,
+ 'temperature': getattr(self, 'temperature', 0.3),
+ 'max_tokens': getattr(self, 'max_tokens', 8192)
+ }
+
+ # Debug logging if needed
+ if os.getenv("DEBUG_HASH", "0") == "1":
+ print(f"[HASH] Thread: {thread_name} (ID: {thread_id})")
+ print(f"[HASH] Model: {self.model}")
+
+ # Create stable JSON representation
+ hash_str = json.dumps(hash_data, sort_keys=True, ensure_ascii=False)
+
+ # Use SHA256 for better distribution
+ final_hash = hashlib.sha256(hash_str.encode()).hexdigest()
+
+ if os.getenv("DEBUG_HASH", "0") == "1":
+ print(f"[HASH] Generated stable hash: {final_hash[:16]}...")
+
+ return final_hash
+
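+ # Stability property (illustrative): because request-specific UUIDs and
+ # timestamps are excluded, the same payload from the same thread always
+ # hashes identically, which is what makes response caching possible:
+ #
+ # h1 = self._get_request_hash(messages)
+ # h2 = self._get_request_hash(messages)
+ # assert h1 == h2 # same thread + same payload -> same cache slot
+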
+ def _get_request_hash_with_context(self, messages, context=None) -> str:
+ """
+ Generate a STABLE hash that includes context AND thread info for better deduplication.
+ WITH MICROSECOND LOCKING for thread safety.
+ """
+
+ # MICROSECOND LOCK: Ensure atomic reading of model/settings
+ with self._model_lock:
+ # Get thread-specific identifier
+ thread_id = threading.current_thread().ident
+ thread_name = threading.current_thread().name
+
+ # REMOVED: request_uuid, request_timestamp, request_timestamp_micro
+ # We want STABLE hashes for caching to work!
+
+ # Create normalized representation (can be done outside lock)
+ normalized_messages = []
+
+ for msg in messages:
+ normalized_msg = {
+ 'role': msg.get('role', ''),
+ 'content': msg.get('content', '')
+ }
+
+ # Handle image messages
+ if isinstance(normalized_msg['content'], list):
+ content_parts = []
+ for part in normalized_msg['content']:
+ if isinstance(part, dict) and 'image_url' in part:
+ image_data = part.get('image_url', {}).get('url', '')
+ if image_data.startswith('data:'):
+ # Use first 1000 chars of image data for hash
+ image_sample = image_data[:1000]
+ image_hash = hashlib.md5(image_sample.encode()).hexdigest()
+ content_parts.append(f"image:{image_hash}")
+ else:
+ content_parts.append(f"image_url:{image_data}")
+ elif isinstance(part, dict):
+ content_parts.append(json.dumps(part, sort_keys=True))
+ else:
+ content_parts.append(str(part))
+ normalized_msg['content'] = '|'.join(content_parts)
+
+ normalized_messages.append(normalized_msg)
+
+ # MICROSECOND LOCK: Ensure atomic hash generation
+ with self._instance_model_lock:
+ # Include context, thread info, but NO request-specific IDs
+ hash_data = {
+ 'thread_id': thread_id, # THREAD ISOLATION
+ 'thread_name': thread_name, # Additional thread context
+ # REMOVED: request_uuid, request_time, request_time_ns
+ 'context': context, # Include context (e.g., 'translation', 'glossary', etc.)
+ 'messages': normalized_messages,
+ 'model': self.model,
+ 'temperature': getattr(self, 'temperature', 0.3),
+ 'max_tokens': getattr(self, 'max_tokens', 8192)
+ }
+
+ # Debug logging if needed
+ if os.getenv("DEBUG_HASH", "0") == "1":
+ print(f"[HASH_CONTEXT] Thread: {thread_name} (ID: {thread_id})")
+ print(f"[HASH_CONTEXT] Context: {context}")
+ print(f"[HASH_CONTEXT] Model: {self.model}")
+
+ # Create stable JSON representation
+ hash_str = json.dumps(hash_data, sort_keys=True, ensure_ascii=False)
+
+ # Use SHA256 for better distribution
+ final_hash = hashlib.sha256(hash_str.encode()).hexdigest()
+
+ if os.getenv("DEBUG_HASH", "0") == "1":
+ print(f"[HASH_CONTEXT] Generated stable hash: {final_hash[:16]}...")
+
+ return final_hash
+
+ def _get_unique_file_suffix(self, attempt: int = 0) -> str:
+ """Generate a unique suffix for file names to prevent overwrites
+ WITH MICROSECOND LOCKING for thread safety."""
+
+ # MICROSECOND LOCK: Ensure atomic generation of unique identifiers
+ with self._instance_model_lock:
+ thread_id = threading.current_thread().ident
+ timestamp = datetime.now().strftime("%H%M%S%f")[:10]
+ request_uuid = str(uuid.uuid4())[:8]
+
+ # Create unique suffix for files
+ suffix = f"_T{thread_id}_A{attempt}_{timestamp}_{request_uuid}"
+
+ return suffix
+
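+ # Example of the generated shape (values are per-call, shown here only to
+ # document the format): "_T140245_A0_1432058812_3fa85f64" is thread id,
+ # attempt number, HHMMSS plus 4 microsecond digits, and an 8-char UUID slice.
+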
+ def _get_request_hash_with_request_id(self, messages, request_id: str) -> str:
+ """Generate hash WITH request ID for per-call caching
+ WITH MICROSECOND LOCKING for thread safety."""
+
+ # MICROSECOND LOCK: Ensure atomic hash generation
+ with self._instance_model_lock:
+ thread_id = threading.current_thread().ident
+ thread_name = threading.current_thread().name
+
+ # Create normalized representation
+ normalized_messages = []
+
+ for msg in messages:
+ normalized_msg = {
+ 'role': msg.get('role', ''),
+ 'content': msg.get('content', '')
+ }
+
+ # For image messages, include image size/hash instead of full data
+ if isinstance(normalized_msg['content'], list):
+ content_parts = []
+ for part in normalized_msg['content']:
+ if isinstance(part, dict) and 'image_url' in part:
+ image_data = part.get('image_url', {}).get('url', '')
+ if image_data.startswith('data:'):
+ image_hash = hashlib.md5(image_data.encode()).hexdigest()
+ content_parts.append(f"image:{image_hash}")
+ else:
+ content_parts.append(f"image_url:{image_data}")
+ else:
+ content_parts.append(str(part))
+ normalized_msg['content'] = '|'.join(content_parts)
+
+ normalized_messages.append(normalized_msg)
+
+ # MICROSECOND LOCK: Ensure atomic hash generation
+ with self._instance_model_lock:
+ hash_data = {
+ 'thread_id': thread_id,
+ 'thread_name': thread_name,
+ 'request_id': request_id, # THIS MAKES EACH send() CALL UNIQUE
+ 'messages': normalized_messages,
+ 'model': self.model,
+ 'temperature': getattr(self, 'temperature', 0.3),
+ 'max_tokens': getattr(self, 'max_tokens', 8192)
+ }
+
+ if os.getenv("DEBUG_HASH", "0") == "1":
+ print(f"[HASH] Thread: {thread_name} (ID: {thread_id})")
+ print(f"[HASH] Request ID: {request_id}") # Debug the request ID
+ print(f"[HASH] Model: {self.model}")
+
+ hash_str = json.dumps(hash_data, sort_keys=True, ensure_ascii=False)
+ final_hash = hashlib.sha256(hash_str.encode()).hexdigest()
+
+ if os.getenv("DEBUG_HASH", "0") == "1":
+ print(f"[HASH] Generated hash for request {request_id}: {final_hash[:16]}...")
+
+ return final_hash
+
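+ # Contrast with _get_request_hash: including request_id makes every send()
+ # call hash differently, so retries of an identical payload get distinct
+ # cache slots instead of colliding. Sketch (request id generated the same
+ # way _send_core does):
+ #
+ # rid = str(uuid.uuid4())[:8]
+ # h = self._get_request_hash_with_request_id(messages, rid)
+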
+ def _check_duplicate_request(self, request_hash: str, context: str) -> bool:
+ """
+ Deprecated duplicate check kept for backward compatibility.
+ Real duplicate detection happens in send() via _active_requests,
+ so this always returns False.
+ """
+ return False
+
+ def _debug_active_requests(self):
+ """Debug hook for active requests (currently a no-op stub)"""
+ pass
+
+ def _ensure_thread_safety_init(self):
+ """
+ Ensure all thread safety structures are properly initialized.
+ Call this during __init__ or before parallel processing.
+ """
+
+ # Thread-local storage
+ if not hasattr(self, '_thread_local'):
+ self._thread_local = threading.local()
+
+ # File operation locks
+ if not hasattr(self, '_file_write_locks'):
+ self._file_write_locks = {}
+ if not hasattr(self, '_file_write_locks_lock'):
+ self._file_write_locks_lock = RLock()
+
+ # Legacy tracker (for backward compatibility)
+ if not hasattr(self, '_tracker_lock'):
+ self._tracker_lock = RLock()
+
+ def _periodic_cache_cleanup(self):
+ """
+ Periodically clean up expired cache entries and active requests.
+ Currently a no-op stub; a timer-based scheduler can hook in here.
+ """
+ pass
+
+ def _get_thread_status(self) -> dict:
+ """
+ Get current status of thread-related structures for debugging.
+ """
+ status = {
+ 'thread_name': threading.current_thread().name,
+ 'thread_id': threading.current_thread().ident,
+ 'cache_size': len(self._request_cache) if hasattr(self, '_request_cache') else 0,
+ 'active_requests': len(self._active_requests) if hasattr(self, '_active_requests') else 0,
+ 'multi_key_mode': self._multi_key_mode,
+ 'current_key': self.key_identifier if hasattr(self, 'key_identifier') else 'Unknown'
+ }
+
+ # Add thread-local info if available
+ if hasattr(self, '_thread_local'):
+ tls = self._get_thread_local_client()
+ status['thread_local'] = {
+ 'initialized': getattr(tls, 'initialized', False),
+ 'key_index': getattr(tls, 'key_index', None),
+ 'request_count': getattr(tls, 'request_count', 0)
+ }
+
+ return status
+
+ def cleanup(self):
+ """
+ Enhanced cleanup method to properly release all resources.
+ Should be called when done with the client or on shutdown.
+ """
+ thread_name = threading.current_thread().name
+ logger.info(f"[{thread_name}] Cleaning up UnifiedClient resources")
+
+ # Release thread key assignment if in multi-key mode
+ if self._multi_key_mode and self._api_key_pool:
+ thread_id = threading.current_thread().ident
+ self._api_key_pool.release_thread_assignment(thread_id)
+
+ # Clear thread-local storage
+ if hasattr(self, '_thread_local'):
+ # Reset thread-local state
+ self._thread_local.initialized = False
+ self._thread_local.api_key = None
+ self._thread_local.model = None
+ self._thread_local.key_index = None
+ self._thread_local.request_count = 0
+
+ logger.info(f"[{thread_name}] Cleanup complete")
+
+ def _get_safe_filename(self, base_filename: str, content_hash: str = None) -> str:
+ """Generate a safe, unique filename"""
+ # Add content hash if provided
+ if content_hash:
+ name, ext = os.path.splitext(base_filename)
+ return f"{name}_{content_hash[:8]}{ext}"
+ return base_filename
+
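+ # e.g. _get_safe_filename("response.json", "3fa85f64b2c1...") returns
+ # "response_3fa85f64.json"; without a content hash the name passes through.
+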
+ def _is_file_being_written(self, filepath: str) -> bool:
+ """Check if a file is currently being written by another thread"""
+ with self._file_lock:
+ return filepath in self._active_files
+
+ def _mark_file_active(self, filepath: str):
+ """Mark a file as being written"""
+ with self._file_lock:
+ self._active_files.add(filepath)
+
+ def _mark_file_complete(self, filepath: str):
+ """Mark a file write as complete"""
+ with self._file_lock:
+ self._active_files.discard(filepath)
+
+ def _extract_chapter_info(self, messages) -> dict:
+ """Extract chapter and chunk information from messages and progress file
+
+ Args:
+ messages: The messages to search for chapter/chunk info
+
+ Returns:
+ dict with 'chapter', 'chunk', 'total_chunks'
+ """
+ info = {
+ 'chapter': None,
+ 'chunk': None,
+ 'total_chunks': None
+ }
+
+ messages_str = str(messages)
+
+ # First extract chapter number from messages
+ chapter_match = re.search(r'Chapter\s+(\d+)', messages_str, re.IGNORECASE)
+ if not chapter_match:
+ # Try Section pattern for text files
+ chapter_match = re.search(r'Section\s+(\d+)', messages_str, re.IGNORECASE)
+
+ if chapter_match:
+ chapter_num = int(chapter_match.group(1))
+ info['chapter'] = str(chapter_num)
+
+ # Now try to get more accurate info from progress file
+ # Look for translation_progress.json in common locations
+ possible_paths = [
+ 'translation_progress.json',
+ os.path.join('Payloads', 'translation_progress.json'),
+ os.path.join(os.getcwd(), 'Payloads', 'translation_progress.json')
+ ]
+
+ # Check environment variable for output directory
+ output_dir = os.getenv('OUTPUT_DIRECTORY', '')
+ if output_dir:
+ possible_paths.insert(0, os.path.join(output_dir, 'translation_progress.json'))
+
+ progress_file = None
+ for path in possible_paths:
+ if os.path.exists(path):
+ progress_file = path
+ break
+
+ if progress_file:
+ try:
+ with open(progress_file, 'r', encoding='utf-8') as f:
+ prog = json.load(f)
+
+ # Look through chapters for matching actual_num
+ for chapter_key, chapter_info in prog.get("chapters", {}).items():
+ if chapter_info.get('actual_num') == chapter_num:
+ # Found it! Get chunk info if available
+ if chapter_key in prog.get("chapter_chunks", {}):
+ chunk_data = prog["chapter_chunks"][chapter_key]
+ info['total_chunks'] = chunk_data.get('total')
+
+ # Get current/latest chunk
+ completed = chunk_data.get('completed', [])
+ if completed:
+ info['chunk'] = str(max(completed) + 1) # Next chunk to process
+ else:
+ info['chunk'] = '1' # First chunk
+ break
+ except Exception:
+ pass # Fall back to regex parsing below
+
+ # If we didn't get chunk info from progress file, try regex
+ if not info['chunk']:
+ chunk_match = re.search(r'Chunk\s+(\d+)/(\d+)', messages_str)
+ if chunk_match:
+ info['chunk'] = chunk_match.group(1)
+ info['total_chunks'] = chunk_match.group(2)
+
+ return info
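+
+ # Minimal example of the progress-file shape this method reads, assuming a
+ # translation_progress.json laid out the way the pipeline writes it (the
+ # chapter key "ch_003" is hypothetical):
+ #
+ # {
+ # "chapters": {"ch_003": {"actual_num": 3}},
+ # "chapter_chunks": {"ch_003": {"total": 4, "completed": [0, 1]}}
+ # }
+ #
+ # For messages mentioning "Chapter 3" this yields
+ # {'chapter': '3', 'chunk': '2', 'total_chunks': 4} (next chunk = max + 1).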
+
+ def _should_rotate(self) -> bool:
+ """Check if we should rotate keys based on settings"""
+ if not self.use_multi_keys:
+ return False
+
+ if not self._force_rotation:
+ # Only rotate on errors
+ return False
+
+ # Check frequency
+ with self._counter_lock:
+ self._request_counter += 1
+
+ # Check if it's time to rotate
+ if self._request_counter >= self._rotation_frequency:
+ self._request_counter = 0
+ return True
+ else:
+ return False
+
+ def _get_shortest_cooldown_time(self) -> int:
+ """Get the shortest cooldown time among all keys"""
+ # Check if cancelled at start
+ if self._cancelled:
+ return 0 # Return immediately if cancelled
+
+ if not self._multi_key_mode or not self.__class__._api_key_pool:
+ return 60 # Default cooldown
+
+ min_cooldown = float('inf')
+ now = time.time()
+
+ for i, key in enumerate(self.__class__._api_key_pool.keys):
+ if key.enabled:
+ key_id = f"Key#{i+1} ({key.model})"
+
+ # Check rate limit cache
+ cache_cooldown = self.__class__._rate_limit_cache.get_remaining_cooldown(key_id)
+ if cache_cooldown > 0:
+ min_cooldown = min(min_cooldown, cache_cooldown)
+
+ # Also check key's own cooldown
+ if key.is_cooling_down and key.last_error_time:
+ remaining = key.cooldown - (now - key.last_error_time)
+ if remaining > 0:
+ min_cooldown = min(min_cooldown, remaining)
+
+ # Add random jitter to prevent thundering herd (0-5 seconds)
+ jitter = random.randint(0, 5)
+
+ # Return the minimum wait time plus jitter, capped at 60 seconds
+ base_time = int(min_cooldown) if min_cooldown != float('inf') else 30
+ return min(base_time + jitter, 60)
+
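+ # The 0-5s jitter is a thundering-herd mitigation: if several worker threads
+ # observe the same cooldown expiry, the random offset spreads their wake-ups
+ # so they do not all retry the same key at the same instant.
+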
+ def _execute_with_retry(self,
+ perform,
+ messages,
+ temperature,
+ max_tokens,
+ max_completion_tokens,
+ context,
+ request_id,
+ is_image: bool = False) -> Tuple[str, Optional[str]]:
+ """
+ Simplified shim retained for compatibility. Executes once without internal retry.
+ """
+ result = perform(messages, temperature, max_tokens, max_completion_tokens, context, None, request_id)
+ if not result or not isinstance(result, tuple):
+ raise UnifiedClientError("Invalid result from perform()", error_type="unexpected")
+ return result
+
+ def _send_core(self,
+ messages,
+ temperature: Optional[float] = None,
+ max_tokens: Optional[int] = None,
+ max_completion_tokens: Optional[int] = None,
+ context: Optional[str] = None,
+ image_data: Any = None) -> Tuple[str, Optional[str]]:
+ """
+ Unified front for send and send_image. Includes multi-key retry wrapper.
+ """
+ batch_mode = os.getenv("BATCH_TRANSLATION", "0") == "1"
+ if not batch_mode:
+ self._sequential_send_lock.acquire()
+ try:
+ self.reset_cleanup_state()
+ # Pre-stagger log so users see what's being sent before delay
+ self._log_pre_stagger(messages, context or ('image_translation' if image_data else 'translation'))
+ self._apply_thread_submission_delay()
+ request_id = str(uuid.uuid4())[:8]
+
+ # Multi-key retry wrapper
+ if self._multi_key_mode:
+ # Check if indefinite retry is enabled for multi-key mode too
+ indefinite_retry_enabled = os.getenv("INDEFINITE_RATE_LIMIT_RETRY", "1") == "1"
+ last_error = None
+ attempt = 0
+
+ while True: # Indefinite retry loop when enabled
+ try:
+ if image_data is None:
+ return self._send_internal(messages, temperature, max_tokens, max_completion_tokens, context, retry_reason=None, request_id=request_id)
+ else:
+ return self._send_image_internal(messages, image_data, temperature, max_tokens, max_completion_tokens, context, retry_reason=None, request_id=request_id)
+
+ except UnifiedClientError as e:
+ last_error = e
+
+ # Handle rate limit errors with key rotation
+ if e.error_type == "rate_limit" or self._is_rate_limit_error(e):
+ attempt += 1
+
+ if indefinite_retry_enabled:
+ print(f"🔄 Multi-key mode: Rate limit hit, attempting key rotation (indefinite retry, attempt {attempt})")
+ else:
+ # Limited retry mode - respect max attempts per key
+ num_keys = len(self._api_key_pool.keys) if self._api_key_pool else 3
+ max_attempts = num_keys * 2 # Allow 2 attempts per key
+ print(f"🔄 Multi-key mode: Rate limit hit, attempting key rotation (attempt {attempt}/{max_attempts})")
+
+ if attempt >= max_attempts:
+ print(f"❌ Multi-key mode: Exhausted {max_attempts} attempts, giving up")
+ raise
+
+ try:
+ # Rotate to next key
+ self._handle_rate_limit_for_thread()
+ print(f"🔄 Multi-key retry: Key rotation completed, preparing for next attempt...")
+ time.sleep(0.1) # Brief pause after key rotation for system stability
+
+ # Check if we have any available keys left after rotation
+ available_keys = self._count_available_keys()
+ if available_keys == 0:
+ print(f"🔄 Multi-key mode: All keys rate-limited, waiting for cooldown...")
+ # Wait a bit before trying again
+ wait_time = 60 + random.uniform(1, 10) # 61-70 seconds
+ print(f"🔄 Multi-key mode: Waiting {wait_time:.1f}s for keys to cool down")
+
+ wait_start = time.time()
+ while time.time() - wait_start < wait_time:
+ if self._cancelled:
+ raise UnifiedClientError("Operation cancelled by user", error_type="cancelled")
+ time.sleep(0.5)
+
+ # Continue to next attempt with rotated key
+ continue
+
+ except Exception as rotation_error:
+ print(f"❌ Multi-key mode: Key rotation failed: {rotation_error}")
+ # If rotation fails, we can't continue with multi-key retry
+ if indefinite_retry_enabled:
+ # In indefinite mode, try to continue with any available key
+ print(f"🔄 Multi-key mode: Key rotation failed, but indefinite retry enabled - continuing...")
+ time.sleep(5) # Brief pause before trying again
+ continue
+ else:
+ break
+ else:
+ # Non-rate-limit error, don't retry with different keys
+ raise
+
+ # This point is only reached in non-indefinite mode when giving up
+ if last_error:
+ print(f"❌ Multi-key mode: All retry attempts failed")
+ raise last_error
+ else:
+ raise UnifiedClientError("All multi-key attempts failed", error_type="no_keys")
+ else:
+ # Single key mode - direct call
+ if image_data is None:
+ return self._send_internal(messages, temperature, max_tokens, max_completion_tokens, context, retry_reason=None, request_id=request_id)
+ else:
+ return self._send_image_internal(messages, image_data, temperature, max_tokens, max_completion_tokens, context, retry_reason=None, request_id=request_id)
+ finally:
+ if not batch_mode:
+ self._sequential_send_lock.release()
+
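+ # Concurrency note: when BATCH_TRANSLATION != "1", _send_core serializes all
+ # sends through _sequential_send_lock, so concurrent callers degrade to one
+ # in-flight request at a time; batch mode skips the lock entirely.
+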
+ def _get_thread_assigned_key(self) -> Optional[int]:
+ """Get the key index assigned to current thread"""
+ thread_id = threading.current_thread().ident
+
+ with self._key_assignment_lock:
+ if thread_id in self._thread_key_assignments:
+ key_index, timestamp = self._thread_key_assignments[thread_id]
+ # Check if assignment is still valid (not expired)
+ if time.time() - timestamp < 300: # 5 minute expiry
+ return key_index
+ else:
+ # Expired, remove it
+ del self._thread_key_assignments[thread_id]
+
+ return None
+
+ def _assign_key_to_thread(self, key_index: int):
+ """Assign a key to the current thread"""
+ thread_id = threading.current_thread().ident
+
+ with self._key_assignment_lock:
+ self._thread_key_assignments[thread_id] = (key_index, time.time())
+
+ # Cleanup old assignments
+ current_time = time.time()
+ expired_threads = [
+ tid for tid, (_, ts) in self._thread_key_assignments.items()
+ if current_time - ts > 300
+ ]
+ for tid in expired_threads:
+ del self._thread_key_assignments[tid]
+
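+ # Sketch of the intended thread-key affinity flow using these two helpers
+ # (illustrative; the surrounding retry logic drives the real calls):
+ #
+ # key_index = self._get_thread_assigned_key()
+ # if key_index is None:
+ # key_info = self._get_next_available_key() # (key, index) tuple
+ # self._assign_key_to_thread(key_info[1]) # pin for ~5 minutes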
+
+ def _setup_client(self):
+ """Setup the appropriate client based on model type"""
+ model_lower = self.model.lower()
+ tls = self._get_thread_local_client()
+
+ # Determine client_type (no lock needed, just reading)
+ self.client_type = None
+ for prefix, provider in self.MODEL_PROVIDERS.items():
+ if model_lower.startswith(prefix):
+ self.client_type = provider
+ break
+
+ # Check if we're using a custom OpenAI base URL
+ custom_base_url = os.getenv('OPENAI_CUSTOM_BASE_URL', os.getenv('OPENAI_API_BASE', ''))
+ use_custom_endpoint = os.getenv('USE_CUSTOM_OPENAI_ENDPOINT', '0') == '1'
+
+ # Apply custom endpoint logic when enabled - override any model type (except Gemini which has its own toggle)
+ if custom_base_url and custom_base_url != 'https://api.openai.com/v1' and use_custom_endpoint:
+ if not self.client_type:
+ # No prefix matched - assume it's a custom model that should use OpenAI endpoint
+ self.client_type = 'openai'
+ logger.info(f"Using OpenAI client for custom endpoint with unmatched model: {self.model}")
+ elif self.client_type == 'openai':
+ logger.info(f"Using custom OpenAI endpoint for OpenAI model: {self.model}")
+ elif self.client_type == 'gemini':
+ # Don't override Gemini - it has its own separate endpoint toggle
+ # Only log if Gemini OpenAI endpoint is not also enabled
+ use_gemini_endpoint = os.getenv("USE_GEMINI_OPENAI_ENDPOINT", "0") == "1"
+ if not use_gemini_endpoint:
+ self._log_once(f"Gemini model detected, not overriding with custom OpenAI endpoint (use USE_GEMINI_OPENAI_ENDPOINT instead)")
+ else:
+ # Override other model types to use custom OpenAI endpoint when toggle is enabled
+ original_client_type = self.client_type
+ self.client_type = 'openai'
+ print(f"[DEBUG] Custom endpoint override: {original_client_type} -> openai for model '{self.model}'")
+ logger.info(f"Custom endpoint enabled: Overriding {original_client_type} model {self.model} to use OpenAI client")
+ elif not use_custom_endpoint and custom_base_url and self.client_type == 'openai':
+ logger.info("Custom OpenAI endpoint disabled via toggle, using default endpoint")
+
+ # If still no client type, show error with suggestions
+ if not self.client_type:
+ # Provide helpful suggestions
+ suggestions = []
+ for prefix in self.MODEL_PROVIDERS.keys():
+ if prefix in model_lower or model_lower[:3] in prefix:
+ suggestions.append(prefix)
+
+ error_msg = f"Unsupported model: {self.model}. "
+ if suggestions:
+ error_msg += f"Did you mean to use one of these prefixes? {suggestions}. "
+ else:
+ # Check if it might be an aggregator model
+ if any(provider in model_lower for provider in ['yi', 'qwen', 'llama', 'gpt', 'claude']):
+ error_msg += f"If using ElectronHub, prefix with 'eh/' (e.g., eh/{self.model}). "
+ error_msg += f"If using OpenRouter, prefix with 'or/' (e.g., or/{self.model}). "
+ error_msg += f"If using Poe, prefix with 'poe/' (e.g., poe/{self.model}). "
+ error_msg += f"Supported prefixes: {list(self.MODEL_PROVIDERS.keys())}"
+ raise ValueError(error_msg)
+
+ # Initialize variables at method scope for all client types
+ base_url = None
+ use_gemini_endpoint = False
+ gemini_endpoint = ""
+
+ # Prepare provider-specific settings (but don't create clients yet)
+ if self.client_type == 'openai':
+ if openai is None:
+ raise ImportError("OpenAI library not installed. Install with: pip install openai")
+
+ elif self.client_type == 'gemini':
+ # Check if we should use OpenAI-compatible endpoint for Gemini
+ use_gemini_endpoint = os.getenv("USE_GEMINI_OPENAI_ENDPOINT", "0") == "1"
+ gemini_endpoint = os.getenv("GEMINI_OPENAI_ENDPOINT", "")
+
+ if use_gemini_endpoint and gemini_endpoint:
+ # Use OpenAI client for Gemini with custom endpoint
+ #print(f"[DEBUG] Preparing Gemini with OpenAI-compatible endpoint")
+ if openai is None:
+ raise ImportError("OpenAI library not installed. Install with: pip install openai")
+
+ # Ensure endpoint has proper format
+ if not gemini_endpoint.endswith('/openai/'):
+ if gemini_endpoint.endswith('/'):
+ gemini_endpoint = gemini_endpoint + 'openai/'
+ else:
+ gemini_endpoint = gemini_endpoint + '/openai/'
+
+ # Set base_url for Gemini OpenAI endpoint
+ base_url = gemini_endpoint
+
+ print(f"[DEBUG] Gemini will use OpenAI-compatible endpoint: {gemini_endpoint}")
+
+ disable_safety = os.getenv("DISABLE_GEMINI_SAFETY", "false").lower() == "true"
+
+ config_data = {
+ "type": "GEMINI_OPENAI_ENDPOINT_REQUEST",
+ "model": self.model,
+ "endpoint": gemini_endpoint,
+ "safety_enabled": not disable_safety,
+ "safety_settings": "DISABLED_VIA_OPENAI_ENDPOINT" if disable_safety else "DEFAULT",
+ "timestamp": datetime.now().isoformat(),
+ }
+
+ # Just call the existing save method
+ self._save_gemini_safety_config(config_data, None)
+ else:
+ # Use native Gemini client
+ #print(f"[DEBUG] Preparing native Gemini client")
+ if not GENAI_AVAILABLE:
+ raise ImportError(
+ "Google Gen AI library not installed. Install with: "
+ "pip install google-genai"
+ )
+
+ elif self.client_type == 'electronhub':
+ # ElectronHub uses OpenAI SDK if available
+ if openai is not None:
+ logger.info("ElectronHub will use OpenAI SDK for API calls")
+ else:
+ logger.info("ElectronHub will use HTTP API for API calls")
+
+ elif self.client_type == 'chutes':
+ # chutes uses OpenAI-compatible endpoint
+ if openai is not None:
+ chutes_base_url = os.getenv("CHUTES_API_URL", "https://llm.chutes.ai/v1")
+
+ # MICROSECOND LOCK for chutes client
+ with self._model_lock:
+ self.openai_client = openai.OpenAI(
+ api_key=self.api_key,
+ base_url=chutes_base_url
+ )
+ logger.info(f"chutes client configured with endpoint: {chutes_base_url}")
+ else:
+ logger.info("chutes will use HTTP API")
+
+ elif self.client_type == 'mistral':
+ if MistralClient is None:
+ # Fall back to HTTP API if SDK not installed
+ logger.info("Mistral SDK not installed, will use HTTP API")
+
+ elif self.client_type == 'cohere':
+ if cohere is None:
+ logger.info("Cohere SDK not installed, will use HTTP API")
+
+ elif self.client_type == 'anthropic':
+ if anthropic is None:
+ logger.info("Anthropic SDK not installed, will use HTTP API")
+ else:
+ # Store API key for HTTP fallback
+ self.anthropic_api_key = self.api_key
+ logger.info("Anthropic client configured")
+
+ elif self.client_type == 'deepseek':
+ # DeepSeek typically uses OpenAI-compatible endpoint
+ if openai is None:
+ logger.info("DeepSeek will use HTTP API")
+ else:
+ base_url = os.getenv("DEEPSEEK_API_URL", "https://api.deepseek.com/v1")
+ logger.info(f"DeepSeek will use endpoint: {base_url}")
+
+ elif self.client_type == 'groq':
+ # Groq uses OpenAI-compatible endpoint
+ if openai is None:
+ logger.info("Groq will use HTTP API")
+ else:
+ base_url = os.getenv("GROQ_API_URL", "https://api.groq.com/openai/v1")
+ logger.info(f"Groq will use endpoint: {base_url}")
+
+ elif self.client_type == 'fireworks':
+ # Fireworks uses OpenAI-compatible endpoint
+ if openai is None:
+ logger.info("Fireworks will use HTTP API")
+ else:
+ base_url = os.getenv("FIREWORKS_API_URL", "https://api.fireworks.ai/inference/v1")
+ logger.info(f"Fireworks will use endpoint: {base_url}")
+
+ elif self.client_type == 'xai':
+ # xAI (Grok) uses OpenAI-compatible endpoint
+ if openai is None:
+ logger.info("xAI will use HTTP API")
+ else:
+ base_url = os.getenv("XAI_API_URL", "https://api.x.ai/v1")
+ logger.info(f"xAI will use endpoint: {base_url}")
+
+ # =====================================================
+ # MICROSECOND LOCK: Create ALL clients with thread safety
+ # =====================================================
+
+ if self.client_type == 'openai':
+ # Skip if individual endpoint already applied
+ if hasattr(self, '_individual_endpoint_applied') and self._individual_endpoint_applied:
+ return
+
+ # MICROSECOND LOCK for OpenAI client
+ with self._model_lock:
+ # Use regular OpenAI client - individual endpoint will be set later
+ self.openai_client = openai.OpenAI(
+ api_key=self.api_key,
+ base_url='https://api.openai.com/v1' # Default, will be overridden by individual endpoint
+ )
+
+ elif self.client_type == 'gemini':
+ if use_gemini_endpoint and gemini_endpoint:
+ # Use OpenAI client for Gemini endpoint
+ if base_url is None:
+ base_url = gemini_endpoint
+
+ # MICROSECOND LOCK for Gemini with OpenAI endpoint
+ with self._model_lock:
+ self.openai_client = openai.OpenAI(
+ api_key=self.api_key,
+ base_url=base_url
+ )
+ self._original_client_type = 'gemini'
+ self.client_type = 'openai'
+ print(f"[DEBUG] Gemini using OpenAI-compatible endpoint: {base_url}")
+ else:
+ # MICROSECOND LOCK for native Gemini client
+ # Check if this key has Google credentials (multi-key mode)
+ google_creds = None
+ if hasattr(self, 'current_key_google_creds') and self.current_key_google_creds:
+ google_creds = self.current_key_google_creds
+ print(f"[DEBUG] Using key-specific Google credentials: {os.path.basename(google_creds)}")
+ # Set environment variable for this request
+ os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = google_creds
+ elif hasattr(self, 'google_creds_path') and self.google_creds_path:
+ google_creds = self.google_creds_path
+ print(f"[DEBUG] Using default Google credentials: {os.path.basename(google_creds)}")
+
+ with self._model_lock:
+ self.gemini_client = genai.Client(api_key=self.api_key)
+ if hasattr(tls, 'model'):
+ tls.gemini_configured = True
+ tls.gemini_api_key = self.api_key
+ tls.gemini_client = self.gemini_client
+
+ #print(f"[DEBUG] Created native Gemini client for model: {self.model}")
+
+ elif self.client_type == 'mistral':
+ if MistralClient is not None:
+ # MICROSECOND LOCK for Mistral client
+ if hasattr(self, '_instance_model_lock'):
+ with self._instance_model_lock:
+ self.mistral_client = MistralClient(api_key=self.api_key)
+ else:
+ self.mistral_client = MistralClient(api_key=self.api_key)
+ logger.info("Mistral client created")
+
+ elif self.client_type == 'cohere':
+ if cohere is not None:
+ # MICROSECOND LOCK for Cohere client
+ with self._model_lock:
+ self.cohere_client = cohere.Client(self.api_key)
+ logger.info("Cohere client created")
+
+ elif self.client_type == 'deepseek':
+ if openai is not None:
+ if base_url is None:
+ base_url = os.getenv("DEEPSEEK_API_URL", "https://api.deepseek.com/v1")
+
+ # MICROSECOND LOCK for DeepSeek client
+ with self._model_lock:
+ self.openai_client = openai.OpenAI(
+ api_key=self.api_key,
+ base_url=base_url
+ )
+ logger.info(f"DeepSeek client configured with endpoint: {base_url}")
+
+ elif self.client_type == 'groq':
+ if openai is not None:
+ if base_url is None:
+ base_url = os.getenv("GROQ_API_URL", "https://api.groq.com/openai/v1")
+
+ # MICROSECOND LOCK for Groq client
+ with self._model_lock:
+ self.openai_client = openai.OpenAI(
+ api_key=self.api_key,
+ base_url=base_url
+ )
+ logger.info(f"Groq client configured with endpoint: {base_url}")
+
+ elif self.client_type == 'fireworks':
+ if openai is not None:
+ if base_url is None:
+ base_url = os.getenv("FIREWORKS_API_URL", "https://api.fireworks.ai/inference/v1")
+
+ # MICROSECOND LOCK for Fireworks client
+ with self._model_lock:
+ self.openai_client = openai.OpenAI(
+ api_key=self.api_key,
+ base_url=base_url
+ )
+ logger.info(f"Fireworks client configured with endpoint: {base_url}")
+
+ elif self.client_type == 'xai':
+ if openai is not None:
+ if base_url is None:
+ base_url = os.getenv("XAI_API_URL", "https://api.x.ai/v1")
+
+ # MICROSECOND LOCK for xAI client
+ with self._model_lock:
+ self.openai_client = openai.OpenAI(
+ api_key=self.api_key,
+ base_url=base_url
+ )
+ logger.info(f"xAI client configured with endpoint: {base_url}")
+
+ elif self.client_type == 'deepl' or self.model.startswith('deepl'):
+ self.client_type = 'deepl'
+ self.client = None # No persistent client needed
+ return
+
+ elif self.client_type == 'google_translate' or self.model.startswith('google-translate'):
+ self.client_type = 'google_translate'
+ self.client = None # No persistent client needed
+ return
+
+ elif self.client_type == 'vertex_model_garden':
+ # Vertex AI doesn't need a client created here
+ logger.info("Vertex AI Model Garden will initialize on demand")
+
+ elif self.client_type in ['yi', 'qwen', 'baichuan', 'zhipu', 'moonshot', 'baidu',
+ 'tencent', 'iflytek', 'bytedance', 'minimax',
+ 'sensenova', 'internlm', 'tii', 'microsoft',
+ 'azure', 'google', 'alephalpha', 'databricks',
+ 'huggingface', 'salesforce', 'bigscience', 'meta',
+ 'electronhub', 'poe', 'openrouter', 'chutes']:
+ # These providers will use HTTP API or OpenAI-compatible endpoints
+ # No client initialization needed here
+ logger.info(f"{self.client_type} will use HTTP API or compatible endpoint")
+
+ # Store thread-local client reference if in multi-key mode
+ if self._multi_key_mode and hasattr(tls, 'model'):
+ # MICROSECOND LOCK for thread-local storage
+ with self._model_lock:
+ tls.client_type = self.client_type
+ if hasattr(self, 'openai_client'):
+ tls.openai_client = self.openai_client
+ if hasattr(self, 'gemini_client'):
+ tls.gemini_client = self.gemini_client
+ if hasattr(self, 'mistral_client'):
+ tls.mistral_client = self.mistral_client
+ if hasattr(self, 'cohere_client'):
+ tls.cohere_client = self.cohere_client
+ else:
+ tls.client_type = self.client_type
+ if hasattr(self, 'openai_client'):
+ tls.openai_client = self.openai_client
+ if hasattr(self, 'gemini_client'):
+ tls.gemini_client = self.gemini_client
+ if hasattr(self, 'mistral_client'):
+ tls.mistral_client = self.mistral_client
+ if hasattr(self, 'cohere_client'):
+ tls.cohere_client = self.cohere_client
+
+ # Log retry feature support
+ logger.info(f"✅ Initialized {self.client_type} client for model: {self.model}")
+ logger.debug("✅ GUI retry features supported: truncation detection, timeout handling, duplicate detection")
+
+ def send(self, messages, temperature=None, max_tokens=None,
+ max_completion_tokens=None, context=None) -> Tuple[str, Optional[str]]:
+ """Backwards-compatible public API; now delegates to unified _send_core."""
+ return self._send_core(messages, temperature, max_tokens, max_completion_tokens, context, image_data=None)
+
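+ # Typical call (illustrative; system_prompt and chapter_html are placeholder
+ # names): send() returns (content, finish_reason), where finish_reason is
+ # provider-dependent (e.g. 'stop' or 'length') or None.
+ #
+ # content, finish = client.send(
+ # [{"role": "system", "content": system_prompt},
+ # {"role": "user", "content": chapter_html}],
+ # temperature=0.3, max_tokens=8192, context='translation')
+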
+ def _send_internal(self, messages, temperature=None, max_tokens=None,
+ max_completion_tokens=None, context=None, retry_reason=None,
+ request_id=None, image_data=None) -> Tuple[str, Optional[str]]:
+ """
+ Unified internal send implementation for both text and image requests.
+ Pass image_data=None for text requests, or image bytes/base64 for image requests.
+ """
+ # Determine if this is an image request
+ is_image_request = image_data is not None
+
+ # Use appropriate context default
+ if context is None:
+ context = 'image_translation' if is_image_request else 'translation'
+
+ # Always ensure per-request key assignment/rotation for multi-key mode
+ # This guarantees forced rotation when rotation_frequency == 1
+ if getattr(self, '_multi_key_mode', False):
+ try:
+ self._ensure_thread_client()
+ except UnifiedClientError:
+ # Propagate known client errors
+ raise
+ except Exception as e:
+ # Normalize unexpected failures
+ raise UnifiedClientError(f"Failed to acquire API key for thread: {e}", error_type="no_keys")
+
+ # Handle refactored mode with disabled internal retry
+ if getattr(self, '_disable_internal_retry', False):
+ t0 = time.time()
+
+ # For image requests, prepare messages with embedded image
+ if image_data:
+ messages = self._prepare_image_messages(messages, image_data)
+
+ # Validate request
+ valid, error_msg = self._validate_request(messages, max_tokens)
+ if not valid:
+ raise UnifiedClientError(f"Invalid request: {error_msg}", error_type="validation")
+
+ # File names and payload save
+ payload_name, response_name = self._get_file_names(messages, context=self.context or context)
+ self._save_payload(messages, payload_name, retry_reason=retry_reason)
+
+ # Get response via provider router
+ response = self._get_response(messages, temperature, max_tokens, max_completion_tokens, response_name)
+
+ # Extract text uniformly
+ extracted_content, finish_reason = self._extract_response_text(response, provider=getattr(self, 'client_type', 'unknown'))
+
+ # Save response if any
+ if extracted_content:
+ self._save_response(extracted_content, response_name)
+
+ # Stats and success mark
+ self._track_stats(context, True, None, time.time() - t0)
+ self._mark_key_success()
+
+ # API delay between calls (respects GUI setting)
+ self._apply_api_delay()
+
+ return extracted_content, finish_reason
+
+ # Main implementation with retry logic
+ start_time = time.time()
+
+ # Generate request hash WITH request ID if provided
+ if image_data:
+ image_size = len(image_data) if isinstance(image_data, (bytes, str)) else 0
+ if request_id:
+ messages_hash = self._get_request_hash_with_request_id(messages, request_id)
+ else:
+ request_id = str(uuid.uuid4())[:8]
+ messages_hash = self._get_request_hash_with_request_id(messages, request_id)
+ request_hash = f"{messages_hash}_img{image_size}"
+ else:
+ if request_id:
+ request_hash = self._get_request_hash_with_request_id(messages, request_id)
+ else:
+ request_id = str(uuid.uuid4())[:8]
+ request_hash = self._get_request_hash_with_request_id(messages, request_id)
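+ # The combined request_hash is assumed to be a short opaque digest; image
+ # requests append "_img<size>" where size is the byte (or base64 char) count,
+ # e.g. "9f3e12ab_img2150400" (values illustrative).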
+
+ thread_name = threading.current_thread().name
+
+ # Log request identifiers for tracking
+ logger.debug(f" Request ID: {request_id}")
+ logger.debug(f" Hash: {request_hash[:8]}...")
+ logger.debug(f" Retry reason: {retry_reason}")
+ logger.debug(f"[{thread_name}] _send_internal starting for {context} (hash: {request_hash[:8]}...) retry_reason: {retry_reason}")
+
+ # Reset cancelled flag
+ self._cancelled = False
+
+ # Reset counters when context changes
+ if context != self.current_session_context:
+ self.reset_conversation_for_new_context(context)
+
+ self.context = context or 'translation'
+ self.conversation_message_count += 1
+
+ # Internal retry logic for 500 errors - now optionally disabled (centralized retry handles it)
+ internal_retries = self._get_max_retries()
+ base_delay = 5 # Base delay for exponential backoff
+
+ # Track if we've tried main key for prohibited content
+ main_key_attempted = False
+
+ # Initialize variables that might be referenced in exception handlers
+ extracted_content = ""
+ finish_reason = 'error'
+
+ # Track whether we already attempted a Gemma/OpenRouter system->user retry
+ gemma_no_system_retry_done = False
+
+ for attempt in range(internal_retries):
+ try:
+ # Validate request
+ valid, error_msg = self._validate_request(messages, max_tokens)
+ if not valid:
+ raise UnifiedClientError(f"Invalid request: {error_msg}", error_type="validation")
+
+ os.makedirs("Payloads", exist_ok=True)
+
+ # Apply reinforcement
+ messages = self._apply_pure_reinforcement(messages)
+
+ # For image requests, prepare messages with embedded image
+ if image_data:
+ messages = self._prepare_image_messages(messages, image_data)
+
+ # Get file names - now unique per request AND attempt
+ payload_name, response_name = self._get_file_names(messages, context=self.context)
+
+ # Add request ID and attempt to filename for complete isolation
+ payload_name, response_name = self._with_attempt_suffix(payload_name, response_name, request_id, attempt, is_image=bool(image_data))
+
+ # Save payload with retry reason
+ # On internal retries (500 errors), add that info too
+ if attempt > 0:
+ internal_retry_reason = f"500_error_attempt_{attempt}"
+ if retry_reason:
+ combined_reason = f"{retry_reason}_{internal_retry_reason}"
+ else:
+ combined_reason = internal_retry_reason
+ self._save_payload(messages, payload_name, retry_reason=combined_reason)
+ else:
+ self._save_payload(messages, payload_name, retry_reason=retry_reason)
+
+ # Sanitized copy of the messages (actual image data omitted), kept for
+ # debugging; the payload itself was already saved above.
+ payload_messages = [
+ {**msg, 'content': 'IMAGE_DATA_OMITTED' if isinstance(msg.get('content'), list) else msg.get('content')}
+ for msg in messages
+ ]
+
+ # Set idempotency context for downstream calls
+ self._set_idempotency_context(request_id, attempt)
+
+ # Unified provider dispatch: for image requests, messages already embed the image.
+ # Route via the same _get_response used for text; Gemini handler internally detects images.
+ response = self._get_response(messages, temperature, max_tokens, max_completion_tokens, response_name)
+
+ # Check for cancellation (from timeout or stop button)
+ if self._cancelled:
+ if not self._is_stop_requested():
+ logger.info("Operation cancelled (timeout or user stop)")
+ raise UnifiedClientError("Operation cancelled by user", error_type="cancelled")
+
+ # ====== UNIVERSAL EXTRACTION INTEGRATION ======
+ # Use universal extraction instead of assuming response.content exists
+ extracted_content = ""
+ finish_reason = 'stop'
+
+ if response:
+ # Prepare provider-specific parameters
+ extraction_kwargs = {}
+
+ # Add provider-specific parameters if applicable
+ extraction_kwargs.update(self._get_extraction_kwargs())
+
+ # Try universal extraction with provider-specific parameters
+ extracted_content, finish_reason = self._extract_response_text(
+ response,
+ provider=getattr(self, 'client_type', 'unknown'),
+ **extraction_kwargs
+ )
+
+ # If extraction failed but we have a response object
+ if not extracted_content and response:
+ print(f"⚠️ Failed to extract text from {getattr(self, 'client_type', 'unknown')} response")
+ print(f" Response type: {type(response)}")
+
+ # Provider-specific guidance
+ if getattr(self, 'client_type', None) == 'gemini':
+ print(f" Consider checking Gemini response structure")
+ print(f" Response attributes: {dir(response)[:5]}...") # Show first 5 attributes
+ else:
+ print(f" Consider checking response extraction for this provider")
+
+ # Log the response structure for debugging
+ self._save_failed_request(messages, "Extraction failed", context, response)
+
+ # Check if response has any common attributes we missed
+ if hasattr(response, 'content') and response.content:
+ extracted_content = str(response.content)
+ print(f" Fallback: Using response.content directly")
+ elif hasattr(response, 'text') and response.text:
+ extracted_content = str(response.text)
+ print(f" Fallback: Using response.text directly")
+
+ # Update response object with extracted content
+ if extracted_content and hasattr(response, 'content'):
+ response.content = extracted_content
+ elif extracted_content:
+ # Create a new response object if needed
+ response = UnifiedResponse(
+ content=extracted_content,
+ finish_reason=finish_reason,
+ raw_response=response
+ )
+
+ # CRITICAL: Save response for duplicate detection
+ # This must happen even for truncated/empty responses
+ if extracted_content:
+ self._save_response(extracted_content, response_name)
+
+ # Handle empty responses
+ if not extracted_content or extracted_content.strip() in ["", "[]", "[IMAGE TRANSLATION FAILED]"]:
+ is_likely_safety_filter = self._detect_safety_filter(messages, extracted_content, finish_reason, response, getattr(self, 'client_type', 'unknown'))
+ if is_likely_safety_filter and not main_key_attempted and self._multi_key_mode and getattr(self, 'original_api_key', None) and getattr(self, 'original_model', None):
+ main_key_attempted = True
+ try:
+ retry_res = self._retry_with_main_key(messages, temperature, max_tokens, max_completion_tokens, context, request_id=request_id, image_data=image_data)
+ if retry_res:
+ content, fr = retry_res
+ if content and content.strip() and len(content) > 10:
+ return content, fr
+ except Exception:
+ pass
+ # Finalize empty handling
+ req_type = 'image' if image_data else 'text'
+ return self._finalize_empty_response(messages, context, response, extracted_content or "", finish_reason, getattr(self, 'client_type', 'unknown'), req_type, start_time)
+
+ # Track success
+ self._track_stats(context, True, None, time.time() - start_time)
+
+ # Mark key as successful in multi-key mode
+ self._mark_key_success()
+
+ # Check for truncation and handle retry if enabled
+ if finish_reason in ['length', 'max_tokens']:
+ print(f"Response was truncated: {finish_reason}")
+ print(f"⚠️ Response truncated (finish_reason: {finish_reason})")
+
+ # ALWAYS log truncation failures
+ self._log_truncation_failure(
+ messages=messages,
+ response_content=extracted_content,
+ finish_reason=finish_reason,
+ context=context,
+ error_details=getattr(response, 'error_details', None) if response else None
+ )
+
+ # Check if retry on truncation is enabled
+ retry_truncated_enabled = os.getenv("RETRY_TRUNCATED", "0") == "1"
+
+ if retry_truncated_enabled:
+ print(f" 🔄 RETRY_TRUNCATED enabled - attempting to retry with increased token limit")
+
+ # Get the max retry tokens limit
+ max_retry_tokens = int(os.getenv("MAX_RETRY_TOKENS", "16384"))
+ current_max_tokens = max_tokens or 8192
+
+ if current_max_tokens < max_retry_tokens:
+ new_max_tokens = min(current_max_tokens * 2, max_retry_tokens)
+ print(f" 📊 Retrying with increased tokens: {current_max_tokens} → {new_max_tokens}")
+
+ try:
+ # Recursive call with increased token limit
+ retry_content, retry_finish_reason = self._send_internal(
+ messages=messages,
+ temperature=temperature,
+ max_tokens=new_max_tokens,
+ max_completion_tokens=max_completion_tokens,
+ context=context,
+ retry_reason=f"truncation_retry_{finish_reason}",
+ request_id=request_id,
+ image_data=image_data
+ )
+
+ # Check if retry succeeded (not truncated)
+ if retry_finish_reason not in ['length', 'max_tokens']:
+ print(f" ✅ Truncation retry succeeded: {len(retry_content)} chars")
+ return retry_content, retry_finish_reason
+ else:
+ print(f" ⚠️ Retry was also truncated, returning original response")
+
+ except Exception as retry_error:
+ print(f" ❌ Truncation retry failed: {retry_error}")
+ else:
+ print(f" 📊 Already at max retry tokens ({current_max_tokens}), not retrying")
+ else:
+ print(f" 📋 RETRY_TRUNCATED disabled - accepting truncated response")
+
+ # Apply API delay after successful call (even if truncated)
+ # SKIP DELAY DURING CLEANUP
+
+ self._apply_api_delay()
+
+ # Brief stability pause after API call completion
+ if not getattr(self, '_in_cleanup', False):
+ time.sleep(0.1) # System stability pause after API completion
+
+ # If the provider signaled a content filter, elevate to prohibited_content to trigger retries
+ if finish_reason == 'content_filter':
+ raise UnifiedClientError(
+ "Content blocked by provider",
+ error_type="prohibited_content",
+ http_status=400
+ )
+
+ # Return the response with accurate finish_reason
+ # This is CRITICAL for retry mechanisms to work
+ return extracted_content, finish_reason
+
+ except UnifiedClientError as e:
+ # Handle cancellation specially for timeout support
+ if e.error_type == "cancelled" or "cancelled" in str(e):
+ self._in_cleanup = False # Clear cleanup flag so cancellation propagates cleanly
+ if not self._is_stop_requested():
+ logger.info("Propagating cancellation to caller")
+ # Re-raise so send_with_interrupt can handle it
+ raise
+
+ print(f"UnifiedClient error: {e}")
+
+ # Check if it's a rate limit error - handle according to mode
+ error_str = str(e).lower()
+ if self._is_rate_limit_error(e):
+ # In multi-key mode, always re-raise to let _send_core handle key rotation
+ if self._multi_key_mode:
+ print(f"🔄 Rate limit error - multi-key mode active, re-raising for key rotation")
+ raise
+
+ # In single-key mode, check if indefinite retry is enabled
+ indefinite_retry_enabled = os.getenv("INDEFINITE_RATE_LIMIT_RETRY", "1") == "1"
+
+ if indefinite_retry_enabled:
+ # Calculate wait time from Retry-After header if available
+ retry_after_seconds = 60 # Default wait time
+ if hasattr(e, 'http_status') and e.http_status == 429:
+ # Try to extract Retry-After from the error if it contains header info
+ error_details = str(e)
+ if 'retry-after' in error_details.lower():
+ import re
+ match = re.search(r'retry-after[:\s]+([0-9]+)', error_details.lower())
+ if match:
+ retry_after_seconds = int(match.group(1))
+
+ # Add some jitter and cap the wait time
+ wait_time = min(retry_after_seconds + random.uniform(1, 10), 300) # Max 5 minutes
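+ # e.g. Retry-After: 120 -> wait ~121-130s; Retry-After: 600 -> capped at 300s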
+
+ print(f"🔄 Rate limit error - single-key indefinite retry, waiting {wait_time:.1f}s (attempt {attempt + 1}/{internal_retries})")
+
+ # Wait with cancellation check
+ wait_start = time.time()
+ while time.time() - wait_start < wait_time:
+ if self._cancelled:
+ raise UnifiedClientError("Operation cancelled by user", error_type="cancelled")
+ time.sleep(0.5)
+
+ # NOTE: reassigning the for-loop variable does not change what range()
+ # yields next, so a rate-limit wait still consumes one retry slot;
+ # a truly indefinite retry would require a while loop.
+ attempt = max(0, attempt - 1)
+ continue # Retry the attempt
+ else:
+ print(f"❌ Rate limit error - single-key mode, indefinite retry disabled, re-raising")
+ raise
+
+ # Check for prohibited content — treat any HTTP 400 as prohibited to force fallback
+ if (
+ e.error_type == "prohibited_content"
+ or getattr(e, 'http_status', None) == 400
+ or " 400 " in error_str
+ or self._detect_safety_filter(messages, extracted_content or "", finish_reason, None, getattr(self, 'client_type', 'unknown'))
+ ):
+ print(f"❌ Prohibited content detected: {error_str[:200]}")
+
+ # Different behavior based on mode
+ if self._multi_key_mode:
+ # Multi-key mode: Attempt main key retry once, then fall through to fallback
+ if not main_key_attempted:
+ main_key_attempted = True
+ retry_res = self._maybe_retry_main_key_on_prohibited(
+ messages, temperature, max_tokens, max_completion_tokens, context, request_id=request_id, image_data=image_data
+ )
+ if retry_res:
+ res_content, res_fr = retry_res
+ if res_content and res_content.strip():
+ return res_content, res_fr
+ else:
+ # Single-key mode: Check if fallback keys are enabled
+ use_fallback_keys = os.getenv('USE_FALLBACK_KEYS', '0') == '1'
+ if use_fallback_keys:
+ print(f"[FALLBACK DIRECT] Using fallback keys")
+ # Try fallback keys directly without retrying main key
+ retry_res = self._try_fallback_keys_direct(
+ messages, temperature, max_tokens, max_completion_tokens, context, request_id=request_id, image_data=image_data
+ )
+ if retry_res:
+ res_content, res_fr = retry_res
+ if res_content and res_content.strip():
+ return res_content, res_fr
+ else:
+ print(f"[SINGLE-KEY MODE] Fallback keys disabled - no retry available")
+
+ # Fallthrough: record and return generic fallback
+ self._save_failed_request(messages, e, context)
+ self._track_stats(context, False, type(e).__name__, time.time() - start_time)
+ fallback_content = self._handle_empty_result(messages, context, str(e))
+ return fallback_content, 'error'
+
+ # Check for retryable server errors (500, 502, 503, 504)
+ http_status = getattr(e, 'http_status', None)
+ retryable_errors = ["500", "502", "503", "504", "api_error", "internal server error", "bad gateway", "service unavailable", "gateway timeout"]
+
+ if (http_status in [500, 502, 503, 504] or
+ any(err in error_str for err in retryable_errors)):
+ if attempt < internal_retries - 1:
+ # In multi-key mode, try rotating keys before backing off
+ if self._multi_key_mode and attempt > 0: # Only after first attempt
+ try:
+ print(f"🔄 Server error ({http_status or 'API error'}) - attempting key rotation (multi-key mode)")
+ self._handle_rate_limit_for_thread()
+ print(f"🔄 Server error retry: Key rotation completed, retrying immediately...")
+ time.sleep(1) # Brief pause after key rotation
+ continue # Retry with new key immediately
+ except Exception as rotation_error:
+ print(f"❌ Key rotation failed during server error: {rotation_error}")
+ # Fall back to normal exponential backoff
+
+ # Exponential backoff with jitter
+ delay = self._compute_backoff(attempt, base_delay, 60) # Max 60 seconds
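+ # _compute_backoff (defined elsewhere in this class) is assumed to return
+ # roughly min(cap, base_delay * 2**attempt) plus jitter, so with base_delay=5:
+ # attempt 0 -> ~5s, attempt 1 -> ~10s, attempt 2 -> ~20s, capped at 60s here.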
+
+ print(f"🔄 Server error ({http_status or 'API error'}) - auto-retrying in {delay:.1f}s (attempt {attempt + 1}/{internal_retries})")
+
+ # Wait with cancellation check
+ wait_start = time.time()
+ while time.time() - wait_start < delay:
+ if self._cancelled:
+ raise UnifiedClientError("Operation cancelled by user", error_type="cancelled")
+ time.sleep(0.5) # Check every 0.5 seconds
+ print(f"🔄 Server error retry: Backoff completed, initiating retry attempt...")
+ time.sleep(1) # Brief pause after backoff for retry stability
+ continue # Retry the attempt
+ else:
+ print(f"❌ Server error ({http_status or 'API error'}) - exhausted {internal_retries} retries")
+
+ # Check for other retryable errors (timeouts, connection issues)
+ timeout_errors = ["timeout", "timed out", "connection reset", "connection aborted", "connection error", "network error"]
+ if any(err in error_str for err in timeout_errors):
+ if attempt < internal_retries - 1:
+ delay = self._compute_backoff(attempt, base_delay/2, 30) # Shorter delay for timeouts
+
+ print(f"🔄 Network/timeout error - retrying in {delay:.1f}s (attempt {attempt + 1}/{internal_retries})")
+
+ wait_start = time.time()
+ while time.time() - wait_start < delay:
+ if self._cancelled:
+ raise UnifiedClientError("Operation cancelled by user", error_type="cancelled")
+ time.sleep(0.5)
+ print(f"🔄 Timeout error retry: Backoff completed, initiating retry attempt...")
+ time.sleep(0.1) # Brief pause after backoff for retry stability
+ continue # Retry the attempt
+ else:
+ print(f"❌ Network/timeout error - exhausted {internal_retries} retries")
+
+ # If we get here, this is the last attempt or a non-retryable error
+ # Save failed request and return fallback only if we've exhausted retries
+ if attempt >= internal_retries - 1:
+ print(f"❌ Final attempt failed, returning fallback response")
+ self._save_failed_request(messages, e, context)
+ self._track_stats(context, False, type(e).__name__, time.time() - start_time)
+ fallback_content = self._handle_empty_result(messages, context, str(e))
+ return fallback_content, 'error'
+ else:
+ # For other errors, try again with a short delay
+ delay = self._compute_backoff(attempt, base_delay/4, 15) # Short delay for other errors
+ print(f"🔄 API error - retrying in {delay:.1f}s (attempt {attempt + 1}/{internal_retries}): {str(e)[:100]}")
+
+ wait_start = time.time()
+ while time.time() - wait_start < delay:
+ if self._cancelled:
+ raise UnifiedClientError("Operation cancelled by user", error_type="cancelled")
+ time.sleep(0.5)
+ print(f"🔄 API error retry: Backoff completed, initiating retry attempt...")
+ time.sleep(0.1) # Brief pause after backoff for retry stability
+ continue # Retry the attempt
+
+ except Exception as e:
+ # COMPREHENSIVE ERROR HANDLING FOR NoneType and other issues
+ error_str = str(e).lower()
+ print(f"Unexpected error: {e}")
+ # Save unexpected error details to Payloads/failed_requests
+ try:
+ self._save_failed_request(messages, e, context)
+ except Exception:
+ pass
+
+ # Special handling for NoneType length errors
+ if "nonetype" in error_str and "len" in error_str:
+ print(f"🚨 Detected NoneType length error - likely caused by None message content")
+ print(f"🔍 Error details: {type(e).__name__}: {e}")
+ print(f"🔍 Context: {context}, Messages count: {self._safe_len(messages, 'unexpected_error_messages')}")
+
+ # Log the actual traceback for debugging
+ import traceback
+ print(f"🔍 Traceback: {traceback.format_exc()}")
+
+ # Return a safe fallback
+ self._save_failed_request(messages, e, context)
+ self._track_stats(context, False, "nonetype_length_error", time.time() - start_time)
+ fallback_content = self._handle_empty_result(messages, context, "NoneType length error")
+ return fallback_content, 'error'
+
+ # For unexpected errors, check if it's a timeout
+ if "timed out" in error_str:
+ # Re-raise timeout errors so the retry logic can handle them
+ raise UnifiedClientError(f"Request timed out: {e}", error_type="timeout")
+
+ # Check if it's a rate limit error - handle according to mode
+ if self._is_rate_limit_error(e):
+ # In multi-key mode, always re-raise to let _send_core handle key rotation
+ if self._multi_key_mode:
+ print(f"🔄 Unexpected rate limit error - multi-key mode active, re-raising for key rotation")
+ raise
+
+ # In single-key mode, check if indefinite retry is enabled
+ indefinite_retry_enabled = os.getenv("INDEFINITE_RATE_LIMIT_RETRY", "1") == "1"
+
+ if indefinite_retry_enabled:
+ # Calculate wait time from Retry-After header if available
+ retry_after_seconds = 60 # Default wait time
+ if hasattr(e, 'http_status') and e.http_status == 429:
+ # Try to extract Retry-After from the error if it contains header info
+ error_details = str(e)
+ if 'retry-after' in error_details.lower():
+ import re
+ match = re.search(r'retry-after[:\s]+([0-9]+)', error_details.lower())
+ if match:
+ retry_after_seconds = int(match.group(1))
+
+ # Add some jitter and cap the wait time
+ wait_time = min(retry_after_seconds + random.uniform(1, 10), 300) # Max 5 minutes
+
+ print(f"🔄 Unexpected rate limit error - single-key indefinite retry, waiting {wait_time:.1f}s (attempt {attempt + 1}/{internal_retries})")
+
+ # Wait with cancellation check
+ wait_start = time.time()
+ while time.time() - wait_start < wait_time:
+ if self._cancelled:
+ raise UnifiedClientError("Operation cancelled by user", error_type="cancelled")
+ time.sleep(0.5)
+
+ # NOTE: as above, reassigning the for-loop variable does not extend
+ # the range; the rate-limit wait still consumes one retry slot.
+ attempt = max(0, attempt - 1)
+ continue # Retry the attempt
+ else:
+ print(f"❌ Unexpected rate limit error - single-key mode, indefinite retry disabled, re-raising")
+ raise # Re-raise for higher-level handling
+
+ # Check for prohibited content in unexpected errors
+ if self._detect_safety_filter(messages, extracted_content or "", finish_reason, None, getattr(self, 'client_type', 'unknown')):
+ print(f"❌ Content prohibited in unexpected error: {error_str[:200]}")
+
+ # If we're in multi-key mode and haven't tried the main key yet
+ if (self._multi_key_mode and not main_key_attempted and getattr(self, 'original_api_key', None) and getattr(self, 'original_model', None)):
+ main_key_attempted = True
+ try:
+ retry_res = self._retry_with_main_key(messages, temperature, max_tokens, max_completion_tokens, context, request_id=request_id, image_data=image_data)
+ if retry_res:
+ content, fr = retry_res
+ return content, fr
+ except Exception:
+ pass
+
+ # Fall through to normal error handling
+ print(f"❌ Content prohibited - not retrying")
+ self._save_failed_request(messages, e, context)
+ self._track_stats(context, False, "unexpected_error", time.time() - start_time)
+ fallback_content = self._handle_empty_result(messages, context, str(e))
+ return fallback_content, 'error'
+
+ # Check for retryable server errors
+ retryable_server_errors = ["500", "502", "503", "504", "internal server error", "bad gateway", "service unavailable", "gateway timeout"]
+ if any(err in error_str for err in retryable_server_errors):
+ if attempt < internal_retries - 1:
+ # In multi-key mode, try rotating keys before backing off
+ if self._multi_key_mode and attempt > 0: # Only after first attempt
+ try:
+ print(f"🔄 Unexpected server error - attempting key rotation (multi-key mode)")
+ self._handle_rate_limit_for_thread()
+ print(f"🔄 Unexpected server error retry: Key rotation completed, retrying immediately...")
+ time.sleep(0.1) # Brief pause after key rotation
+ continue # Retry with new key immediately
+ except Exception as rotation_error:
+ print(f"❌ Key rotation failed during unexpected server error: {rotation_error}")
+ # Fall back to normal exponential backoff
+
+ # Exponential backoff with jitter
+ delay = self._compute_backoff(attempt, base_delay, 60) # Max 60 seconds
+
+ print(f"🔄 Server error - auto-retrying in {delay:.1f}s (attempt {attempt + 1}/{internal_retries})")
+
+ wait_start = time.time()
+ while time.time() - wait_start < delay:
+ if self._cancelled:
+ raise UnifiedClientError("Operation cancelled by user", error_type="cancelled")
+ time.sleep(0.5)
+ continue # Retry the attempt
+
+ # Check for other transient errors with exponential backoff
+ transient_errors = ["connection reset", "connection aborted", "connection error", "network error", "timeout", "timed out"]
+ if any(err in error_str for err in transient_errors):
+ if attempt < internal_retries - 1:
+ # In multi-key mode, try rotating keys for network issues
+ if self._multi_key_mode and attempt > 0: # Only after first attempt
+ try:
+ print(f"🔄 Transient error - attempting key rotation (multi-key mode)")
+ self._handle_rate_limit_for_thread()
+ print(f"🔄 Transient error retry: Key rotation completed, retrying immediately...")
+ time.sleep(0.1) # Brief pause after key rotation
+ continue # Retry with new key immediately
+ except Exception as rotation_error:
+ print(f"❌ Key rotation failed during transient error: {rotation_error}")
+ # Fall back to normal exponential backoff
+
+ # Use a slightly less aggressive backoff for transient errors
+ delay = self._compute_backoff(attempt, base_delay/2, 30) # Max 30 seconds
+
+ print(f"🔄 Transient error - retrying in {delay:.1f}s (attempt {attempt + 1}/{internal_retries})")
+
+ wait_start = time.time()
+ while time.time() - wait_start < delay:
+ if self._cancelled:
+ raise UnifiedClientError("Operation cancelled by user", error_type="cancelled")
+ time.sleep(0.5)
+ continue # Retry the attempt
+
+ # If we get here, either we've exhausted retries or it's a non-retryable error
+ if attempt >= internal_retries - 1:
+ print(f"❌ Unexpected error - final attempt failed, returning fallback")
+ self._save_failed_request(messages, e, context)
+ self._track_stats(context, False, "unexpected_error", time.time() - start_time)
+ fallback_content = self._handle_empty_result(messages, context, str(e))
+ return fallback_content, 'error'
+ else:
+ # In multi-key mode, try rotating keys before short backoff
+ if self._multi_key_mode and attempt > 0: # Only after first attempt
+ try:
+ print(f"🔄 Other error - attempting key rotation (multi-key mode)")
+ self._handle_rate_limit_for_thread()
+ print(f"🔄 Other error retry: Key rotation completed, retrying immediately...")
+ time.sleep(0.1) # Brief pause after key rotation
+ continue # Retry with new key immediately
+ except Exception as rotation_error:
+ print(f"❌ Key rotation failed during other error: {rotation_error}")
+ # Fall back to normal exponential backoff
+
+ # For other unexpected errors, try again with a short delay
+ delay = self._compute_backoff(attempt, base_delay/4, 15) # Short delay
+ print(f"🔄 Unexpected error - retrying in {delay:.1f}s (attempt {attempt + 1}/{internal_retries}): {str(e)[:100]}")
+
+ wait_start = time.time()
+ while time.time() - wait_start < delay:
+ if self._cancelled:
+ raise UnifiedClientError("Operation cancelled by user", error_type="cancelled")
+ time.sleep(0.5)
+ continue # Retry the attempt
+
+
+ def _retry_with_main_key(self, messages, temperature=None, max_tokens=None,
+ max_completion_tokens=None, context=None,
+ request_id=None, image_data=None) -> Optional[Tuple[str, Optional[str]]]:
+ """
+ Unified retry method for both text and image requests with main/fallback keys.
+ Pass image_data=None for text requests, or image bytes/base64 for image requests.
+ Returns None when fallbacks are disabled.
+ """
+ # Determine if this is an image request
+ is_image_request = image_data is not None
+
+ # THREAD-SAFE RECURSION CHECK: Use thread-local storage
+ tls = self._get_thread_local_client()
+
+ # Check if THIS THREAD is already in a retry (unified check for both text and image)
+ retry_flag = 'in_image_retry' if image_data else 'in_retry'
+ if getattr(tls, retry_flag, False):
+ retry_type = "IMAGE " if image_data else ""
+ print(f"[{retry_type}MAIN KEY RETRY] Thread {threading.current_thread().name} already in retry, preventing recursion")
+ return None
+
+ # CHECK: Verify multi-key mode is actually enabled
+ if not self._multi_key_mode:
+ print(f"[MAIN KEY RETRY] Not in multi-key mode, skipping retry")
+ return None
+
+ # DO NOT gate the main-GUI-key retry on the fallback toggle; the toggle only
+ # controls whether additional configured fallback keys are tried.
+ use_fallback_keys = os.getenv('USE_FALLBACK_KEYS', '0') == '1'
+
+ # CHECK: Verify we have the necessary attributes
+ if not (hasattr(self, 'original_api_key') and
+ hasattr(self, 'original_model') and
+ self.original_api_key and
+ self.original_model):
+ print(f"[MAIN KEY RETRY] Missing original key/model attributes, skipping retry")
+ return None
+
+ # Mark THIS THREAD as being in retry
+ setattr(tls, retry_flag, True)
+
+ try:
+ fallback_keys = []
+
+ # FIRST: Always add the MAIN GUI KEY as the first fallback
+ fallback_keys.append({
+ 'api_key': self.original_api_key,
+ 'model': self.original_model,
+ 'label': 'MAIN GUI KEY'
+ })
+ print(f"[MAIN KEY RETRY] Using main GUI key with model: {self.original_model}")
+
+ # Add configured fallback keys only if toggle is enabled
+ fallback_keys_json = os.getenv('FALLBACK_KEYS', '[]')
+
+ if use_fallback_keys and fallback_keys_json != '[]':
+ try:
+ configured_fallbacks = json.loads(fallback_keys_json)
+ print(f"[DEBUG] Loaded {len(configured_fallbacks)} fallback keys from environment")
+ for fb in configured_fallbacks:
+ fallback_keys.append({
+ 'api_key': fb.get('api_key'),
+ 'model': fb.get('model'),
+ 'google_credentials': fb.get('google_credentials'),
+ 'azure_endpoint': fb.get('azure_endpoint'),
+ 'google_region': fb.get('google_region'),
+ 'label': 'FALLBACK KEY'
+ })
+ except Exception as e:
+ print(f"[DEBUG] Failed to parse FALLBACK_KEYS: {e}")
+ elif not use_fallback_keys:
+ print("[MAIN KEY RETRY] Fallback keys toggle is OFF — will try main GUI key only")
+
+ print(f"[MAIN KEY RETRY] Total keys to try: {len(fallback_keys)}")
+
+ # Try each fallback key in the list
+ max_attempts = min(len(fallback_keys), self._get_max_retries())
+ for idx, fallback_data in enumerate(fallback_keys[:max_attempts]):
+ label = fallback_data.get('label', 'Fallback')
+ fallback_key = fallback_data.get('api_key')
+ fallback_model = fallback_data.get('model')
+ fallback_google_creds = fallback_data.get('google_credentials')
+ fallback_azure_endpoint = fallback_data.get('azure_endpoint')
+ fallback_google_region = fallback_data.get('google_region')
+
+ print(f"[{label} {idx+1}/{max_attempts}] Trying {fallback_model}")
+ print(f"[{label} {idx+1}] Failed multi-key model was: {self.model}")
+
+ try:
+ # Create a new temporary UnifiedClient instance with the fallback key
+ temp_client = UnifiedClient(
+ api_key=fallback_key,
+ model=fallback_model,
+ output_dir=self.output_dir
+ )
+
+ # Set key-specific credentials for the temp client
+ if fallback_google_creds:
+ temp_client.current_key_google_creds = fallback_google_creds
+ temp_client.google_creds_path = fallback_google_creds
+ print(f"[{label} {idx+1}] Using fallback Google credentials: {os.path.basename(fallback_google_creds)}")
+
+ if fallback_google_region:
+ temp_client.current_key_google_region = fallback_google_region
+ print(f"[{label} {idx+1}] Using fallback Google region: {fallback_google_region}")
+
+ if fallback_azure_endpoint:
+ temp_client.current_key_azure_endpoint = fallback_azure_endpoint
+ # Set up Azure-specific configuration
+ temp_client.is_azure = True
+ temp_client.azure_endpoint = fallback_azure_endpoint
+ temp_client.azure_api_version = os.getenv('AZURE_API_VERSION', '2024-08-01-preview')
+ print(f"[{label} {idx+1}] Using fallback Azure endpoint: {fallback_azure_endpoint}")
+ print(f"[{label} {idx+1}] Azure API version: {temp_client.azure_api_version}")
+
+ # Don't override with main client's base_url if we have fallback Azure endpoint
+ if hasattr(self, 'base_url') and self.base_url and not fallback_azure_endpoint:
+ temp_client.base_url = self.base_url
+ temp_client.openai_base_url = self.base_url
+
+ if hasattr(self, 'api_version') and not fallback_azure_endpoint:
+ temp_client.api_version = self.api_version
+
+ # Only inherit Azure settings if fallback doesn't have its own Azure endpoint
+ if hasattr(self, 'is_azure') and self.is_azure and not fallback_azure_endpoint:
+ temp_client.is_azure = self.is_azure
+ temp_client.azure_endpoint = getattr(self, 'azure_endpoint', None)
+ temp_client.azure_api_version = getattr(self, 'azure_api_version', '2024-08-01-preview')
+
+ # Force the client to reinitialize with Azure settings
+ temp_client._setup_client()
+
+ # FORCE single-key mode after initialization
+ temp_client._multi_key_mode = False
+ temp_client.use_multi_keys = False
+ temp_client.key_identifier = f"{label} ({fallback_model})"
+ temp_client._is_retry_client = True
+
+ # The client should already be set up from __init__, but verify
+ if not hasattr(temp_client, 'client_type') or temp_client.client_type is None:
+ temp_client.api_key = fallback_key
+ temp_client.model = fallback_model
+ temp_client._setup_client()
+
+ # Copy relevant state BUT NOT THE CANCELLATION FLAG
+ temp_client.context = context
+ temp_client._cancelled = False
+ temp_client._in_cleanup = False
+ temp_client.current_session_context = self.current_session_context
+ temp_client.conversation_message_count = self.conversation_message_count
+ temp_client.request_timeout = self.request_timeout
+
+ print(f"[{label} {idx+1}] Created temp client with model: {temp_client.model}")
+ print(f"[{label} {idx+1}] Multi-key mode: {temp_client._multi_key_mode}")
+
+ # Get file names for response tracking
+ payload_name, response_name = self._get_file_names(messages, context=context)
+
+ request_type = "image " if image_data else ""
+ print(f"[{label} {idx+1}] Sending {request_type}request...")
+
+ # Use unified internal method to avoid nested retry loops
+ result = temp_client._send_internal(
+ messages=messages,
+ temperature=temperature,
+ max_tokens=max_tokens,
+ max_completion_tokens=max_completion_tokens,
+ context=context,
+ retry_reason=f"{request_type.replace(' ', '')}{label.lower().replace(' ', '_')}_{idx+1}",
+ request_id=request_id,
+ image_data=image_data
+ )
+
+ # Check the result
+ if result and isinstance(result, tuple):
+ content, finish_reason = result
+
+ # Check if content is an error message
+ if content and "[AI RESPONSE UNAVAILABLE]" in content:
+ print(f"[{label} {idx+1}] ❌ Got error message: {content}")
+ continue
+
+ # Check if content is valid - FIX: Add None check
+ if content and self._safe_len(content, "main_key_retry_content") > 50:
+ print(f"[{label} {idx+1}] ✅ SUCCESS! Got content of length: {len(content)}")
+ self._save_response(content, response_name)
+ return content, finish_reason
+ else:
+ print(f"[{label} {idx+1}] ❌ Content too short or empty: {len(content) if content else 0} chars")
+ continue
+ else:
+ print(f"[{label} {idx+1}] ❌ Unexpected result type: {type(result)}")
+ continue
+
+ except UnifiedClientError as e:
+ if e.error_type == "cancelled":
+ print(f"[{label} {idx+1}] Operation was cancelled during retry")
+ return None
+
+ error_str = str(e).lower()
+ if ("azure" in error_str and "content" in error_str) or e.error_type == "prohibited_content":
+ print(f"[{label} {idx+1}] ❌ Content filter error: {str(e)[:100]}")
+ continue
+
+ print(f"[{label} {idx+1}] ❌ UnifiedClientError: {str(e)[:200]}")
+ continue
+
+ except Exception as e:
+ print(f"[{label} {idx+1}] ❌ Exception: {str(e)[:200]}")
+ continue
+
+ print(f"[MAIN KEY RETRY] ❌ All {max_attempts} fallback keys failed")
+ return None
+
+ finally:
+ # ALWAYS clear the thread-local flag
+ setattr(tls, retry_flag, False)
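+
+ # FALLBACK_KEYS is read from the environment as a JSON array. A minimal sketch
+ # of the expected shape (all values illustrative; extra fields optional):
+ # os.environ['FALLBACK_KEYS'] = json.dumps([
+ # {"api_key": "sk-...", "model": "gpt-4o-mini",
+ # "google_credentials": None, "azure_endpoint": None, "google_region": None}
+ # ])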
+
+ def _try_fallback_keys_direct(self, messages, temperature=None, max_tokens=None,
+ max_completion_tokens=None, context=None,
+ request_id=None, image_data=None) -> Optional[Tuple[str, Optional[str]]]:
+ """
+ Try fallback keys directly in single-key mode without retrying main key.
+ Used when fallback keys are enabled in single-key mode.
+ """
+ # Check if fallback keys are enabled
+ use_fallback_keys = os.getenv('USE_FALLBACK_KEYS', '0') == '1'
+ if not use_fallback_keys:
+ print(f"[FALLBACK DIRECT] Fallback keys not enabled, skipping")
+ return None
+
+ # Load fallback keys from environment
+ fallback_keys_json = os.getenv('FALLBACK_KEYS', '[]')
+ if fallback_keys_json == '[]':
+ print(f"[FALLBACK DIRECT] No fallback keys configured")
+ return None
+
+ try:
+ configured_fallbacks = json.loads(fallback_keys_json)
+ print(f"[FALLBACK DIRECT] Loaded {len(configured_fallbacks)} fallback keys")
+
+ # Try each fallback key
+ max_attempts = min(len(configured_fallbacks), 3) # Limit attempts
+ for idx, fb in enumerate(configured_fallbacks[:max_attempts]):
+ fallback_key = fb.get('api_key')
+ fallback_model = fb.get('model')
+ fallback_google_creds = fb.get('google_credentials')
+ fallback_azure_endpoint = fb.get('azure_endpoint')
+ fallback_google_region = fb.get('google_region')
+ fallback_azure_api_version = fb.get('azure_api_version')
+
+ if not fallback_key or not fallback_model:
+ print(f"[FALLBACK DIRECT {idx+1}] Invalid key data, skipping")
+ continue
+
+ print(f"[FALLBACK DIRECT {idx+1}/{max_attempts}] Trying {fallback_model}")
+
+ try:
+ # Create temporary client for fallback key
+ temp_client = UnifiedClient(
+ api_key=fallback_key,
+ model=fallback_model,
+ output_dir=self.output_dir
+ )
+
+ # Set key-specific credentials
+ if fallback_google_creds:
+ temp_client.current_key_google_creds = fallback_google_creds
+ temp_client.google_creds_path = fallback_google_creds
+ print(f"[FALLBACK DIRECT {idx+1}] Using Google credentials: {os.path.basename(fallback_google_creds)}")
+
+ if fallback_google_region:
+ temp_client.current_key_google_region = fallback_google_region
+ print(f"[FALLBACK DIRECT {idx+1}] Using Google region: {fallback_google_region}")
+
+ if fallback_azure_endpoint:
+ temp_client.current_key_azure_endpoint = fallback_azure_endpoint
+ # Set up Azure-specific configuration
+ temp_client.is_azure = True
+ temp_client.azure_endpoint = fallback_azure_endpoint
+ # Use per-key Azure API version, fallback to environment, then default
+ temp_client.azure_api_version = fallback_azure_api_version or os.getenv('AZURE_API_VERSION', '2025-01-01-preview')
+ print(f"[FALLBACK DIRECT {idx+1}] Using Azure endpoint: {fallback_azure_endpoint}")
+ print(f"[FALLBACK DIRECT {idx+1}] Azure API version: {temp_client.azure_api_version}")
+
+ # Force single-key mode
+ temp_client._multi_key_mode = False
+ temp_client.key_identifier = f"FALLBACK KEY ({fallback_model})"
+ temp_client._is_retry_client = True
+
+ # Setup the client
+ temp_client._setup_client()
+
+ # Copy relevant state
+ temp_client.context = context
+ temp_client._cancelled = False
+ temp_client._in_cleanup = False
+ temp_client.current_session_context = self.current_session_context
+ temp_client.conversation_message_count = self.conversation_message_count
+ temp_client.request_timeout = self.request_timeout
+
+ print(f"[FALLBACK DIRECT {idx+1}] Sending request...")
+
+ # Use internal method to avoid nested retry loops
+ result = temp_client._send_internal(
+ messages=messages,
+ temperature=temperature,
+ max_tokens=max_tokens,
+ max_completion_tokens=max_completion_tokens,
+ context=context,
+ retry_reason=f"single_key_fallback_{idx+1}",
+ request_id=request_id,
+ image_data=image_data
+ )
+
+ # Check the result
+ if result and isinstance(result, tuple):
+ content, finish_reason = result
+
+ # Check if content is valid
+ if content and "[AI RESPONSE UNAVAILABLE]" not in content and len(content) > 50:
+ print(f"[FALLBACK DIRECT {idx+1}] ✅ SUCCESS! Got content of length: {len(content)}")
+ return content, finish_reason
+ else:
+ print(f"[FALLBACK DIRECT {idx+1}] ❌ Content too short or error: {len(content) if content else 0} chars")
+ continue
+ else:
+ print(f"[FALLBACK DIRECT {idx+1}] ❌ Unexpected result type: {type(result)}")
+ continue
+
+ except Exception as e:
+ print(f"[FALLBACK DIRECT {idx+1}] ❌ Failed: {e}")
+ continue
+
+ print(f"[FALLBACK DIRECT] All fallback keys failed")
+ return None
+
+ except Exception as e:
+ print(f"[FALLBACK DIRECT] Failed to parse fallback keys: {e}")
+ return None
+
+ # Image handling methods
+ def send_image(self, messages: List[Dict[str, Any]], image_data: Any,
+ temperature: Optional[float] = None,
+ max_tokens: Optional[int] = None,
+ max_completion_tokens: Optional[int] = None,
+ context: str = 'image_translation') -> Tuple[str, str]:
+ """Backwards-compatible public API; now delegates to unified _send_core."""
+ return self._send_core(messages, temperature, max_tokens, max_completion_tokens, context, image_data=image_data)
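+
+ # Usage sketch (illustrative): translating a single page image
+ # with open("page.jpg", "rb") as f:
+ # content, finish_reason = client.send_image(
+ # [{"role": "user", "content": "Translate every bubble."}],
+ # f.read(), max_tokens=8192)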
+
+ def _send_image_internal(self, messages: List[Dict[str, Any]], image_data: Any,
+ temperature: Optional[float] = None,
+ max_tokens: Optional[int] = None,
+ max_completion_tokens: Optional[int] = None,
+ context: str = 'image_translation',
+ retry_reason: Optional[str] = None,
+ request_id=None) -> Tuple[str, str]:
+ """
+ Image send internal - backwards compatibility wrapper
+ """
+ return self._send_internal(
+ messages, temperature, max_tokens, max_completion_tokens,
+ context or 'image_translation', retry_reason, request_id, image_data=image_data
+ )
+
+ def _prepare_image_messages(self, messages: List[Dict[str, Any]], image_data: Any) -> List[Dict[str, Any]]:
+ """
+ Helper method to prepare messages with embedded image for providers that accept image_url parts
+ """
+ embedded_messages = []
+ # Prepare base64 string
+ try:
+ if isinstance(image_data, (bytes, bytearray)):
+ b64 = base64.b64encode(image_data).decode('ascii')
+ else:
+ b64 = str(image_data)
+ except Exception:
+ b64 = str(image_data)
+
+ image_part = {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}
+
+ for msg in messages:
+ if msg.get('role') == 'user':
+ content = msg.get('content', '')
+ if isinstance(content, list):
+ new_parts = list(content)
+ new_parts.append(image_part)
+ embedded_messages.append({"role": "user", "content": new_parts})
+ else:
+ embedded_messages.append({
+ "role": "user",
+ "content": [
+ {"type": "text", "text": content},
+ image_part
+ ]
+ })
+ else:
+ embedded_messages.append(msg)
+
+ if not any(m.get('role') == 'user' for m in embedded_messages):
+ embedded_messages.append({"role": "user", "content": [image_part]})
+
+ return embedded_messages
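+
+ # Shape sketch (illustrative): given [{"role": "user", "content": "Translate this page"}]
+ # and raw JPEG bytes, _prepare_image_messages returns:
+ # [{"role": "user", "content": [
+ # {"type": "text", "text": "Translate this page"},
+ # {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}]}]
+ # Note the data URL always claims image/jpeg regardless of the actual format.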
+
+ def _retry_image_with_main_key(self, messages, image_data, temperature=None, max_tokens=None,
+ max_completion_tokens=None, context=None, request_id=None) -> Optional[Tuple[str, Optional[str]]]:
+ """
+ Image retry method - backwards compatibility wrapper
+ """
+ return self._retry_with_main_key(
+ messages, temperature, max_tokens, max_completion_tokens,
+ context or 'image_translation', request_id, image_data=image_data
+ )
+
+ def reset_conversation_for_new_context(self, new_context):
+ """Reset conversation state when context changes"""
+ with self._model_lock:
+ self.current_session_context = new_context
+ self.conversation_message_count = 0
+ self.pattern_counts.clear()
+ self.last_pattern = None
+
+ logger.info(f"Reset conversation state for new context: {new_context}")
+
+ def _apply_pure_reinforcement(self, messages):
+ """Apply PURE frequency-based reinforcement pattern"""
+
+ # DISABLE in batch mode
+ #if os.getenv('BATCH_TRANSLATION', '0') == '1':
+ # return messages
+
+ # Skip if not enough messages
+ if self.conversation_message_count < 4:
+ return messages
+
+ # Create pattern from last 2 user messages
+ if len(messages) >= 2:
+ pattern = []
+ for msg in messages[-2:]:
+ if msg.get('role') == 'user':
+ content = msg['content']
+ pattern.append(len(content))
+
+ if len(pattern) >= 2:
+ pattern_key = f"reinforcement_{pattern[0]}_{pattern[1]}"
+
+ # MICROSECOND LOCK: When modifying pattern_counts
+ with self._model_lock:
+ self.pattern_counts[pattern_key] = self.pattern_counts.get(pattern_key, 0) + 1
+ count = self.pattern_counts[pattern_key]
+
+ # Just track patterns, NO PROMPT INJECTION
+ if count >= 3:
+ logger.info(f"Pattern {pattern_key} detected (count: {count})")
+ # NO [PATTERN REINFORCEMENT ACTIVE] - KEEP IT GONE
+
+ return messages
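+
+ # e.g. two trailing user messages of 120 and 98 characters yield the key
+ # "reinforcement_120_98"; once its count reaches 3 it is only logged,
+ # and no prompt text is ever injected.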
+
+ def _validate_and_clean_messages(self, messages):
+ """Validate and clean messages, removing None entries and fixing content issues"""
+ if messages is None:
+ return []
+ cleaned_messages = []
+ for msg in messages:
+ if msg is None:
+ continue
+ if not isinstance(msg, dict):
+ continue
+ # Ensure role exists and is a string
+ if 'role' not in msg or msg['role'] is None:
+ msg = dict(msg)
+ msg['role'] = 'user'
+ # Normalize content
+ if msg.get('content') is None:
+ msg = dict(msg) # Make a copy
+ msg['content'] = ''
+ cleaned_messages.append(msg)
+ return cleaned_messages
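+
+ # Cleaning sketch (illustrative):
+ # [None, {"content": "hi"}, {"role": "user", "content": None}]
+ # -> [{"content": "hi", "role": "user"}, {"role": "user", "content": ""}]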
+
+ def _merge_system_into_user(self, messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
+ """Convert all system prompts into a user message by prepending them to the first
+ user message, separated by a line break. If no user message exists, one is created.
+ Supports both simple string content and OpenAI 'content parts' lists.
+ """
+ if not messages:
+ return []
+ system_texts: List[str] = []
+ # Collect system texts and build the new list without system messages
+ pruned: List[Dict[str, Any]] = []
+ for msg in messages:
+ role = msg.get("role")
+ if role == "system":
+ content = msg.get("content", "")
+ if isinstance(content, str):
+ if content.strip():
+ system_texts.append(content.strip())
+ elif isinstance(content, list):
+ for part in content:
+ if isinstance(part, dict) and part.get("type") == "text":
+ txt = part.get("text", "").strip()
+ if txt:
+ system_texts.append(txt)
+ # Skip adding this system message
+ continue
+ pruned.append(msg)
+ # Nothing to merge: still ensure we don't return an empty list
+ if not system_texts:
+ if not pruned:
+ return [{"role": "user", "content": ""}] # minimal valid user message to avoid empty list
+ return pruned
+ merged_header = "\n\n".join(system_texts).strip()
+ if merged_header:
+ merged_header += "\n" # ensure separation from current user content
+ # Find first user message and prepend
+ first_user_index = -1
+ for i, m in enumerate(pruned):
+ if m.get("role") == "user":
+ first_user_index = i
+ break
+ if first_user_index >= 0:
+ um = pruned[first_user_index]
+ content = um.get("content", "")
+ if isinstance(content, str):
+ um["content"] = f"{merged_header}{content}" if merged_header else content
+ elif isinstance(content, list):
+ # If first part is text, prepend; otherwise insert a text part at the front
+ if content and isinstance(content[0], dict) and content[0].get("type") == "text":
+ content[0]["text"] = f"{merged_header}{content[0].get('text', '')}" if merged_header else content[0].get('text', '')
+ else:
+ text_part = {"type": "text", "text": merged_header or ""}
+ content.insert(0, text_part)
+ um["content"] = content
+ else:
+ # Unknown structure; coerce to string with the merged header
+ um["content"] = f"{merged_header}{str(content)}"
+ pruned[first_user_index] = um
+ else:
+ # No user message exists; create one with the merged header
+ pruned.append({"role": "user", "content": merged_header})
+ return pruned
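+
+ # Merge sketch (illustrative):
+ # [{"role": "system", "content": "Be terse."}, {"role": "user", "content": "Hi"}]
+ # becomes
+ # [{"role": "user", "content": "Be terse.\nHi"}]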
+
+ def _validate_request(self, messages, max_tokens=None):
+ """Validate request parameters before sending"""
+ # Clean messages first
+ messages = self._validate_and_clean_messages(messages)
+
+ if not messages:
+ return False, "Empty messages list"
+
+ # Check message content isn't empty - FIX: Add None checks
+ total_chars = 0
+ for msg in messages:
+ if msg is not None and msg.get('role') == 'user':
+ content = msg.get('content', '')
+ if content is not None:
+ total_chars += len(str(content))
+ if total_chars == 0:
+ return False, "Empty request content"
+
+ # Handle None max_tokens
+ if max_tokens is None:
+ max_tokens = getattr(self, 'max_tokens', 8192) # Use instance default or 8192
+
+ # Estimate tokens (rough approximation)
+ estimated_tokens = total_chars / 4
+ if estimated_tokens > max_tokens * 2:
+ print(f"Request might be too long: ~{estimated_tokens} tokens vs {max_tokens} max")
+
+ # Check for valid roles
+ valid_roles = {'system', 'user', 'assistant'}
+ for msg in messages:
+ if msg.get('role') not in valid_roles:
+ return False, f"Invalid role: {msg.get('role')}"
+
+ return True, None
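+
+ # e.g. _validate_request([{"role": "user", "content": "hello"}]) -> (True, None)
+ # _validate_request([]) -> (False, "Empty messages list")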
+
+ def _track_stats(self, context, success, error_type=None, response_time=None):
+ """Track API call statistics"""
+ self.stats['total_requests'] += 1
+
+ if not success:
+ self.stats['empty_results'] += 1
+ error_key = f"{getattr(self, 'client_type', 'unknown')}_{context}_{error_type}"
+ self.stats['errors'][error_key] = self.stats['errors'].get(error_key, 0) + 1
+
+ if response_time:
+ self.stats['response_times'].append(response_time)
+
+ # Save stats periodically
+ if self.stats['total_requests'] % 10 == 0:
+ self._save_stats()
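+
+ # self.stats is assumed to be initialized elsewhere with at least:
+ # {'total_requests': 0, 'empty_results': 0, 'errors': {}, 'response_times': []}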
+
+ def _save_stats(self):
+ """Save statistics to file"""
+ stats_file = "api_stats.json"
+ try:
+ with open(stats_file, 'w') as f:
+ json.dump(self.stats, f, indent=2)
+ except Exception as e:
+ print(f"Failed to save stats: {e}")
+
+ def _save_failed_request(self, messages, error, context, response=None):
+ """Save failed requests for debugging"""
+ timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+ failed_dir = "Payloads/failed_requests"
+ os.makedirs(failed_dir, exist_ok=True)
+
+ failure_data = {
+ 'timestamp': timestamp,
+ 'context': context,
+ 'error': str(error),
+ 'error_type': type(error).__name__,
+ 'messages': messages,
+ 'model': getattr(self, 'model', None),
+ 'client_type': getattr(self, 'client_type', None),
+ 'response': str(response) if response else None,
+ 'traceback': traceback.format_exc()
+ }
+
+ filename = f"{failed_dir}/failed_{context}_{getattr(self, 'client_type', 'unknown')}_{timestamp}.json"
+ with open(filename, 'w', encoding='utf-8') as f:
+ json.dump(failure_data, f, indent=2, ensure_ascii=False)
+
+ logger.info(f"Saved failed request to: {filename}")
+
+ def _handle_empty_result(self, messages, context, error_info):
+ """Handle empty results with context-aware fallbacks"""
+ print(f"Handling empty result for context: {context}, error: {error_info}")
+
+ # Log detailed error information for debugging
+ if isinstance(error_info, dict):
+ error_type = error_info.get('error', 'unknown')
+ error_details = error_info.get('details', '')
+ else:
+ error_type = str(error_info)
+ error_details = ''
+
+ # Check if this is an extraction failure vs actual empty response
+ is_extraction_failure = 'extract' in error_type.lower() or 'parse' in error_type.lower()
+
+ if context == 'glossary':
+ # For glossary, we might have partial data in error_info
+ if is_extraction_failure and isinstance(error_info, dict):
+ # Check if raw response is available
+ raw_response = error_info.get('raw_response', '')
+ if raw_response and 'character' in str(raw_response):
+ # Log that data exists but couldn't be extracted
+ print("⚠️ Glossary data exists in response but extraction failed!")
+ print(" Consider checking response extraction for this provider")
+
+ # Return empty but valid JSON
+ return "[]"
+
+ elif context == 'translation':
+ # Extract the original text and return it with a marker
+ original_text = self._extract_user_content(messages)
+
+ # Add more specific error info if available
+ if is_extraction_failure:
+ return f"[EXTRACTION FAILED - ORIGINAL TEXT PRESERVED]\n{original_text}"
+ elif 'rate' in error_type.lower():
+ return f"[RATE LIMITED - ORIGINAL TEXT PRESERVED]\n{original_text}"
+ elif 'safety' in error_type.lower() or 'prohibited' in error_type.lower():
+ return f"[CONTENT BLOCKED - ORIGINAL TEXT PRESERVED]\n{original_text}"
+ else:
+ return f"[TRANSLATION FAILED - ORIGINAL TEXT PRESERVED]\n{original_text}"
+
+ elif context == 'image_translation':
+ # Provide more specific error messages for image translation
+ if 'size' in error_type.lower():
+ return "[IMAGE TOO LARGE - TRANSLATION FAILED]"
+ elif 'format' in error_type.lower():
+ return "[UNSUPPORTED IMAGE FORMAT - TRANSLATION FAILED]"
+ elif is_extraction_failure:
+ return "[RESPONSE EXTRACTION FAILED]"
+ else:
+ return "[IMAGE TRANSLATION FAILED]"
+
+ elif context == 'manga':
+ # Add manga-specific handling
+ return "[MANGA TRANSLATION FAILED]"
+
+ elif context == 'metadata':
+ # For metadata extraction
+ return "{}"
+
+ else:
+ # Generic fallback with error type
+ if is_extraction_failure:
+ return "[RESPONSE EXTRACTION FAILED]"
+ elif 'rate' in error_type.lower():
+ return "[RATE LIMITED - PLEASE RETRY]"
+ else:
+ return "[AI RESPONSE UNAVAILABLE]"
+
+ def _extract_response_text(self, response, provider=None, **kwargs):
+ """
+ Universal response text extraction that works across all providers.
+ Includes enhanced OpenAI-specific handling and proper Gemini support.
+ """
+ result = ""
+ finish_reason = 'stop'
+
+ # Determine provider if not specified
+ if provider is None:
+ provider = self.client_type
+
+ self._debug_log(f" 🔍 Extracting text from {provider} response...")
+ self._debug_log(f" 🔍 Response type: {type(response)}")
+
+ # Handle UnifiedResponse objects
+ if isinstance(response, UnifiedResponse):
+ # Check if content is a string (even if empty)
+ if response.content is not None and isinstance(response.content, str):
+ # Always return the content from UnifiedResponse
+ if len(response.content) > 0:
+ self._debug_log(f" ✅ Got text from UnifiedResponse.content: {len(response.content)} chars")
+ else:
+ self._debug_log(f" ⚠️ UnifiedResponse has empty content (finish_reason: {response.finish_reason})")
+ return response.content, response.finish_reason or 'stop'
+ elif response.error_details:
+ self._debug_log(f" ⚠️ UnifiedResponse has error_details: {response.error_details}")
+ return "", response.finish_reason or 'error'
+ else:
+ # Only try to extract from raw_response if content is actually None
+ self._debug_log(f" ⚠️ UnifiedResponse.content is None, checking raw_response...")
+ if hasattr(response, 'raw_response') and response.raw_response:
+ self._debug_log(f" 🔍 Found raw_response, attempting extraction...")
+ response = response.raw_response
+ else:
+ self._debug_log(f" ⚠️ No raw_response found")
+ return "", 'error'
+
+ # ========== GEMINI-SPECIFIC HANDLING ==========
+ if provider == 'gemini':
+ self._debug_log(f" 🔍 [Gemini] Attempting specialized extraction...")
+
+ # Check for Gemini-specific response structure
+ if hasattr(response, 'candidates'):
+ self._debug_log(f" 🔍 [Gemini] Found candidates attribute")
+ if response.candidates:
+ candidate = response.candidates[0]
+
+ # Check finish reason
+ if hasattr(candidate, 'finish_reason'):
+ finish_reason = str(candidate.finish_reason).lower()
+ self._debug_log(f" 🔍 [Gemini] Finish reason: {finish_reason}")
+
+ # Map Gemini finish reasons
+ if 'max_tokens' in finish_reason:
+ finish_reason = 'length'
+ elif 'safety' in finish_reason or 'blocked' in finish_reason:
+ finish_reason = 'content_filter'
+ elif 'stop' in finish_reason:
+ finish_reason = 'stop'
+
+ # Extract content from candidate
+ if hasattr(candidate, 'content'):
+ content = candidate.content
+ self._debug_log(f" 🔍 [Gemini] Content object: {content}")
+ self._debug_log(f" 🔍 [Gemini] Content type: {type(content)}")
+ self._debug_log(f" 🔍 [Gemini] Content attributes: {[attr for attr in dir(content) if not attr.startswith('_')][:10]}")
+
+ # NEW: Try to access content as string directly first
+ try:
+ content_str = str(content)
+ if content_str and len(content_str) > 20 and 'role=' not in content_str:
+ self._debug_log(f" ✅ [Gemini] Got content from string conversion: {len(content_str)} chars")
+ return content_str, finish_reason
+ except Exception as e:
+ self._debug_log(f" ⚠️ [Gemini] String conversion failed: {e}")
+
+ # Content might have parts - FIX: Add None check for parts
+ if hasattr(content, 'parts') and content.parts is not None:
+ parts_count = self._safe_len(content.parts, "gemini_content_parts")
+ self._debug_log(f" 🔍 [Gemini] Found {parts_count} parts in content")
+ text_parts = []
+
+ for i, part in enumerate(content.parts):
+ part_text = self._extract_part_text(part, provider='gemini', part_index=i+1)
+ if part_text:
+ text_parts.append(part_text)
+
+ if text_parts:
+ result = ''.join(text_parts)
+ self._debug_log(f" ✅ [Gemini] Extracted from parts: {len(result)} chars")
+ return result, finish_reason
+
+ else:
+ # NEW: Handle case where parts exist but contain no text
+ parts_count = self._safe_len(content.parts, "gemini_empty_parts")
+ self._debug_log(f" ⚠️ [Gemini] Parts found but no text extracted from {parts_count} parts")
+ # Don't return here, try other methods
+
+ # Try direct text access on content
+ elif hasattr(content, 'text'):
+ if content.text:
+ self._debug_log(f" ✅ [Gemini] Got text from content.text: {len(content.text)} chars")
+ return content.text, finish_reason
+
+ # NEW: Try accessing raw content data
+ for attr in ['text', 'content', 'data', 'message', 'response']:
+ if hasattr(content, attr):
+ try:
+ value = getattr(content, attr)
+ if value and isinstance(value, str) and len(value) > 10:
+ print(f" ✅ [Gemini] Got text from content.{attr}: {len(value)} chars")
+ return value, finish_reason
+ except Exception as e:
+ print(f" ⚠️ [Gemini] Failed to get content.{attr}: {e}")
+
+ # Try to get text directly from candidate
+ if hasattr(candidate, 'text'):
+ if candidate.text:
+ print(f" ✅ [Gemini] Got text from candidate.text: {len(candidate.text)} chars")
+ return candidate.text, finish_reason
+
+ # Alternative Gemini response structure (for native SDK)
+ if hasattr(response, 'text'):
+ try:
+ # This might be a property that needs to be called
+ text = response.text
+ if text:
+ print(f" ✅ [Gemini] Got text via response.text property: {len(text)} chars")
+ return text, finish_reason
+ except Exception as e:
+ print(f" ⚠️ [Gemini] Error accessing response.text: {e}")
+
+ # Try parts directly on response - FIX: Add None check
+ if hasattr(response, 'parts') and response.parts is not None:
+ print(f" 🔍 [Gemini] Found parts directly on response")
+ text_parts = []
+ for i, part in enumerate(response.parts):
+ part_text = self._extract_part_text(part, provider='gemini', part_index=i+1)
+ if part_text:
+ text_parts.append(part_text)
+
+ if text_parts:
+ result = ''.join(text_parts)
+ print(f" ✅ [Gemini] Extracted from direct parts: {len(result)} chars")
+ return result, finish_reason
+
+ print(f" ⚠️ [Gemini] Specialized extraction failed, trying generic methods...")
+
+ # ========== ENHANCED OPENAI HANDLING ==========
+ elif provider == 'openai':
+ print(f" 🔍 [OpenAI] Attempting specialized extraction...")
+
+ # Check if it's an OpenAI ChatCompletion object
+ if hasattr(response, 'choices') and response.choices is not None:
+ choices_count = self._safe_len(response.choices, "openai_response_choices")
+ print(f" 🔍 [OpenAI] Found choices attribute, {choices_count} choices")
+
+ if response.choices:
+ choice = response.choices[0]
+
+ # Log choice details
+ print(f" 🔍 [OpenAI] Choice type: {type(choice)}")
+
+ # Get finish reason
+ if hasattr(choice, 'finish_reason'):
+ finish_reason = choice.finish_reason
+ print(f" 🔍 [OpenAI] Finish reason: {finish_reason}")
+
+ # Normalize finish reasons
+ if finish_reason == 'max_tokens':
+ finish_reason = 'length'
+ # 'content_filter' is already in normalized form and passes through unchanged
+
+ # Extract message content
+ if hasattr(choice, 'message'):
+ message = choice.message
+ print(f" 🔍 [OpenAI] Message type: {type(message)}")
+
+ # Check for refusal first
+ if hasattr(message, 'refusal') and message.refusal:
+ print(f" 🚫 [OpenAI] Message was refused: {message.refusal}")
+ return f"[REFUSED]: {message.refusal}", 'content_filter'
+
+ # Try to get content
+ if hasattr(message, 'content'):
+ content = message.content
+
+ # Handle None content
+ if content is None:
+ print(f" ⚠️ [OpenAI] message.content is None")
+
+ # Check if it's a function call instead
+ if hasattr(message, 'function_call'):
+ print(f" 🔍 [OpenAI] Found function_call instead of content")
+ return "", 'function_call'
+ elif hasattr(message, 'tool_calls'):
+ print(f" 🔍 [OpenAI] Found tool_calls instead of content")
+ return "", 'tool_call'
+ else:
+ print(f" ⚠️ [OpenAI] No content, refusal, or function calls found")
+ return "", finish_reason or 'error'
+
+ # Handle empty string content
+ elif content == "":
+ print(f" ⚠️ [OpenAI] message.content is empty string")
+ if finish_reason == 'length':
+ print(f" ⚠️ [OpenAI] Empty due to length limit (tokens too low)")
+ return "", finish_reason or 'error'
+
+ # Valid content found
+ else:
+ print(f" ✅ [OpenAI] Got content: {len(content)} chars")
+ return content, finish_reason
+
+ # Try alternative attributes
+ elif hasattr(message, 'text'):
+ print(f" 🔍 [OpenAI] Trying message.text...")
+ if message.text:
+ print(f" ✅ [OpenAI] Got text: {len(message.text)} chars")
+ return message.text, finish_reason
+
+ # Try dict access if message is dict-like
+ elif hasattr(message, 'get'):
+ content = message.get('content') or message.get('text')
+ if content:
+ print(f" ✅ [OpenAI] Got content via dict access: {len(content)} chars")
+ return content, finish_reason
+
+ # Log all available attributes for debugging
+ print(f" ⚠️ [OpenAI] Message attributes: {[attr for attr in dir(message) if not attr.startswith('_')]}")
+ else:
+ print(f" ⚠️ [OpenAI] Empty choices array")
+
+ # Check if there's metadata about why it's empty
+ if hasattr(response, 'model'):
+ print(f" Model used: {response.model}")
+ if hasattr(response, 'id'):
+ print(f" Response ID: {response.id}")
+ if hasattr(response, 'usage'):
+ print(f" Token usage: {response.usage}")
+
+ # If OpenAI extraction failed, continue to generic methods
+ print(f" ⚠️ [OpenAI] Specialized extraction failed, trying generic methods...")
+
+ # ========== GENERIC EXTRACTION METHODS ==========
+
+ # Method 1: Direct text attributes (common patterns)
+ text_attributes = ['text', 'content', 'message', 'output', 'response', 'answer', 'reply']
+
+ for attr in text_attributes:
+ if hasattr(response, attr):
+ try:
+ value = getattr(response, attr)
+ if value is not None and isinstance(value, str) and len(value) > 0:
+ result = value
+ print(f" ✅ Got text from response.{attr}: {len(result)} chars")
+ return result, finish_reason
+ except Exception as e:
+ print(f" ⚠️ Failed to get response.{attr}: {e}")
+
+ # Method 2: Common nested patterns
+ nested_patterns = [
+ # OpenAI/Mistral pattern
+ lambda r: r.choices[0].message.content if hasattr(r, 'choices') and r.choices and hasattr(r.choices[0], 'message') and hasattr(r.choices[0].message, 'content') else None,
+ # Alternative OpenAI pattern
+ lambda r: r.choices[0].text if hasattr(r, 'choices') and r.choices and hasattr(r.choices[0], 'text') else None,
+ # Anthropic SDK pattern
+ lambda r: r.content[0].text if hasattr(r, 'content') and r.content and hasattr(r.content[0], 'text') else None,
+ # Gemini pattern - candidates structure
+ lambda r: r.candidates[0].content.parts[0].text if hasattr(r, 'candidates') and r.candidates and hasattr(r.candidates[0], 'content') and hasattr(r.candidates[0].content, 'parts') and r.candidates[0].content.parts else None,
+ # Cohere pattern
+ lambda r: r.text if hasattr(r, 'text') else None,
+ # JSON response pattern
+ lambda r: r.get('choices', [{}])[0].get('message', {}).get('content') if isinstance(r, dict) else None,
+ lambda r: r.get('content') if isinstance(r, dict) else None,
+ lambda r: r.get('text') if isinstance(r, dict) else None,
+ lambda r: r.get('output') if isinstance(r, dict) else None,
+ ]
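+ # The first four patterns are provider-specific (OpenAI x2, Anthropic, Gemini);
+ # their failures are logged below for debugging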
+
+ for i, pattern in enumerate(nested_patterns):
+ try:
+ extracted = pattern(response)
+ if extracted is not None and isinstance(extracted, str) and len(extracted) > 0:
+ result = extracted
+ print(f" ✅ Extracted via nested pattern {i+1}: {len(result)} chars")
+ return result, finish_reason
+ except Exception as e:
+ # Log pattern failures for debugging
+ if provider in ['openai', 'gemini'] and i < 4: # First patterns are provider-specific
+ print(f" ⚠️ [{provider}] Pattern {i+1} failed: {e}")
+
+ # Method 3: String representation extraction (last resort)
+ if not result:
+ print(f" 🔍 Attempting string extraction as last resort...")
+ result = self._extract_from_string(response, provider=provider)
+ if result:
+ print(f" 🔧 Extracted from string representation: {len(result)} chars")
+ return result, finish_reason
+
+ # Method 4: AGGRESSIVE GEMINI FALLBACK - Parse response string manually
+ if provider == 'gemini' and not result:
+ print(f" 🔍 [Gemini] Attempting aggressive manual parsing...")
+ try:
+ response_str = str(response)
+
+ # Look for common patterns in Gemini response strings
+ import re
+ patterns = [
+ r'text["\']([^"\'].*?)["\']', # text="content" or text='content'
+ r'text=([^,\)\]]+)', # text=content
+ r'content["\']([^"\'].*?)["\']', # content="text"
+ r'>([^<>{},\[\]]+)<', # HTML-like tags
+ ]
+
+ for pattern in patterns:
+ matches = re.findall(pattern, response_str, re.DOTALL)
+ for match in matches:
+ if match and len(match.strip()) > 20:
+ # Clean up the match
+ clean_match = match.strip()
+ clean_match = clean_match.replace('\\n', '\n').replace('\\t', '\t')
+ if len(clean_match) > 20:
+ print(f" 🔧 [Gemini] Extracted via regex pattern: {len(clean_match)} chars")
+ return clean_match, finish_reason
+
+ # If no patterns match, try to find the largest text block
+ words = response_str.split()
+ text_blocks = []
+ current_block = []
+
+ for word in words:
+ if len(word) > 2 and any(c.isalpha() for c in word):
+ current_block.append(word)
+ else:
+ if len(current_block) > 5: # At least 5 words
+ text_blocks.append(' '.join(current_block))
+ current_block = []
+
+ if current_block and len(current_block) > 5:
+ text_blocks.append(' '.join(current_block))
+
+ if text_blocks:
+ # Return the longest text block
+ longest_block = max(text_blocks, key=len)
+ if len(longest_block) > 50:
+ print(f" 🔧 [Gemini] Extracted longest text block: {len(longest_block)} chars")
+ return longest_block, finish_reason
+
+ except Exception as e:
+ print(f" ⚠️ [Gemini] Aggressive parsing failed: {e}")
+
+ # Final failure - log detailed debug info
+ print(f" ❌ Failed to extract text from {provider} response")
+
+ # Log the full response structure for debugging
+ print(f" 🔍 [{provider}] Full response structure:")
+ print(f" Type: {type(response)}")
+
+ # Log available attributes
+ if hasattr(response, '__dict__'):
+ attrs = list(response.__dict__.keys())[:20]
+ print(f" Attributes: {attrs}")
+ else:
+ attrs = [attr for attr in dir(response) if not attr.startswith('_')][:20]
+ print(f" Dir attributes: {attrs}")
+
+ # Try to get any text representation as absolute last resort
+ try:
+ response_str = str(response)
+ if len(response_str) > 100 and len(response_str) < 100000: # Reasonable size
+ print(f" 🔍 Response string representation: {response_str[:500]}...")
+ except Exception:
+ pass
+
+ return "", 'error'
+
+
+ def _extract_part_text(self, part, provider=None, part_index=None):
+ """
+ Extract text from a part object (handles various formats).
+ Enhanced with provider-specific handling and aggressive extraction.
+ """
+ if provider == 'gemini' and part_index:
+ print(f" 🔍 [Gemini] Part {part_index} type: {type(part)}")
+ print(f" 🔍 [Gemini] Part {part_index} attributes: {[attr for attr in dir(part) if not attr.startswith('_')][:10]}")
+
+ # Direct text attribute
+ if hasattr(part, 'text'):
+ try:
+ text = part.text
+ if text:
+ if provider == 'gemini' and part_index:
+ print(f" ✅ [Gemini] Part {part_index} has text via direct access: {len(text)} chars")
+ return text
+ except Exception as e:
+ if provider == 'gemini' and part_index:
+ print(f" ⚠️ [Gemini] Failed direct access on part {part_index}: {e}")
+
+ # NEW: Try direct string conversion of the part
+ try:
+ part_str = str(part)
+ if part_str and len(part_str) > 10 and 'text=' not in part_str.lower():
+ if provider == 'gemini' and part_index:
+ print(f" ✅ [Gemini] Part {part_index} extracted as string: {len(part_str)} chars")
+ return part_str
+ except Exception as e:
+ if provider == 'gemini' and part_index:
+ print(f" ⚠️ [Gemini] Part {part_index} string conversion failed: {e}")
+
+ # Use getattr with fallback
+ try:
+ text = getattr(part, 'text', None)
+ if text:
+ if provider == 'gemini' and part_index:
+ print(f" ✅ [Gemini] Part {part_index} has text via getattr: {len(text)} chars")
+ return text
+ except Exception as e:
+ if provider == 'gemini' and part_index:
+ print(f" ⚠️ [Gemini] Failed getattr on part {part_index}: {e}")
+
+ # String representation extraction
+ part_str = str(part)
+
+ if provider == 'gemini' and part_index:
+ print(f" 🔍 [Gemini] Part {part_index} string representation length: {len(part_str)}")
+
+ if 'text=' in part_str or 'text":' in part_str:
+ import re
+ patterns = [
+ r'text="""(.*?)"""', # Triple quotes (common in Gemini)
+ r'text="([^"]*(?:\\.[^"]*)*)"', # Double quotes with escaping
+ r"text='([^']*(?:\\.[^']*)*)'", # Single quotes
+ r'text=([^,\)]+)', # Unquoted text (last resort)
+ ]
+
+ for pattern in patterns:
+ match = re.search(pattern, part_str, re.DOTALL)
+ if match:
+ text = match.group(1)
+ # Unescape common escape sequences
+ text = text.replace('\\n', '\n')
+ text = text.replace('\\t', '\t')
+ text = text.replace('\\r', '\r')
+ text = text.replace('\\"', '"')
+ text = text.replace("\\'", "'")
+ text = text.replace('\\\\', '\\')
+
+
+ return text
+
+ # Part is itself a string
+ if isinstance(part, str):
+ if provider == 'gemini' and part_index:
+ print(f" ✅ [Gemini] Part {part_index} is a string: {len(part)} chars")
+ return part
+
+ if provider == 'gemini' and part_index:
+ print(f" ⚠️ [Gemini] Failed string extraction on part {part_index}")
+
+ return None
+
+
+ def _extract_from_string(self, response, provider=None):
+ """
+ Extract text from string representation of response.
+ Enhanced with provider-specific patterns.
+ """
+ try:
+ response_str = str(response)
+ import re
+
+ # Common patterns in string representations
+ patterns = [
+ r'text="""(.*?)"""', # Triple quotes (Gemini often uses this)
+ r'text="([^"]*(?:\\.[^"]*)*)"', # Double quotes
+ r"text='([^']*(?:\\.[^']*)*)'", # Single quotes
+ r'content="([^"]*(?:\\.[^"]*)*)"', # Content field
+ r'content="""(.*?)"""', # Triple quoted content
+ r'"text":\s*"([^"]*(?:\\.[^"]*)*)"', # JSON style
+ r'"content":\s*"([^"]*(?:\\.[^"]*)*)"', # JSON content
+ ]
+
+ for pattern in patterns:
+ match = re.search(pattern, response_str, re.DOTALL)
+ if match:
+ text = match.group(1)
+ # Unescape common escape sequences
+ text = text.replace('\\n', '\n')
+ text = text.replace('\\t', '\t')
+ text = text.replace('\\r', '\r')
+ text = text.replace('\\"', '"')
+ text = text.replace("\\'", "'")
+ text = text.replace('\\\\', '\\')
+
+ return text
+ except Exception as e:
+ if provider == 'gemini':
+ print(f" ⚠️ [Gemini] Error during string extraction: {e}")
+ else:
+ print(f" ⚠️ Error during string extraction: {e}")
+
+ return None
+
+ def _extract_user_content(self, messages):
+ """Extract user content from messages"""
+ for msg in reversed(messages):
+ if msg.get('role') == 'user':
+ return msg.get('content', '')
+ return ''
+
+ def _get_file_names(self, messages, context=None):
+ """Generate appropriate file names based on context
+
+ IMPORTANT: File naming must support duplicate detection across chapters
+ """
+ if context == 'glossary':
+ payload_name = f"glossary_payload_{self.conversation_message_count}.json"
+ response_name = f"glossary_response_{self.conversation_message_count}.txt"
+ elif context == 'translation':
+ # Extract chapter info if available - CRITICAL for duplicate detection
+ content_str = str(messages)
+ # Remove any rolling summary blocks to avoid picking previous chapter numbers
+ try:
+ content_str = re.sub(r"\[Rolling Summary of Chapter \d+\][\s\S]*?\[End of Rolling Summary\]", "", content_str, flags=re.IGNORECASE)
+ except Exception:
+ pass
+ chapter_match = re.search(r'Chapter (\d+)', content_str)
+ if chapter_match:
+ chapter_num = chapter_match.group(1)
+ # Use standard naming that duplicate detection expects
+ payload_name = f"translation_chapter_{chapter_num}_payload.json"
+ response_name = f"response_{chapter_num}.html" # This format is expected by duplicate detection
+ else:
+ # Check for chunk information
+ chunk_match = re.search(r'Chunk (\d+)/(\d+)', str(messages))
+ if chunk_match:
+ chunk_num = chunk_match.group(1)
+ total_chunks = chunk_match.group(2)
+ # Extract chapter from fuller context
+ chapter_in_chunk = re.search(r'Chapter (\d+)', str(messages))
+ if chapter_in_chunk:
+ chapter_num = chapter_in_chunk.group(1)
+ payload_name = f"translation_chapter_{chapter_num}_chunk_{chunk_num}_payload.json"
+ response_name = f"response_{chapter_num}_chunk_{chunk_num}.html"
+ else:
+ payload_name = f"translation_chunk_{chunk_num}_of_{total_chunks}_payload.json"
+ response_name = f"response_chunk_{chunk_num}_of_{total_chunks}.html"
+ else:
+ payload_name = f"translation_payload_{self.conversation_message_count}.json"
+ response_name = f"response_{self.conversation_message_count}.html"
+ else:
+ payload_name = f"{context or 'general'}_payload_{self.conversation_message_count}.json"
+ response_name = f"{context or 'general'}_response_{self.conversation_message_count}.txt"
+ self._last_response_filename = response_name
+ return payload_name, response_name
+
+ def _save_payload(self, messages, filename, retry_reason=None):
+ """Save request payload for debugging with retry reason tracking"""
+
+ # Get stable thread directory
+ thread_dir = self._get_thread_directory()
+
+ # Generate request hash for the filename (to make it unique)
+ request_hash = self._get_request_hash(messages)
+
+ # Add hash and retry info to filename
+ base_name, ext = os.path.splitext(filename)
+ timestamp = datetime.now().strftime("%H%M%S")
+
+ # Include retry reason in filename if provided
+ if retry_reason:
+ # Sanitize retry reason for filename
+ safe_reason = retry_reason.replace(" ", "_").replace("/", "_")[:20]
+ unique_filename = f"{base_name}_{timestamp}_{safe_reason}_{request_hash[:6]}{ext}"
+ else:
+ unique_filename = f"{base_name}_{timestamp}_{request_hash[:6]}{ext}"
+
+ filepath = os.path.join(thread_dir, unique_filename)
+
+ try:
+ # Thread-safe file writing
+ with self._file_write_lock:
+ thread_name = threading.current_thread().name
+ thread_id = threading.current_thread().ident
+
+ # Extract chapter info for better tracking
+ chapter_info = self._extract_chapter_info(messages)
+
+ # Include debug info with retry reason
+ debug_info = {
+ 'system_prompt_present': any(msg.get('role') == 'system' for msg in messages),
+ 'system_prompt_length': 0,
+ 'request_hash': request_hash,
+ 'thread_name': thread_name,
+ 'thread_id': thread_id,
+ 'session_id': self.session_id,
+ 'chapter_info': chapter_info,
+ 'timestamp': datetime.now().isoformat(),
+ 'key_identifier': self.key_identifier,
+ 'retry_reason': retry_reason, # Track why this payload was saved
+ 'is_retry': retry_reason is not None
+ }
+
+ for msg in messages:
+ if msg.get('role') == 'system':
+ debug_info['system_prompt_length'] = len(msg.get('content', ''))
+ break
+
+ # Write the payload
+ with open(filepath, 'w', encoding='utf-8') as f:
+ json.dump({
+ 'model': getattr(self, 'model', None),
+ 'client_type': getattr(self, 'client_type', None),
+ 'messages': messages,
+ 'timestamp': datetime.now().isoformat(),
+ 'debug': debug_info,
+ 'key_identifier': getattr(self, 'key_identifier', None),
+ 'retry_info': {
+ 'reason': retry_reason,
+ 'attempt': getattr(self, '_current_retry_attempt', 0),
+ 'max_retries': getattr(self, '_max_retries', 7)
+ } if retry_reason else None
+ }, f, indent=2, ensure_ascii=False)
+
+ logger.debug(f"[{thread_name}] Saved payload to: {filepath} (reason: {retry_reason or 'initial'})")
+
+ except Exception as e:
+ print(f"Failed to save payload: {e}")
+
+
+ def _save_response(self, content: str, filename: str):
+ """Save API response with enhanced thread safety and deduplication"""
+ if not content or os.getenv("SAVE_PAYLOAD", "1") != "1":
+ return
+
+ # ONLY save JSON files to Payloads folder
+ if not filename.endswith('.json'):
+ logger.debug(f"Skipping HTML response save to Payloads: {filename}")
+ return
+
+ # Get thread-specific directory
+ thread_dir = self._get_thread_directory()
+ thread_id = threading.current_thread().ident
+
+ try:
+ # Generate content hash for deduplication
+ content_hash = hashlib.sha256(content.encode()).hexdigest()[:12]
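+ # The 12-char hash prefix goes into the filename and drives _is_duplicate_file lookups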
+
+ # Clean up filename
+ safe_filename = os.path.basename(filename)
+ base_name, ext = os.path.splitext(safe_filename)
+
+ # Create unique filename with thread ID and content hash
+ timestamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")[:19] # %f is microseconds; [:19] keeps millisecond precision
+ unique_filename = f"{base_name}_T{thread_id}_{timestamp}_{content_hash}{ext}"
+ filepath = os.path.join(thread_dir, unique_filename)
+
+ # Get file-specific lock
+ file_lock = self._get_file_lock(filepath)
+
+ with file_lock:
+ # Check if this exact content was already saved (deduplication)
+ if self._is_duplicate_file(thread_dir, content_hash):
+ logger.debug(f"Skipping duplicate response save: {content_hash[:8]}")
+ return
+
+ # Write atomically with temp file
+ temp_filepath = filepath + '.tmp'
+
+ try:
+ os.makedirs(thread_dir, exist_ok=True)
+
+ # filename is guaranteed to end in .json by the guard at the top of this method
+ try:
+ json_content = json.loads(content) if isinstance(content, str) else content
+ with open(temp_filepath, 'w', encoding='utf-8') as f:
+ json.dump(json_content, f, indent=2, ensure_ascii=False)
+ except json.JSONDecodeError:
+ # Not valid JSON; write the raw text as-is
+ with open(temp_filepath, 'w', encoding='utf-8') as f:
+ f.write(content)
+
+ # Atomic rename
+ os.replace(temp_filepath, filepath)
+ logger.debug(f"Saved response: {filepath}")
+
+ except Exception as e:
+ if os.path.exists(temp_filepath):
+ os.remove(temp_filepath)
+ raise
+
+ except Exception as e:
+ print(f"Failed to save response: {e}")
+
+ def _get_file_lock(self, filepath: str) -> RLock:
+ """Get or create a lock for a specific file"""
+ with self._file_write_locks_lock:
+ if filepath not in self._file_write_locks:
+ self._file_write_locks[filepath] = RLock()
+ return self._file_write_locks[filepath]
+
+ def _is_duplicate_file(self, directory: str, content_hash: str) -> bool:
+ """Check if a file with this content hash already exists"""
+ try:
+ for filename in os.listdir(directory):
+ if content_hash in filename and filename.endswith('.json'):
+ return True
+ except Exception:
+ pass
+ return False
+
+ def set_output_filename(self, filename: str):
+ """Set the actual output filename for truncation logging
+
+ This should be called before sending a request to inform the client
+ about the actual chapter output filename (e.g., response_001_Chapter_1.html)
+
+ Args:
+ filename: The actual output filename that will be created in the book folder
+ """
+ self._actual_output_filename = filename
+ logger.debug(f"Set output filename for truncation logging: {filename}")
+
+ def set_output_directory(self, directory: str):
+ """Set the output directory for truncation logs
+
+ Args:
+ directory: The output directory path (e.g., the book folder)
+ """
+ self.output_dir = directory
+ logger.debug(f"Set output directory: {directory}")
+
+ def cancel_current_operation(self):
+ """Mark current operation as cancelled
+
+ IMPORTANT: Called by send_with_interrupt when timeout occurs
+ """
+ self._cancelled = True
+ self._in_cleanup = True # Set cleanup flag correctly
+ # Show cancellation messages before setting global flag (to avoid circular check)
+ print("🛑 Operation cancelled (timeout or user stop)")
+ print("🛑 API operation cancelled")
+ # Set global cancellation to affect all instances
+ self.set_global_cancellation(True)
+ # Suppress httpx logging when cancelled
+ self._suppress_http_logs()
+
+ def _suppress_http_logs(self):
+ """Suppress HTTP and API logging during cancellation"""
+ import logging
+ # Suppress httpx logs (used by OpenAI client)
+ httpx_logger = logging.getLogger('httpx')
+ httpx_logger.setLevel(logging.WARNING)
+
+ # Suppress OpenAI client logs
+ openai_logger = logging.getLogger('openai')
+ openai_logger.setLevel(logging.WARNING)
+
+ # Suppress our own API client logs
+ unified_logger = logging.getLogger('unified_api_client')
+ unified_logger.setLevel(logging.WARNING)
+
+ def _reset_http_logs(self):
+ """Reset HTTP and API logging levels for new operations"""
+ import logging
+ # Reset httpx logs back to INFO
+ httpx_logger = logging.getLogger('httpx')
+ httpx_logger.setLevel(logging.INFO)
+
+ # Reset OpenAI client logs back to INFO
+ openai_logger = logging.getLogger('openai')
+ openai_logger.setLevel(logging.INFO)
+
+ # Reset our own API client logs back to INFO
+ unified_logger = logging.getLogger('unified_api_client')
+ unified_logger.setLevel(logging.INFO)
+
+ def reset_cleanup_state(self):
+ """Reset cleanup state for new operations"""
+ self._in_cleanup = False
+ self._cancelled = False
+ # Reset global cancellation flag for new operations
+ self.set_global_cancellation(False)
+ # Reset logging levels for new operations
+ self._reset_http_logs()
+
+ def _send_vertex_model_garden(self, messages, temperature=0.7, max_tokens=None, stop_sequences=None, response_name=None):
+ """Send request to Vertex AI Model Garden models (including Claude)"""
+ response = None
+ try:
+ from google.cloud import aiplatform
+ from google.oauth2 import service_account
+ from google.auth.transport.requests import Request
+ import google.auth.transport.requests
+ import vertexai
+ import json
+ import os
+ import re
+ import traceback
+ import logging
+
+ # Get logger
+ logger = logging.getLogger(__name__)
+
+ # Import or define UnifiedClientError
+ try:
+ # Try to import from the module if it exists
+ from unified_api_client import UnifiedClientError, UnifiedResponse
+ except ImportError:
+ # Define them locally if import fails
+ class UnifiedClientError(Exception):
+ def __init__(self, message, error_type=None):
+ super().__init__(message)
+ self.error_type = error_type
+
+ from dataclasses import dataclass
+ @dataclass
+ class UnifiedResponse:
+ content: str
+ usage: dict = None
+ finish_reason: str = 'stop'
+ raw_response: object = None
+
+ # Import your global stop check function
+ try:
+ from TranslateKRtoEN import is_stop_requested
+ except ImportError:
+ # Fallback to checking _cancelled flag
+ def is_stop_requested():
+ return self._cancelled
+
+ # Use the same credentials as Cloud Vision (comes from GUI config)
+ google_creds_path = os.environ.get('GOOGLE_APPLICATION_CREDENTIALS')
+ if not google_creds_path:
+ # Try to get from config
+ if hasattr(self, 'main_gui') and hasattr(self.main_gui, 'config'):
+ google_creds_path = self.main_gui.config.get('google_vision_credentials', '') or \
+ self.main_gui.config.get('google_cloud_credentials', '')
+
+ if not google_creds_path or not os.path.exists(google_creds_path):
+ raise ValueError("Google Cloud credentials not found. Please set up credentials.")
+
+ # Load credentials with proper scopes
+ credentials = service_account.Credentials.from_service_account_file(
+ google_creds_path,
+ scopes=['https://www.googleapis.com/auth/cloud-platform']
+ )
+
+ # Extract project ID from credentials
+ with open(google_creds_path, 'r') as f:
+ creds_data = json.load(f)
+ project_id = creds_data.get('project_id')
+
+ if not project_id:
+ raise ValueError("Project ID not found in credentials file")
+
+ logger.info(f"Using project ID: {project_id}")
+
+ # Parse model name
+ model_name = self.model
+ if model_name.startswith('vertex_ai/'):
+ model_name = model_name[10:] # Remove "vertex_ai/" prefix
+ elif model_name.startswith('vertex/'):
+ model_name = model_name[7:] # Remove "vertex/" prefix
+
+ logger.info(f"Using model: {model_name}")
+
+ # For Claude models, use the Anthropic SDK with Vertex AI
+ if 'claude' in model_name.lower():
+ # Import Anthropic exceptions
+ try:
+ from anthropic import AnthropicVertex
+ import anthropic
+ import httpx
+ except ImportError:
+ raise UnifiedClientError("Anthropic SDK not installed. Run: pip install anthropic")
+
+ # Use the region from environment variable (which comes from GUI)
+ region = os.getenv('VERTEX_AI_LOCATION', 'us-east5')
+
+ # CHECK STOP FLAG
+ if is_stop_requested():
+ logger.info("Stop requested, cancelling")
+ raise UnifiedClientError("Operation cancelled by user", error_type="cancelled")
+
+
+ # Initialize Anthropic client for Vertex AI
+ client = AnthropicVertex(
+ project_id=project_id,
+ region=region
+ )
+
+ # Convert messages to Anthropic format
+ anthropic_messages = []
+ system_prompt = ""
+
+ for msg in messages:
+ if msg['role'] == 'system':
+ system_prompt = msg['content']
+ else:
+ anthropic_messages.append({
+ "role": msg['role'],
+ "content": msg['content']
+ })
+
+ # Create message with Anthropic client
+ kwargs = {
+ "model": model_name,
+ "messages": anthropic_messages,
+ "max_tokens": max_tokens or 4096,
+ "temperature": temperature,
+ }
+
+ if system_prompt:
+ kwargs["system"] = system_prompt
+
+ if stop_sequences:
+ kwargs["stop_sequences"] = stop_sequences
+
+ # CHECK STOP FLAG BEFORE API CALL
+ if is_stop_requested():
+ logger.info("Stop requested, cancelling API call")
+ raise UnifiedClientError("Operation cancelled by user", error_type="cancelled")
+
+
+ try:
+ message = client.messages.create(**kwargs)
+
+ except httpx.HTTPStatusError as e:
+ # Handle HTTP status errors from the Anthropic SDK
+ status_code = e.response.status_code if hasattr(e.response, 'status_code') else 0
+ error_body = e.response.text if hasattr(e.response, 'text') else str(e)
+
+ # Check if it's an HTML error page
+ if '<html' in error_body or '<!DOCTYPE' in error_body or len(error_body) > 200:
+ raise UnifiedClientError(f"Vertex AI error: HTTP {status_code}. Check your region and model name.")
+ else:
+ raise UnifiedClientError(f"Vertex AI error: {error_body}")
+
+ # CHECK STOP FLAG AFTER RESPONSE
+ if is_stop_requested():
+ logger.info("Stop requested after response, discarding result")
+ raise UnifiedClientError("Operation cancelled by user", error_type="cancelled")
+
+ # Success! Convert response to UnifiedResponse
+ print(f"Successfully got response from {region}")
+ return UnifiedResponse(
+ content=message.content[0].text if message.content else "",
+ usage={
+ "input_tokens": message.usage.input_tokens,
+ "output_tokens": message.usage.output_tokens,
+ "total_tokens": message.usage.input_tokens + message.usage.output_tokens
+ } if hasattr(message, 'usage') else None,
+ finish_reason=message.stop_reason if hasattr(message, 'stop_reason') else 'stop',
+ raw_response=message
+ )
+
+ else:
+ # For Gemini models on Vertex AI, we need to use Vertex AI SDK
+ location = os.getenv('VERTEX_AI_LOCATION', 'us-east5')
+
+ # Check stop flag before Gemini call
+ if is_stop_requested():
+ logger.info("Stop requested, cancelling Vertex AI Gemini request")
+ raise UnifiedClientError("Operation cancelled by user", error_type="cancelled")
+
+ # Initialize Vertex AI
+ vertexai.init(project=project_id, location=location, credentials=credentials)
+
+ # Import GenerativeModel from vertexai
+ from vertexai.generative_models import GenerativeModel, GenerationConfig, HarmCategory, HarmBlockThreshold
+
+ # Create model instance
+ vertex_model = GenerativeModel(model_name)
+
+ # Format messages for Vertex AI Gemini using existing formatter
+ formatted_prompt = self._format_prompt(messages, style='gemini')
+
+ # Check if safety settings are disabled via config (from GUI)
+ disable_safety = os.getenv("DISABLE_GEMINI_SAFETY", "false").lower() == "true"
+
+ # Get thinking budget from environment (though Vertex AI may not support it)
+ thinking_budget = int(os.getenv("THINKING_BUDGET", "-1"))
+ enable_thinking = os.getenv("ENABLE_GEMINI_THINKING", "0") == "1"
+
+ # Log configuration
+ print(f"\n🔧 Vertex AI Gemini Configuration:")
+ print(f" Model: {model_name}")
+ print(f" Region: {location}")
+ print(f" Project: {project_id}")
+
+ # Configure generation parameters using passed parameters
+ generation_config_dict = {
+ "temperature": temperature,
+ "max_output_tokens": max_tokens or 8192,
+ }
+
+ # Add user-configured anti-duplicate parameters if enabled
+ if os.getenv("ENABLE_ANTI_DUPLICATE", "0") == "1":
+ # Get all anti-duplicate parameters from environment
+ if os.getenv("TOP_P"):
+ top_p = float(os.getenv("TOP_P", "1.0"))
+ if top_p < 1.0: # Only add if not default
+ generation_config_dict["top_p"] = top_p
+
+ if os.getenv("TOP_K"):
+ top_k = int(os.getenv("TOP_K", "0"))
+ if top_k > 0: # Only add if not default
+ generation_config_dict["top_k"] = top_k
+
+ # Note: Vertex AI Gemini may not support all parameters like frequency_penalty
+ # Add only supported parameters
+ if os.getenv("CANDIDATE_COUNT"):
+ candidate_count = int(os.getenv("CANDIDATE_COUNT", "1"))
+ if candidate_count > 1:
+ generation_config_dict["candidate_count"] = candidate_count
+
+ # Add custom stop sequences if provided
+ custom_stops = os.getenv("CUSTOM_STOP_SEQUENCES", "").strip()
+ if custom_stops:
+ additional_stops = [s.strip() for s in custom_stops.split(",") if s.strip()]
+ if stop_sequences:
+ stop_sequences.extend(additional_stops)
+ else:
+ stop_sequences = additional_stops
+
+ if stop_sequences:
+ generation_config_dict["stop_sequences"] = stop_sequences
+
+ # Create generation config
+ generation_config = GenerationConfig(**generation_config_dict)
+
+ # Configure safety settings based on GUI toggle
+ safety_settings = None
+ if disable_safety:
+ # Import SafetySetting from vertexai
+ from vertexai.generative_models import SafetySetting
+
+ # Create list of SafetySetting objects (same format as regular Gemini)
+ safety_settings = [
+ SafetySetting(
+ category=HarmCategory.HARM_CATEGORY_HATE_SPEECH,
+ threshold=HarmBlockThreshold.BLOCK_NONE
+ ),
+ SafetySetting(
+ category=HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
+ threshold=HarmBlockThreshold.BLOCK_NONE
+ ),
+ SafetySetting(
+ category=HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
+ threshold=HarmBlockThreshold.BLOCK_NONE
+ ),
+ SafetySetting(
+ category=HarmCategory.HARM_CATEGORY_HARASSMENT,
+ threshold=HarmBlockThreshold.BLOCK_NONE
+ ),
+ SafetySetting(
+ category=HarmCategory.HARM_CATEGORY_CIVIC_INTEGRITY,
+ threshold=HarmBlockThreshold.BLOCK_NONE
+ ),
+ ]
+ # Only log if not stopping
+ if not self._is_stop_requested():
+ print(f"🔒 Vertex AI Gemini Safety Status: DISABLED - All categories set to BLOCK_NONE")
+ else:
+ # Only log if not stopping
+ if not self._is_stop_requested():
+ print(f"🔒 Vertex AI Gemini Safety Status: ENABLED - Using default Gemini safety settings")
+
+ # SAVE SAFETY CONFIGURATION FOR VERIFICATION
+ if safety_settings:
+ safety_status = "DISABLED - All categories set to BLOCK_NONE"
+ readable_safety = {
+ "HATE_SPEECH": "BLOCK_NONE",
+ "SEXUALLY_EXPLICIT": "BLOCK_NONE",
+ "HARASSMENT": "BLOCK_NONE",
+ "DANGEROUS_CONTENT": "BLOCK_NONE",
+ "CIVIC_INTEGRITY": "BLOCK_NONE"
+ }
+ else:
+ safety_status = "ENABLED - Using default Gemini safety settings"
+ readable_safety = "DEFAULT"
+
+ # Save configuration to file
+ config_data = {
+ "type": "VERTEX_AI_GEMINI_REQUEST",
+ "model": model_name,
+ "project_id": project_id,
+ "location": location,
+ "safety_enabled": not disable_safety,
+ "safety_settings": readable_safety,
+ "temperature": temperature,
+ "max_output_tokens": max_tokens or 8192,
+ "timestamp": datetime.now().isoformat(),
+ }
+
+ # Save configuration to file with thread isolation
+ self._save_gemini_safety_config(config_data, response_name)
+
+ # Retry logic
+ attempts = self._get_max_retries()
+ attempt = 0
+ result_text = ""
+
+ while attempt < attempts and not result_text:
+ try:
+ # Update max_output_tokens for this attempt
+ generation_config_dict["max_output_tokens"] = max_tokens or 8192
+ generation_config = GenerationConfig(**generation_config_dict)
+
+ # Only log if not stopping
+ if not self._is_stop_requested():
+ print(f" 📊 Temperature: {temperature}, Max tokens: {max_tokens or 8192}")
+
+ # Generate content with optional safety settings
+ if safety_settings:
+ response = vertex_model.generate_content(
+ formatted_prompt,
+ generation_config=generation_config,
+ safety_settings=safety_settings
+ )
+ else:
+ response = vertex_model.generate_content(
+ formatted_prompt,
+ generation_config=generation_config
+ )
+
+ # Extract text from response
+ if response.candidates:
+ for candidate in response.candidates:
+ if candidate.content and candidate.content.parts:
+ for part in candidate.content.parts:
+ if hasattr(part, 'text'):
+ result_text += part.text
+
+ # Check if we got content
+ if result_text and result_text.strip():
+ break
+ else:
+ raise Exception("Empty response from Vertex AI")
+
+ except Exception as e:
+ print(f"Vertex AI Gemini attempt {attempt+1} failed: {e}")
+
+ # Check for quota errors
+ error_str = str(e)
+ if "429" in error_str or "RESOURCE_EXHAUSTED" in error_str:
+ raise UnifiedClientError(
+ f"Quota exceeded for Vertex AI Gemini model: {model_name}\n\n"
+ "Request quota increase in Google Cloud Console."
+ )
+ elif "404" in error_str or "NOT_FOUND" in error_str:
+ raise UnifiedClientError(
+ f"Model {model_name} not found in region {location}.\n\n"
+ "Available Gemini models on Vertex AI:\n"
+ "• gemini-1.5-flash-002\n"
+ "• gemini-1.5-pro-002\n"
+ "• gemini-1.0-pro-002"
+ )
+
+ # No automatic retry - exit the loop and let the higher level handle retries
+ break
+
+ # Check stop flag after response
+ if is_stop_requested():
+ logger.info("Stop requested after Vertex AI Gemini response")
+ raise UnifiedClientError("Operation cancelled by user", error_type="cancelled")
+
+ if not result_text:
+ raise UnifiedClientError("All Vertex AI Gemini attempts failed to produce content")
+
+ return UnifiedResponse(
+ content=result_text,
+ finish_reason='stop',
+ raw_response=response
+ )
+
+ except UnifiedClientError:
+ # Re-raise our own errors without modification
+ raise
+ except Exception as e:
+ # Handle any other unexpected errors
+ error_str = str(e)
+ # Don't print HTML errors
+ if '<html' not in error_str and '<!DOCTYPE' not in error_str:
+ print(f"Vertex AI error: {error_str}")
+ raise UnifiedClientError(f"Vertex AI error: {error_str}")
+
+ def _parse_retry_after(self, value: str) -> int:
+ """Parse Retry-After header (seconds or HTTP-date) into seconds."""
+ if not value:
+ return 0
+ value = value.strip()
+ if value.isdigit():
+ try:
+ return max(0, int(value))
+ except Exception:
+ return 0
+ try:
+ import email.utils
+ dt = email.utils.parsedate_to_datetime(value)
+ if dt is None:
+ return 0
+ now = datetime.now(dt.tzinfo)
+ secs = int((dt - now).total_seconds())
+ return max(0, secs)
+ except Exception:
+ return 0
+
+ def _get_session(self, base_url: str):
+ """Get or create a thread-local requests.Session for a base_url with connection pooling."""
+ tls = self._get_thread_local_client()
+ if not hasattr(tls, "session_map"):
+ tls.session_map = {}
+ session = tls.session_map.get(base_url)
+ if session is None:
+ session = requests.Session()
+ try:
+ adapter = HTTPAdapter(
+ pool_connections=int(os.getenv("HTTP_POOL_CONNECTIONS", "20")),
+ pool_maxsize=int(os.getenv("HTTP_POOL_MAXSIZE", "50")),
+ max_retries=Retry(total=0) if Retry is not None else 0
+ )
+ except Exception:
+ adapter = HTTPAdapter(
+ pool_connections=int(os.getenv("HTTP_POOL_CONNECTIONS", "20")),
+ pool_maxsize=int(os.getenv("HTTP_POOL_MAXSIZE", "50"))
+ )
+ session.mount("http://", adapter)
+ session.mount("https://", adapter)
+ tls.session_map[base_url] = session
+ return session
+
+ def _http_request_with_retries(self, method: str, url: str, headers: dict = None, json: dict = None,
+ expected_status: tuple = (200,), max_retries: int = 3,
+ provider_name: str = None, use_session: bool = False):
+ """
+ Generic HTTP requester with standardized retry behavior.
+ - Handles cancellation, rate limits (429 with Retry-After), 5xx with backoff, and generic errors.
+ - Returns the requests.Response object when a successful status is received.
+ """
+ api_delay = self._get_send_interval()
+ provider = provider_name or "HTTP"
+ # Use a manual counter (not a for-loop) so the indefinite rate-limit path can retry without consuming attempts
+ attempt = 0
+ while attempt < max_retries:
+ if self._cancelled:
+ raise UnifiedClientError("Operation cancelled")
+
+ # Toggle to ignore server-provided Retry-After headers
+ ignore_retry_after = (os.getenv("ENABLE_HTTP_TUNING", "0") == "1") and (os.getenv("IGNORE_RETRY_AFTER", "0") == "1")
+ try:
+ if use_session:
+ # Reuse pooled session based on the base URL
+ try:
+ from urllib.parse import urlsplit
+ except Exception:
+ urlsplit = None
+ base_for_session = None
+ if urlsplit is not None:
+ parts = urlsplit(url)
+ base_for_session = f"{parts.scheme}://{parts.netloc}" if parts.scheme and parts.netloc else None
+ session = self._get_session(base_for_session) if base_for_session else requests
+ timeout = self._get_timeouts() if base_for_session else self.request_timeout
+ resp = session.request(method, url, headers=headers, json=json, timeout=timeout)
+ else:
+ resp = requests.request(method, url, headers=headers, json=json, timeout=self.request_timeout)
+ except requests.RequestException as e:
+ if attempt < max_retries - 1:
+ print(f"{provider} network error (attempt {attempt + 1}): {e}")
+ time.sleep(api_delay)
+ attempt += 1
+ continue
+ raise UnifiedClientError(f"{provider} network error: {e}")
+
+ status = resp.status_code
+ if status in expected_status:
+ return resp
+
+ # Rate limit handling (429)
+ if status == 429:
+ # Print detailed 429 info (match SDK-level detail)
+ try:
+ ct = (resp.headers.get('content-type') or '').lower()
+ retry_after_val = resp.headers.get('Retry-After', '')
+ rl_remaining = resp.headers.get('X-RateLimit-Remaining') or resp.headers.get('x-ratelimit-remaining')
+ rl_limit = resp.headers.get('X-RateLimit-Limit') or resp.headers.get('x-ratelimit-limit')
+ rl_reset = resp.headers.get('X-RateLimit-Reset') or resp.headers.get('x-ratelimit-reset')
+ detail_msg = None
+ if 'application/json' in ct:
+ try:
+ body = resp.json()
+ if isinstance(body, dict):
+ err = body.get('error') or {}
+ detail_msg = err.get('message') or body.get('message') or None
+ err_code = err.get('code') or body.get('code') or None
+ if detail_msg:
+ print(f"{provider} 429: {detail_msg} | code: {err_code} | retry-after: {retry_after_val} | remaining: {rl_remaining} reset: {rl_reset} limit: {rl_limit}")
+ else:
+ print(f"{provider} 429: {resp.text[:1200]} | retry-after: {retry_after_val} | remaining: {rl_remaining} reset: {rl_reset} limit: {rl_limit}")
+ except Exception:
+ print(f"{provider} 429 (non-JSON parse): ct={ct} retry-after: {retry_after_val} | remaining: {rl_remaining} reset: {rl_reset} limit: {rl_limit}")
+ else:
+ print(f"{provider} 429: ct={ct} retry-after: {retry_after_val} | remaining: {rl_remaining} reset: {rl_reset} limit: {rl_limit}")
+ except Exception:
+ pass
+
+ # Check if indefinite rate limit retry is enabled
+ indefinite_retry_enabled = os.getenv("INDEFINITE_RATE_LIMIT_RETRY", "1") == "1"
+
+ retry_after_val = resp.headers.get('Retry-After', '')
+ retry_secs = self._parse_retry_after(retry_after_val)
+ if ignore_retry_after:
+ wait_time = api_delay * 10
+ else:
+ wait_time = retry_secs if retry_secs > 0 else api_delay * 10
+
+ # Add jitter and cap wait time
+ wait_time = min(wait_time + random.uniform(1, 5), 300) # Max 5 minutes
+
+ if indefinite_retry_enabled:
+ # For indefinite retry, don't count against max_retries
+ print(f"{provider} rate limit ({status}), indefinite retry enabled - waiting {wait_time:.1f}s")
+ waited = 0.0
+ while waited < wait_time:
+ if self._cancelled:
+ raise UnifiedClientError("Operation cancelled", error_type="cancelled")
+ time.sleep(0.5)
+ waited += 0.5
+ # Rate-limit waits don't consume an attempt when indefinite retry is enabled
+ continue
+ elif attempt < max_retries - 1:
+ # Standard retry behavior when indefinite retry is disabled
+ print(f"{provider} rate limit ({status}), waiting {wait_time:.1f}s (attempt {attempt + 1}/{max_retries})")
+ waited = 0.0
+ while waited < wait_time:
+ if self._cancelled:
+ raise UnifiedClientError("Operation cancelled", error_type="cancelled")
+ time.sleep(0.5)
+ waited += 0.5
+ attempt += 1
+ continue
+
+ # If we reach here, indefinite retry is disabled and we've exhausted max_retries
+ raise UnifiedClientError(f"{provider} rate limit: {resp.text}", error_type="rate_limit", http_status=429)
+
+ # Transient server errors with optional Retry-After
+ if status in (500, 502, 503, 504) and attempt < max_retries - 1:
+ retry_after_val = resp.headers.get('Retry-After', '')
+ retry_secs = self._parse_retry_after(retry_after_val)
+ if ignore_retry_after:
+ retry_secs = 0
+ if retry_secs:
+ sleep_for = retry_secs + random.uniform(0, 1)
+ else:
+ base_delay = 5.0
+ sleep_for = min(base_delay * (2 ** attempt) + random.uniform(0, 1), 60.0)
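+ # Exponential backoff: ~5s, ~10s, ~20s, ... plus jitter, capped at 60s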
+ print(f"{provider} {status} - retrying in {sleep_for:.1f}s (attempt {attempt + 1}/{max_retries})")
+ waited = 0.0
+ while waited < sleep_for:
+ if self._cancelled:
+ raise UnifiedClientError("Operation cancelled", error_type="cancelled")
+ time.sleep(0.5)
+ waited += 0.5
+ attempt += 1
+ continue
+
+ # Other non-success statuses
+ if attempt < max_retries - 1:
+ print(f"{provider} API error: {status} - {resp.text} (attempt {attempt + 1})")
+ time.sleep(api_delay)
+ attempt += 1
+ continue
+ raise UnifiedClientError(f"{provider} API error: {status} - {resp.text}", http_status=status)
+
+ def _extract_openai_json(self, json_resp: dict):
+ """Extract content, finish_reason, and usage from OpenAI-compatible JSON."""
+ content = ""
+ finish_reason = 'stop'
+ choices = json_resp.get('choices', [])
+ if choices:
+ choice = choices[0]
+ finish_reason = choice.get('finish_reason') or 'stop'
+ message = choice.get('message')
+ if isinstance(message, dict):
+ content = message.get('content') or message.get('text') or ""
+ elif isinstance(message, str):
+ content = message
+ else:
+ # As a fallback, try 'text' field directly on choice
+ content = choice.get('text', "")
+ # Normalize finish reasons
+ if finish_reason in ['max_tokens', 'max_length']:
+ finish_reason = 'length'
+ usage = None
+ if 'usage' in json_resp:
+ u = json_resp['usage'] or {}
+ pt = u.get('prompt_tokens', 0)
+ ct = u.get('completion_tokens', 0)
+ tt = u.get('total_tokens', pt + ct)
+ usage = {'prompt_tokens': pt, 'completion_tokens': ct, 'total_tokens': tt}
+ return content, finish_reason, usage
+
+ def _with_sdk_retries(self, provider_name: str, max_retries: int, call):
+ """Run an SDK call with standardized retry behavior and error wrapping."""
+ api_delay = self._get_send_interval()
+ for attempt in range(max_retries):
+ try:
+ if self._cancelled:
+ raise UnifiedClientError("Operation cancelled", error_type="cancelled")
+ return call()
+ except UnifiedClientError:
+ # Already normalized; propagate
+ raise
+ except Exception as e:
+ # Suppress noise if we are stopping/cleaning up or the SDK surfaced a cancellation
+ err_str = str(e)
+ is_cancel = getattr(self, '_cancelled', False) or ('cancelled' in err_str.lower()) or ('canceled' in err_str.lower())
+ if is_cancel:
+ # Normalize and stop retry/printing
+ raise UnifiedClientError("Operation cancelled", error_type="cancelled")
+ if attempt < max_retries - 1:
+ self._debug_log(f"{provider_name} SDK error (attempt {attempt + 1}): {e}")
+ time.sleep(api_delay)
+ continue
+ self._debug_log(f"{provider_name} SDK error after all retries: {e}")
+ raise UnifiedClientError(f"{provider_name} SDK error: {e}")
+
+ def _build_openai_headers(self, provider: str, api_key: str, headers: Optional[dict]) -> dict:
+ """Construct standard headers for OpenAI-compatible HTTP calls without altering behavior."""
+ h = dict(headers) if headers else {}
+ # Only set Authorization if not already present and not Azure special-case (Azure handled earlier)
+ if 'Authorization' not in h and provider not in ('azure',):
+ h['Authorization'] = f'Bearer {api_key}'
+ # Ensure content type
+ if 'Content-Type' not in h:
+ h['Content-Type'] = 'application/json'
+ # Ensure we explicitly request JSON back from providers
+ if 'Accept' not in h:
+ h['Accept'] = 'application/json'
+ return h
+
+ def _apply_openai_safety(self, provider: str, disable_safety: bool, payload: dict, headers: dict):
+ """Apply safety flags for providers that support them (avoid unsupported params)."""
+ if not disable_safety:
+ return
+ # Do NOT send 'moderation' to OpenAI; it's unsupported and causes 400 Unknown parameter
+ if provider in ["groq", "fireworks"]:
+ payload["moderation"] = False
+ elif provider == "poe":
+ payload["safe_mode"] = False
+ elif provider == "openrouter":
+ headers['X-Safe-Mode'] = 'false'
+
+ def _build_anthropic_payload(self, formatted_messages: list, temperature: float, max_tokens: int, anti_dupe_params: dict, system_message: Optional[str] = None) -> dict:
+ data = {
+ "model": self.model,
+ "messages": formatted_messages,
+ "temperature": temperature,
+ "max_tokens": max_tokens,
+ **(anti_dupe_params or {})
+ }
+ if system_message:
+ data["system"] = system_message
+ return data
+
+ def _parse_anthropic_json(self, json_resp: dict):
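+ # Expected shape (Anthropic Messages API):
+ # {"content": [{"type": "text", "text": "..."}], "stop_reason": "end_turn",
+ #  "usage": {"input_tokens": 1, "output_tokens": 2}}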
+ content_parts = json_resp.get("content", [])
+ if isinstance(content_parts, list):
+ content = "".join(part.get("text", "") for part in content_parts)
+ else:
+ content = str(content_parts)
+ finish_reason = json_resp.get("stop_reason")
+ if finish_reason == "max_tokens":
+ finish_reason = "length"
+ elif finish_reason == "stop_sequence":
+ finish_reason = "stop"
+ usage = json_resp.get("usage")
+ if usage:
+ usage = {
+ 'prompt_tokens': usage.get('input_tokens', 0),
+ 'completion_tokens': usage.get('output_tokens', 0),
+ 'total_tokens': usage.get('input_tokens', 0) + usage.get('output_tokens', 0)
+ }
+ else:
+ usage = None
+ return content, finish_reason, usage
+
+ def _get_idempotency_key(self) -> str:
+ """Build an idempotency key from the current request context."""
+ tls = self._get_thread_local_client()
+ req_id = getattr(tls, "idem_request_id", None) or uuid.uuid4().hex[:8]
+ attempt = getattr(tls, "idem_attempt", 0)
+ return f"{req_id}-a{attempt}"
+
+ def _get_openai_client(self, base_url: str, api_key: str):
+ """Get or create a thread-local OpenAI client for a base_url."""
+ if openai is None:
+ raise UnifiedClientError("OpenAI SDK not installed. Install with: pip install openai")
+
+ # CRITICAL: If individual endpoint is applied, use our existing client instead of creating new one
+ if (hasattr(self, '_individual_endpoint_applied') and self._individual_endpoint_applied and
+ hasattr(self, 'openai_client') and self.openai_client):
+ return self.openai_client
+
+ tls = self._get_thread_local_client()
+ if not hasattr(tls, "openai_clients"):
+ tls.openai_clients = {}
+ map_key = f"{base_url}|{bool(api_key)}"
+ client = tls.openai_clients.get(map_key)
+ if client is None:
+ timeout_obj = None
+ try:
+ if httpx is not None:
+ connect, read = self._get_timeouts()
+ timeout_obj = httpx.Timeout(connect=connect, read=read, write=read, pool=connect)
+ else:
+ # Fallback: use read timeout as a single float
+ _, read = self._get_timeouts()
+ timeout_obj = float(read)
+ except Exception:
+ _, read = self._get_timeouts()
+ timeout_obj = float(read)
+ client = openai.OpenAI(
+ api_key=api_key,
+ base_url=base_url,
+ timeout=timeout_obj
+ )
+ tls.openai_clients[map_key] = client
+ return client
+
+ def _get_response(self, messages, temperature, max_tokens, max_completion_tokens, response_name) -> UnifiedResponse:
+ """
+ Route to appropriate AI provider and get response
+
+ Args:
+ messages: List of message dicts
+ temperature: Sampling temperature
+ max_tokens: Maximum tokens (for non-o series models)
+ max_completion_tokens: Maximum completion tokens (for o-series models)
+ response_name: Name for saving response
+ """
+ self._apply_api_call_stagger()
+ # Ensure client_type is initialized before routing (important for multi-key mode)
+ try:
+ if not hasattr(self, 'client_type') or self.client_type is None:
+ self._ensure_thread_client()
+ except Exception:
+ # Guard against missing attribute in extreme early paths
+ if not hasattr(self, 'client_type'):
+ self.client_type = None
+
+ # FIX: Ensure max_tokens has a value before passing to handlers
+ if max_tokens is None and max_completion_tokens is None:
+ # Use instance default or standard default
+ max_tokens = getattr(self, 'max_tokens', 8192)
+ elif max_tokens is None and max_completion_tokens is not None:
+ # For o-series models, use max_completion_tokens as fallback
+ max_tokens = max_completion_tokens
+ # Check if this is actually Gemini (including when using OpenAI endpoint)
+ actual_provider = self._get_actual_provider()
+
+ # Detect if this is an image request (messages contain image parts)
+ has_images = False
+ for _m in messages:
+ c = _m.get('content')
+ if isinstance(c, list) and any(isinstance(p, dict) and p.get('type') == 'image_url' for p in c):
+ has_images = True
+ break
+
+ # If image request, route to image handlers only for providers that require it
+ if has_images:
+ img_b64 = self._extract_first_image_base64(messages)
+ if actual_provider == 'gemini':
+ return self._send_gemini(messages, temperature, max_tokens or max_completion_tokens, response_name, image_base64=img_b64)
+ if actual_provider == 'anthropic':
+ return self._send_anthropic_image(messages, img_b64, temperature, max_tokens or max_completion_tokens, response_name)
+ if actual_provider == 'vertex_model_garden':
+ return self._send_vertex_model_garden_image(messages, img_b64, temperature, max_tokens or max_completion_tokens, response_name)
+ if actual_provider == 'poe':
+ return self._send_poe_image(messages, img_b64, temperature, max_tokens or max_completion_tokens, response_name)
+ # Otherwise fall through to default handler below (OpenAI-compatible providers handle images in messages)
+
+ # Map client types to their handler methods
+ handlers = {
+ 'openai': self._send_openai,
+ 'gemini': self._send_gemini,
+ 'deepseek': self._send_openai_provider_router, # Consolidated
+ 'anthropic': self._send_anthropic,
+ 'mistral': self._send_mistral,
+ 'cohere': self._send_cohere,
+ 'chutes': self._send_openai_provider_router, # Consolidated
+ 'ai21': self._send_ai21,
+ 'together': self._send_openai_provider_router, # Already in router
+ 'perplexity': self._send_openai_provider_router, # Consolidated
+ 'replicate': self._send_replicate,
+ 'yi': self._send_openai_provider_router,
+ 'qwen': self._send_openai_provider_router,
+ 'baichuan': self._send_openai_provider_router,
+ 'zhipu': self._send_openai_provider_router,
+ 'moonshot': self._send_openai_provider_router,
+ 'groq': self._send_openai_provider_router,
+ 'baidu': self._send_openai_provider_router,
+ 'tencent': self._send_openai_provider_router,
+ 'iflytek': self._send_openai_provider_router,
+ 'bytedance': self._send_openai_provider_router,
+ 'minimax': self._send_openai_provider_router,
+ 'sensenova': self._send_openai_provider_router,
+ 'internlm': self._send_openai_provider_router,
+ 'tii': self._send_openai_provider_router,
+ 'microsoft': self._send_openai_provider_router,
+ 'azure': self._send_azure,
+ 'google': self._send_google_palm,
+ 'alephalpha': self._send_alephalpha,
+ 'databricks': self._send_openai_provider_router,
+ 'huggingface': self._send_huggingface,
+ 'openrouter': self._send_openai_provider_router, # OpenRouter aggregator
+ 'poe': self._send_poe, # POE platform (restored)
+ 'electronhub': self._send_electronhub, # ElectronHub aggregator (restored)
+ 'fireworks': self._send_openai_provider_router,
+ 'xai': self._send_openai_provider_router, # xAI Grok models
+ 'salesforce': self._send_openai_provider_router, # Consolidated
+ 'vertex_model_garden': self._send_vertex_model_garden,
+ 'deepl': self._send_deepl, # DeepL translation service
+ 'google_translate': self._send_google_translate, # Google Cloud Translate
+ }
+
+ # IMPORTANT: Use actual_provider for routing, not client_type
+ # This ensures Gemini always uses its native handler even when using OpenAI endpoint
+ handler = handlers.get(actual_provider)
+
+ if not handler:
+ # Fallback to client_type if no actual_provider match
+ handler = handlers.get(self.client_type)
+
+ if not handler:
+ # Try fallback to Together AI for open models
+ if self.client_type in ['bigscience', 'meta']:
+ logger.info(f"Using Together AI for {self.client_type} model")
+ return self._send_openai_provider_router(messages, temperature, max_tokens, response_name)
+ raise UnifiedClientError(f"No handler for client type: {getattr(self, 'client_type', 'unknown')}")
+
+ if self.client_type in ['deepl', 'google_translate']:
+ # These services don't use temperature or token limits
+ # They just translate the text directly
+ return handler(messages, None, None, response_name)
+
+ # Route based on actual provider (handles Gemini with OpenAI endpoint correctly)
+ elif actual_provider == 'gemini':
+ # Always use Gemini handler for Gemini models, regardless of transport
+ logger.debug(f"Routing to Gemini handler (actual provider: {actual_provider}, client_type: {self.client_type})")
+ return self._send_gemini(messages, temperature, max_tokens, response_name)
+ elif actual_provider == 'openai' or self.client_type == 'openai':
+ # For OpenAI, pass the max_completion_tokens parameter
+ return handler(messages, temperature, max_tokens, max_completion_tokens, response_name)
+ elif self.client_type == 'vertex_model_garden':
+ # Vertex AI handler takes stop_sequences (passed as None) before response_name
+ return handler(messages, temperature, max_tokens or max_completion_tokens, None, response_name)
+ else:
+ # Other providers don't use max_completion_tokens
+ return handler(messages, temperature, max_tokens, response_name)
+
+
+ def _get_actual_provider(self) -> str:
+ """
+ Get the actual provider name, accounting for Gemini using OpenAI endpoint.
+ This is used for proper routing and detection.
+ """
+ # Check if this is Gemini using OpenAI endpoint
+ if hasattr(self, '_original_client_type') and self._original_client_type:
+ return self._original_client_type
+ return getattr(self, 'client_type', 'openai')
+
+ def _extract_chapter_label(self, messages) -> str:
+ """Extract a concise chapter/chunk label from messages for logging."""
+ try:
+ s = str(messages)
+ import re
+ chap = None
+ m = re.search(r'Chapter\s+(\d+)', s)
+ if m:
+ chap = f"Chapter {m.group(1)}"
+ chunk = None
+ mc = re.search(r'Chunk\s+(\d+)/(\d+)', s)
+ if mc:
+ chunk = f"{mc.group(1)}/{mc.group(2)}"
+ if chap and chunk:
+ return f"{chap} (chunk {chunk})"
+ if chap:
+ return chap
+ if chunk:
+ return f"chunk {chunk}"
+ except Exception:
+ pass
+ return "request"
+
+ def _log_pre_stagger(self, messages, context: Optional[str] = None) -> None:
+ """Emit a pre-stagger log line so users see what's being sent before delay."""
+ try:
+ thread_name = threading.current_thread().name
+ label = self._extract_chapter_label(messages)
+ ctx = context or 'translation'
+ print(f"📤 [{thread_name}] Sending {label} ({ctx}) — queuing staggered API call...")
+ # Stash label so stagger logger can show what is being translated
+ try:
+ tls = self._get_thread_local_client()
+ tls.current_request_label = label
+ except Exception:
+ pass
+ except Exception:
+ # Never block on logging
+ pass
+
+ def _is_gemini_request(self) -> bool:
+ """
+ Check if this is a Gemini request (native or via OpenAI endpoint)
+ """
+ return self._get_actual_provider() == 'gemini'
+
+ def _is_stop_requested(self) -> bool:
+ """
+ Check if stop was requested by checking global flag, local cancelled flag, and class-level cancellation
+ """
+ # Check class-level global cancellation first
+ if self.is_globally_cancelled():
+ return True
+
+ # Check local cancelled flag (more reliable in threading context)
+ if getattr(self, '_cancelled', False):
+ return True
+
+ try:
+ # Import the stop check function from the main translation module
+ from TransateKRtoEN import is_stop_requested
+ return is_stop_requested()
+ except ImportError:
+ # Fallback if import fails
+ return False
+
+ def _get_anti_duplicate_params(self, temperature):
+ """Get user-configured anti-duplicate parameters from GUI settings"""
+ # Check if user enabled anti-duplicate
+ if os.getenv("ENABLE_ANTI_DUPLICATE", "0") != "1":
+ return {}
+
+ # Get user's exact values from GUI (via environment variables)
+ top_p = float(os.getenv("TOP_P", "1.0"))
+ top_k = int(os.getenv("TOP_K", "0"))
+ frequency_penalty = float(os.getenv("FREQUENCY_PENALTY", "0.0"))
+ presence_penalty = float(os.getenv("PRESENCE_PENALTY", "0.0"))
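+ # Illustrative: ENABLE_ANTI_DUPLICATE=1 with TOP_P=0.95 and FREQUENCY_PENALTY=0.3
+ # produces {'frequency_penalty': 0.3, 'top_p': 0.95} for OpenAI-compatible providers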
+
+ # Apply parameters based on provider capabilities
+ params = {}
+
+ if self.client_type in ['openai', 'deepseek', 'groq', 'electronhub', 'openrouter']:
+ # OpenAI-compatible providers
+ if frequency_penalty > 0:
+ params["frequency_penalty"] = frequency_penalty
+ if presence_penalty > 0:
+ params["presence_penalty"] = presence_penalty
+ if top_p < 1.0:
+ params["top_p"] = top_p
+
+ elif self.client_type == 'gemini':
+ # Gemini supports both top_p and top_k
+ if top_p < 1.0:
+ params["top_p"] = top_p
+ if top_k > 0:
+ params["top_k"] = top_k
+
+ elif self.client_type == 'anthropic':
+ # Claude supports top_p and top_k
+ if top_p < 1.0:
+ params["top_p"] = top_p
+ if top_k > 0:
+ params["top_k"] = top_k
+
+ # Log applied parameters
+ if params:
+ logger.info(f"Applying anti-duplicate params for {self.client_type}: {list(params.keys())}")
+
+ return params
+
+ def _detect_silent_truncation(self, content: str, messages: List[Dict], context: str = None) -> bool:
+ """
+ Detect silent truncation where APIs (especially ElectronHub) cut off content
+ without setting proper finish_reason.
+
+ Common patterns:
+ - Sentences ending abruptly without punctuation
+ - Content significantly shorter than expected
+ - Missing closing tags in structured content
+ - Sudden topic changes or incomplete thoughts
+ """
+ if not content:
+ return False
+
+ content_stripped = content.strip()
+ if not content_stripped:
+ return False
+
+ # Pattern 1: Check for incomplete sentence endings (with improved logic)
+ # Skip this check for code contexts, JSON, or when content contains code blocks
+ if context not in ['code', 'json', 'data', 'list', 'python', 'javascript', 'programming']:
+ # Also skip if content appears to contain code
+ if '```' in content or 'def ' in content or 'class ' in content or 'import ' in content or 'function ' in content:
+ pass # Skip punctuation check for code content
+ else:
+ last_char = content_stripped[-1]
+ # Valid endings for PROSE/NARRATIVE text only
+ # Removed quotes since they're common in code
+ valid_endings = [
+ ".", "!", "?", "»", "】", ")", ")",
+ "。", "!", "?", ":", ";", "]", "}",
+ "…", "—", "–", "*", "/", ">", "~", "%"
+ ]
+
+ # Check if ends with incomplete sentence (no proper punctuation)
+ if last_char not in valid_endings:
+ # Look at the last few characters for better context
+ last_segment = content_stripped[-50:] if len(content_stripped) > 50 else content_stripped
+
+ # Check for common false positive patterns
+ false_positive_patterns = [
+ # Lists or enumerations often don't end with punctuation
+ r'\n\s*[-•*]\s*[^.!?]+$', # Bullet points
+ r'\n\s*\d+\)\s*[^.!?]+$', # Numbered lists
+ r'\n\s*[a-z]\)\s*[^.!?]+$', # Letter lists
+ # Code or technical content
+ r'```[^`]*$', # Inside code block
+ r'\$[^$]+$', # Math expressions
+ # Single words or short phrases (likely labels/headers)
+ r'^\w+$', # Single word
+ r'^[\w\s]{1,15}$', # Very short content
+ ]
+
+ import re
+ is_false_positive = any(re.search(pattern, last_segment) for pattern in false_positive_patterns)
+
+ if not is_false_positive:
+ # Additional check: is the last word incomplete?
+ words = content_stripped.split()
+ if words and len(words) > 3: # Only check if we have enough content
+ last_word = words[-1]
+ # Check for common incomplete patterns
+ # But exclude common abbreviations
+ common_abbreviations = {'etc', 'vs', 'eg', 'ie', 'vol', 'no', 'pg', 'ch', 'pt'}
+ if (len(last_word) > 2 and
+ last_word[-1].isalpha() and
+ last_word.lower() not in common_abbreviations and
+ not last_word.isupper()): # Exclude acronyms
+
+ # Final check: does it look like mid-sentence?
+ # Look for sentence starters before the last segment
+ preceding_text = ' '.join(words[-10:-1]) if len(words) > 10 else ' '.join(words[:-1])
+ sentence_starters = ['the', 'a', 'an', 'and', 'but', 'or', 'so', 'because', 'when', 'if', 'that']
+
+ # Check if we're likely mid-sentence
+ if any(starter in preceding_text.lower().split() for starter in sentence_starters):
+ print(f"Possible silent truncation detected: incomplete sentence ending")
+ return True
+
+ # Pattern 2: Check for significantly short responses (with improved thresholds)
+ if context == 'translation':
+ # Calculate input length more accurately
+ input_content = []
+ for msg in messages:
+ if msg.get('role') == 'user':
+ msg_content = msg.get('content', '')
+ # Handle both string and list content formats
+ if isinstance(msg_content, list):
+ for item in msg_content:
+ if isinstance(item, dict) and item.get('type') == 'text':
+ input_content.append(item.get('text', ''))
+ else:
+ input_content.append(msg_content)
+
+ input_length = sum(len(text) for text in input_content)
+
+ # Adjusted threshold - translations can legitimately be shorter
+ # Only flag if output is less than 20% of input AND input is substantial
+ if input_length > 1000 and len(content_stripped) < input_length * 0.2:
+ # Additional check: does the content seem complete?
+ if not content_stripped.endswith(('.', '!', '?', '"', "'", '。', '!', '?')):
+ print(f"Possible silent truncation: output ({len(content_stripped)} chars) much shorter than input ({input_length} chars)")
+ return True
+
+ # Pattern 3: Check for incomplete HTML/XML structures (improved)
+ if '<' in content and '>' in content:
+ # More sophisticated tag matching
+ import re
+
+ # Find all opening tags (excluding self-closing)
+ opening_tags = re.findall(r'<([a-zA-Z][^/>]*?)(?:\s[^>]*)?>', content)
+ closing_tags = re.findall(r'</([a-zA-Z][^>]*?)>', content)
+ self_closing = re.findall(r'<[^>]*?/>', content)
+
+ # Count tag mismatches
+ from collections import Counter
+ open_counts = Counter(opening_tags)
+ close_counts = Counter(closing_tags)
+
+ # Check for significant mismatches
+ unclosed_tags = []
+ for tag, count in open_counts.items():
+ # Ignore void elements that don't need closing
+ void_elements = {'br', 'hr', 'img', 'input', 'meta', 'area', 'base', 'col', 'embed', 'link', 'param', 'source', 'track', 'wbr'}
+ if tag.lower() not in void_elements:
+ close_count = close_counts.get(tag, 0)
+ if count > close_count + 1: # Allow 1 tag mismatch
+ unclosed_tags.append(tag)
+
+ if len(unclosed_tags) > 2: # Multiple unclosed tags indicate truncation
+ print(f"Possible silent truncation: unclosed HTML tags detected: {unclosed_tags}")
+ return True
+
+ # Pattern 4: Check for mature content indicators (reduced false positives)
+ # Only check if the content is suspiciously short
+ if len(content_stripped) < 200:
+ mature_indicators = [
+ 'cannot provide explicit', 'cannot generate adult',
+ 'unable to create sexual', 'cannot assist with mature',
+ 'against my guidelines to create explicit'
+ ]
+ content_lower = content_stripped.lower()
+
+ for indicator in mature_indicators:
+ if indicator in content_lower:
+ # This is likely a refusal, not truncation
+ # Don't mark as truncation, let the calling code handle it
+ print(f"Content appears to be refused (contains '{indicator[:20]}...')")
+ return False # This is a refusal, not truncation
+
+ # Pattern 5: Check for incomplete code blocks
+ if '```' in content:
+ code_block_count = content.count('```')
+ if code_block_count % 2 != 0: # Odd number means unclosed
+ # Additional check: is there actual code content?
+ last_block_pos = content.rfind('```')
+ content_after_block = content[last_block_pos + 3:].strip()
+
+ # Only flag if there's substantial content after the opening ```
+ if len(content_after_block) > 10:
+ print(f"Possible silent truncation: unclosed code block")
+ return True
+
+ # Pattern 6: For glossary/JSON context, check for incomplete JSON (improved)
+ if context in ['glossary', 'json', 'data']:
+ # Try to detect JSON-like content
+ if content_stripped.startswith(('[', '{')):
+ # Check for matching brackets
+ open_brackets = content_stripped.count('[') + content_stripped.count('{')
+ close_brackets = content_stripped.count(']') + content_stripped.count('}')
+
+ if open_brackets > close_brackets:
+ # Additional validation: try to parse as JSON
+ import json
+ try:
+ json.loads(content_stripped)
+ # It's valid JSON, not truncated
+ return False
+ except json.JSONDecodeError as e:
+ # Check if the error is at the end (indicating truncation)
+ if e.pos >= len(content_stripped) - 10:
+ print(f"Possible silent truncation: incomplete JSON structure")
+ return True
+
+ # Pattern 7: Check for sudden endings in long content
+ if len(content_stripped) > 500:
+ # Look for patterns that indicate mid-thought truncation
+ last_100_chars = content_stripped[-100:]
+
+ # Check for incomplete patterns at the end
+ incomplete_patterns = [
+ r',\s*$', # Ends with comma
+ r';\s*$', # Ends with semicolon
+ r'\w+ing\s+$', # Ends with -ing word (often mid-action)
+ r'\b(and|or|but|with|for|to|in|on|at)\s*$', # Ends with conjunction/preposition
+ r'\b(the|a|an)\s*$', # Ends with article
+ ]
+
+ import re
+ for pattern in incomplete_patterns:
+ if re.search(pattern, last_100_chars, re.IGNORECASE):
+ # Double-check this isn't a false positive
+ # Look at the broader context
+ sentences = content_stripped.split('.')
+ if len(sentences) > 3: # Has multiple sentences
+ last_sentence = sentences[-1].strip()
+ if len(last_sentence) > 20: # Substantial incomplete sentence
+ print(f"Possible silent truncation: content ends mid-thought")
+ return True
+
+ return False
+
+ def _enhance_electronhub_response(self, response: UnifiedResponse, messages: List[Dict],
+ context: str = None) -> UnifiedResponse:
+ """
+ Enhance ElectronHub responses with better truncation detection and handling.
+ ElectronHub sometimes silently truncates without proper finish_reason.
+ """
+ # If already marked as truncated, no need to check further
+ if response.is_truncated:
+ return response
+
+ # Check for silent truncation
+ if self._detect_silent_truncation(response.content, messages, context):
+ print(f"Silent truncation detected for {self.model} via ElectronHub")
+
+ # Check if it's likely censorship vs length limit
+ content_lower = response.content.lower()
+ censorship_phrases = [
+ "i cannot", "i can't", "inappropriate", "unable to process",
+ "against my guidelines", "cannot assist", "not able to",
+ "i'm not able", "i am not able", "cannot provide", "can't provide"
+ ]
+
+ is_censorship = any(phrase in content_lower for phrase in censorship_phrases)
+
+ if is_censorship:
+ # This is content refusal, not truncation
+ logger.info("Detected content refusal rather than truncation")
+ response.finish_reason = 'content_filter'
+ response.error_details = {
+ 'type': 'content_refused',
+ 'provider': 'electronhub',
+ 'model': self.model,
+ 'detection': 'silent_censorship'
+ }
+ else:
+ # This is actual truncation
+ response.finish_reason = 'length' # Mark as truncated for retry logic
+ response.error_details = {
+ 'type': 'silent_truncation',
+ 'provider': 'electronhub',
+ 'model': self.model,
+ 'detection': 'pattern_analysis'
+ }
+
+ # Add warning to content for translation context
+ if context == 'translation' and not is_censorship:
+ response.content += "\n[WARNING: Response may be truncated]"
+
+ return response
+
+ def _send_electronhub(self, messages, temperature, max_tokens, response_name) -> UnifiedResponse:
+ """Send request to ElectronHub API aggregator with enhanced truncation detection
+
+ ElectronHub provides access to multiple AI models through a unified endpoint.
+ Model names should be prefixed with 'eh/', 'electronhub/', or 'electron/'.
+
+ Examples:
+ - eh/yi-34b-chat-200k
+ - electronhub/gpt-4.5
+ - electron/claude-4-opus
+
+ Note: ElectronHub uses OpenAI-compatible API format.
+ This version includes silent truncation detection for mature content.
+ """
+ # Get ElectronHub endpoint (can be overridden via environment)
+ base_url = os.getenv("ELECTRONHUB_API_URL", "https://api.electronhub.ai/v1")
+
+ # Store original model name for error messages and restoration
+ original_model = self.model
+
+ # Strip the ElectronHub prefix from the model name
+ # This is critical - ElectronHub expects the model name WITHOUT the prefix
+ actual_model = self.model
+
+ # Define prefixes to strip (in order of likelihood)
+ electronhub_prefixes = ['eh/', 'electronhub/', 'electron/']
+
+ # Strip the first matching prefix
+ for prefix in electronhub_prefixes:
+ if actual_model.startswith(prefix):
+ actual_model = actual_model[len(prefix):]
+ logger.info(f"Stripped '{prefix}' prefix from model name: '{original_model}' -> '{actual_model}'")
+ print(f"🔌 ElectronHub: Using model '{actual_model}' (stripped from '{original_model}')")
+ break
+ else:
+ # No prefix found - this shouldn't happen if routing worked correctly
+ print(f"No ElectronHub prefix found in model '{self.model}', using as-is")
+ print(f"⚠️ ElectronHub: No prefix found in '{self.model}', using as-is")
+
+ # Log the API call details
+ logger.info(f"Sending to ElectronHub API: model='{actual_model}', endpoint='{base_url}'")
+
+ # Debug: Log system prompt if present
+ for msg in messages:
+ if msg.get('role') == 'system':
+ logger.debug(f"ElectronHub - System prompt detected: {len(msg.get('content', ''))} chars")
+ print(f"📝 ElectronHub: Sending system prompt ({len(msg.get('content', ''))} characters)")
+ break
+ else:
+ print("ElectronHub - No system prompt found in messages")
+ print("⚠️ ElectronHub: No system prompt in messages")
+
+ # Check if we should warn about potentially problematic models
+ #problematic_models = ['claude', 'gpt-4', 'gpt-3.5', 'gemini']
+ #if any(model in actual_model.lower() for model in problematic_models):
+ #print(f"⚠️ ElectronHub: Model '{actual_model}' may have strict content filters")
+
+ # Check for mature content indicators (disabled along with the hint below;
+ # joining contents directly would raise TypeError on list-type image messages)
+ #all_content = ' '.join(msg.get('content', '') for msg in messages).lower()
+ #mature_indicators = ['mature', 'adult', 'explicit', 'sexual', 'violence', 'intimate']
+ #if any(indicator in all_content for indicator in mature_indicators):
+ #print(f"💡 ElectronHub: Consider using models like yi-34b-chat, deepseek-chat, or llama-2-70b for this content")
+
+ # Temporarily update self.model for the API call
+ # This is necessary because _send_openai_compatible uses self.model
+ self.model = actual_model
+
+ try:
+ # Make the API call using OpenAI-compatible format
+ result = self._send_openai_compatible(
+ messages, temperature, max_tokens,
+ base_url=base_url,
+ response_name=response_name,
+ provider="electronhub"
+ )
+
+ # ENHANCEMENT: Check for silent truncation/censorship
+ enhanced_result = self._enhance_electronhub_response(result, messages, self.context)
+
+ if enhanced_result.finish_reason in ['length', 'content_filter']:
+ self._log_truncation_failure(
+ messages=messages,
+ response_content=enhanced_result.content,
+ finish_reason=enhanced_result.finish_reason,
+ context=self.context,
+ error_details=enhanced_result.error_details
+ )
+
+ # Log if truncation was detected
+ if enhanced_result.finish_reason == 'length' and result.finish_reason != 'length':
+ print(f"🔍 ElectronHub: Silent truncation detected and corrected")
+ elif enhanced_result.finish_reason == 'content_filter' and result.finish_reason != 'content_filter':
+ print(f"🚫 ElectronHub: Silent content refusal detected")
+
+ return enhanced_result
+
+ except UnifiedClientError as e:
+ # Enhance error messages for common ElectronHub issues
+ error_str = str(e)
+
+ if "Invalid model" in error_str or "400" in error_str or "model not found" in error_str.lower():
+ # Provide helpful error message for invalid models
+ error_msg = (
+ f"ElectronHub rejected model '{actual_model}' (original: '{original_model}').\n"
+ f"\nCommon ElectronHub model names:\n"
+ f" • OpenAI: gpt-4, gpt-4-turbo, gpt-3.5-turbo, gpt-4o, gpt-4o-mini, gpt-4.5, gpt-4.1\n"
+ f" • Anthropic: claude-3-opus, claude-3-sonnet, claude-3-haiku, claude-4-opus, claude-4-sonnet\n"
+ f" • Meta: llama-2-70b-chat, llama-2-13b-chat, llama-2-7b-chat, llama-3-70b, llama-4-70b\n"
+ f" • Mistral: mistral-large, mistral-medium, mixtral-8x7b\n"
+ f" • Google: gemini-pro, gemini-1.5-pro, gemini-2.5-pro\n"
+ f" • Yi: yi-34b-chat, yi-6b-chat\n"
+ f" • Others: deepseek-coder-33b, qwen-72b-chat, grok-3\n"
+ f"\nNote: Do not include version suffixes like ':latest' or ':safe'"
+ )
+ print(f"\n❌ {error_msg}")
+ raise UnifiedClientError(error_msg, error_type="invalid_model", details={"attempted_model": actual_model})
+
+ elif "unauthorized" in error_str.lower() or "401" in error_str:
+ error_msg = (
+ f"ElectronHub authentication failed. Please check your API key.\n"
+ f"Make sure you're using an ElectronHub API key, not a key from the underlying provider."
+ )
+ print(f"\n❌ {error_msg}")
+ raise UnifiedClientError(error_msg, error_type="auth_error")
+
+ elif "rate limit" in error_str.lower() or "429" in error_str:
+ # Preserve the original error details from OpenRouter/ElectronHub
+ # The original error should contain the full API response with specific details
+ print(f"\n⏳ ElectronHub rate limit error: {error_str}")
+ # Use the original error string to preserve the full OpenRouter error description
+ raise UnifiedClientError(error_str, error_type="rate_limit")
+
+ else:
+ # Re-raise original error with context
+ print(f"ElectronHub API error for model '{actual_model}': {e}")
+ raise
+
+ finally:
+ # Always restore the original model name
+ # This ensures subsequent calls work correctly
+ self.model = original_model
+
+ def _parse_poe_tokens(self, key_str: str) -> dict:
+ """Parse POE cookies from a single string.
+ Returns a dict that always includes 'p-b' (required) and may include 'p-lat' and any
+ other cookies present (e.g., 'cf_clearance', '__cf_bm'). Unknown cookies are forwarded as-is.
+
+ Accepted input formats:
+ - "p-b:AAA|p-lat:BBB"
+ - "p-b=AAA; p-lat=BBB"
+ - Raw cookie header with or without the "Cookie:" prefix
+ - Just the p-b value (long string) when no delimiter is present
+ """
+ import re
+ s = (key_str or "").strip()
+ if s.lower().startswith("cookie:"):
+ s = s.split(":", 1)[1].strip()
+ tokens: dict = {}
+ # Split on | ; , or newlines
+ parts = re.split(r"[|;,\n]+", s)
+ for part in parts:
+ part = part.strip()
+ if not part:
+ continue
+ if ":" in part:
+ k, v = part.split(":", 1)
+ elif "=" in part:
+ k, v = part.split("=", 1)
+ else:
+ # If no delimiter and p-b not set, treat entire string as p-b
+ if 'p-b' not in tokens and len(part) > 20:
+ tokens['p-b'] = part
+ continue
+ k = k.strip().lower()
+ v = v.strip()
+ # Normalize key names
+ if k in ("p-b", "p_b", "pb", "p.b"):
+ tokens['p-b'] = v
+ elif k in ("p-lat", "p_lat", "plat", "p.lat"):
+ tokens['p-lat'] = v
+ else:
+ # Forward any additional cookie that looks valid
+ if re.match(r"^[a-z0-9_\-\.]+$", k):
+ tokens[k] = v
+ return tokens
+
+ def _send_poe(self, messages, temperature, max_tokens, response_name) -> UnifiedResponse:
+ """Send request using poe-api-wrapper"""
+ try:
+ from poe_api_wrapper import PoeApi
+ except ImportError:
+ raise UnifiedClientError(
+ "poe-api-wrapper not installed. Run: pip install poe-api-wrapper"
+ )
+
+ # Parse cookies using robust parser
+ tokens = self._parse_poe_tokens(self.api_key)
+ if 'p-b' not in tokens or not tokens['p-b']:
+ raise UnifiedClientError(
+ "POE tokens missing. Provide cookies as 'p-b:VALUE|p-lat:VALUE' or 'p-b=VALUE; p-lat=VALUE'",
+ error_type="auth_error"
+ )
+
+ # Some wrapper versions require p-lat present (empty is allowed but may reduce success rate)
+ if 'p-lat' not in tokens:
+ logger.info("No p-lat cookie provided; proceeding without it")
+ tokens['p-lat'] = ''
+
+ logger.info(f"Tokens being sent: p-b={len(tokens.get('p-b', ''))} chars, p-lat={len(tokens.get('p-lat', ''))} chars")
+
+ try:
+ # Create Poe client (try to pass proxy/headers if supported)
+ poe_kwargs = {}
+ ua = os.getenv("POE_USER_AGENT") or os.getenv("HTTP_USER_AGENT")
+ if ua:
+ poe_kwargs["headers"] = {"User-Agent": ua, "Referer": "https://poe.com/", "Origin": "https://poe.com"}
+ proxy = os.getenv("POE_PROXY") or os.getenv("HTTPS_PROXY") or os.getenv("HTTP_PROXY")
+ if proxy:
+ poe_kwargs["proxy"] = proxy
+ try:
+ poe_client = PoeApi(tokens=tokens, **poe_kwargs)
+ except TypeError:
+ # Older versions may not support headers/proxy kwargs
+ poe_client = PoeApi(tokens=tokens)
+ # Best-effort header update if client exposes httpx session
+ try:
+ if ua and hasattr(poe_client, "session") and hasattr(poe_client.session, "headers"):
+ poe_client.session.headers.update({"User-Agent": ua, "Referer": "https://poe.com/", "Origin": "https://poe.com"})
+ except Exception:
+ pass
+
+ # Get bot name
+ requested_model = self.model.replace('poe/', '', 1)
+ bot_map = {
+ # GPT models
+ 'gpt-4': 'beaver',
+ 'gpt-4o': 'GPT-4o',
+ 'gpt-3.5-turbo': 'chinchilla',
+
+ # Claude models
+ 'claude': 'a2',
+ 'claude-instant': 'a2',
+ 'claude-2': 'claude_2',
+ 'claude-3-opus': 'claude_3_opus',
+ 'claude-3-sonnet': 'claude_3_sonnet',
+ 'claude-3-haiku': 'claude_3_haiku',
+
+ # Gemini models
+ 'gemini-2.5-flash': 'gemini_1_5_flash',
+ 'gemini-2.5-pro': 'gemini_1_5_pro',
+ 'gemini-pro': 'gemini_pro',
+
+ # Other models
+ 'assistant': 'assistant',
+ 'web-search': 'web_search',
+ }
+ bot_name = bot_map.get(requested_model.lower(), requested_model)
+ logger.info(f"Using bot name: {bot_name}")
+
+ # Send message
+ prompt = self._messages_to_prompt(messages)
+ full_response = ""
+
+ # Handle temperature and max_tokens if supported
+ # Note: poe-api-wrapper might not support these parameters directly
+ for chunk in poe_client.send_message(bot_name, prompt):
+ if 'response' in chunk:
+ full_response = chunk['response']
+
+ # Get the final text
+ final_text = chunk.get('text', full_response) if 'chunk' in locals() else full_response
+
+ if not final_text:
+ raise UnifiedClientError("POE returned empty response", error_type="empty")
+
+ return UnifiedResponse(
+ content=final_text,
+ finish_reason="stop",
+ raw_response=chunk if 'chunk' in locals() else {"response": full_response}
+ )
+
+ except Exception as e:
+ print(f"Poe API error details: {str(e)}")
+ # Check for specific errors
+ error_str = str(e).lower()
+ if "403" in error_str or "forbidden" in error_str or "auth" in error_str or "unauthorized" in error_str:
+ raise UnifiedClientError(
+ "POE authentication failed (403). Your cookies may be invalid or expired. "
+ "Copy fresh cookies (p-b and p-lat) from an active poe.com session.",
+ error_type="auth_error"
+ )
+ if "rate limit" in error_str or "429" in error_str:
+ raise UnifiedClientError(
+ "POE rate limit exceeded. Please wait before trying again.",
+ error_type="rate_limit"
+ )
+ raise UnifiedClientError(f"Poe API error: {e}")
+
+ def _save_openrouter_config(self, config_data: dict, response_name: str = None):
+ """Save OpenRouter configuration next to the current request payloads (thread-specific directory)"""
+ if not os.getenv("SAVE_PAYLOAD", "1") == "1":
+ return
+
+ # Handle None or empty response_name
+ if not response_name:
+ response_name = f"config_{datetime.now().strftime('%H%M%S')}"
+
+ # Sanitize response_name
+ import re
+ response_name = re.sub(r'[<>:"/\\|?*]', '_', str(response_name))
+
+ # Reuse the same payload directory as other saves
+ thread_dir = self._get_thread_directory()
+ os.makedirs(thread_dir, exist_ok=True)
+
+ # Create filename
+ timestamp = datetime.now().strftime("%H%M%S")
+ config_filename = f"openrouter_config_{timestamp}_{response_name}.json"
+ config_path = os.path.join(thread_dir, config_filename)
+
+ try:
+ with self._file_write_lock:
+ with open(config_path, 'w', encoding='utf-8') as f:
+ json.dump(config_data, f, indent=2, ensure_ascii=False)
+ #print(f"Saved OpenRouter config to: {config_path}")
+ except Exception as e:
+ print(f"Failed to save OpenRouter config: {e}")
+
+
+ def _send_fireworks(self, messages, temperature, max_tokens, response_name, max_completion_tokens=None) -> UnifiedResponse:
+ """Send request through the OpenAI SDK with o-series model support (Azure endpoints are rerouted below)"""
+ # Check if this is actually Azure
+ if os.getenv('IS_AZURE_ENDPOINT') == '1':
+ # Route to Azure-compatible handler
+ base_url = os.getenv('OPENAI_CUSTOM_BASE_URL', '')
+ return self._send_openai_compatible(
+ messages, temperature, max_tokens,
+ base_url=base_url,
+ response_name=response_name,
+ provider="azure"
+ )
+
+ max_retries = self._get_max_retries()
+ api_delay = float(os.getenv("SEND_INTERVAL_SECONDS", "2"))
+
+ # Track what fixes we've already tried
+ fixes_attempted = {
+ 'temperature': False,
+ 'system_message': False,
+ 'max_tokens_param': False
+ }
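+ # Strategy: make the call, inspect any error, flip the matching fix flag, then
+ # retry - model quirks (fixed temperature, no system role, renamed token
+ # parameter) are handled reactively instead of hard-coding them per model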
+
+ for attempt in range(max_retries):
+ try:
+ params = self._build_openai_params(messages, temperature, max_tokens, max_completion_tokens)
+
+ # Get user-configured anti-duplicate parameters
+ anti_dupe_params = self._get_anti_duplicate_params(temperature)
+ params.update(anti_dupe_params)
+
+ # Apply any fixes from previous attempts
+ if fixes_attempted['temperature'] and 'temperature_override' in fixes_attempted:
+ params['temperature'] = fixes_attempted['temperature_override']
+
+ if fixes_attempted['system_message']:
+ # Convert system messages to user messages
+ new_messages = []
+ for msg in params.get('messages', []):
+ if msg['role'] == 'system':
+ new_messages.append({
+ 'role': 'user',
+ 'content': f"Instructions: {msg['content']}"
+ })
+ else:
+ new_messages.append(msg)
+ params['messages'] = new_messages
+
+ if fixes_attempted['max_tokens_param']:
+ if 'max_tokens' in params:
+ params['max_completion_tokens'] = params.pop('max_tokens')
+
+ # Check for cancellation
+ if self._cancelled:
+ raise UnifiedClientError("Operation cancelled")
+
+ # Log the request for debugging
+ logger.debug(f"OpenAI request - Model: {self.model}, Params: {list(params.keys())}")
+
+
+ # Make the API call
+ resp = self.openai_client.chat.completions.create(
+ **params,
+ timeout=self.request_timeout,
+ idempotency_key=self._get_idempotency_key()
+ )
+
+ # Enhanced response validation with detailed logging
+ if not resp:
+ print("OpenAI returned None response")
+ raise UnifiedClientError("OpenAI returned empty response object")
+
+ if not hasattr(resp, 'choices'):
+ print(f"OpenAI response missing 'choices'. Response type: {type(resp)}")
+ print(f"Response attributes: {dir(resp)[:10]}") # Log first 10 attributes
+ raise UnifiedClientError("Invalid OpenAI response structure - missing choices")
+
+ if not resp.choices:
+ print("OpenAI response has empty choices array")
+ # Check if this is a content filter issue
+ if hasattr(resp, 'model') and hasattr(resp, 'id'):
+ print(f"Response ID: {resp.id}, Model: {resp.model}")
+ raise UnifiedClientError("OpenAI returned empty choices array")
+
+ choice = resp.choices[0]
+
+ # Enhanced choice validation
+ if not hasattr(choice, 'message'):
+ print(f"OpenAI choice missing 'message'. Choice type: {type(choice)}")
+ print(f"Choice attributes: {dir(choice)[:10]}")
+ raise UnifiedClientError("OpenAI choice missing message")
+
+ # Check if this is actually Gemini using OpenAI endpoint
+ is_gemini_via_openai = False
+ if hasattr(self, '_original_client_type') and self._original_client_type == 'gemini':
+ is_gemini_via_openai = True
+ logger.info("This is Gemini using OpenAI-compatible endpoint")
+ elif self.model.lower().startswith('gemini'):
+ is_gemini_via_openai = True
+ logger.info("Detected Gemini model via OpenAI endpoint")
+
+ if not choice.message:
+ # Gemini via OpenAI sometimes returns None message
+ if is_gemini_via_openai:
+ print("Gemini via OpenAI returned None message - creating empty message")
+ # Create a mock message object
+ class MockMessage:
+ content = ""
+ refusal = None
+ choice.message = MockMessage()
+ else:
+ print("OpenAI choice.message is None")
+ raise UnifiedClientError("OpenAI message is empty")
+
+ # Check for content with detailed debugging
+ content = None
+ finish_reason = None  # may be set by the refusal/safety handling below
+
+ # Try different ways to get content
+ if hasattr(choice.message, 'content'):
+ content = choice.message.content
+ elif hasattr(choice.message, 'text'):
+ content = choice.message.text
+ elif isinstance(choice.message, dict):
+ content = choice.message.get('content') or choice.message.get('text')
+
+ # Log what we found
+ if content is None:
+ # For Gemini via OpenAI, None content is common and not an error
+ if is_gemini_via_openai:
+ print("Gemini via OpenAI returned None content - likely a safety filter")
+ content = "" # Set to empty string instead of raising error
+ finish_reason = 'content_filter'
+ else:
+ print(f"OpenAI message has no content. Message type: {type(choice.message)}")
+ print(f"Message attributes: {dir(choice.message)[:20]}")
+ print(f"Message representation: {str(choice.message)[:200]}")
+
+ # Check if this is a refusal (only if not already handled by Gemini)
+ if content is None and hasattr(choice.message, 'refusal') and choice.message.refusal:
+ print(f"OpenAI refused: {choice.message.refusal}")
+ # Return the refusal as content
+ content = f"[REFUSED BY OPENAI]: {choice.message.refusal}"
+ finish_reason = 'content_filter'
+ elif hasattr(choice, 'finish_reason'):
+ finish_reason = choice.finish_reason
+ print(f"Finish reason: {finish_reason}")
+
+ # Check for specific finish reasons
+ if finish_reason == 'content_filter':
+ content = "[CONTENT BLOCKED BY OPENAI SAFETY FILTER]"
+ elif finish_reason == 'length':
+ content = "" # Empty but will be marked as truncated
+ else:
+ # Try to extract any available info
+ content = f"[EMPTY RESPONSE - Finish reason: {finish_reason}]"
+ else:
+ content = "[EMPTY RESPONSE FROM OPENAI]"
+
+ # Handle empty string content
+ elif content == "":
+ print("OpenAI returned empty string content")
+ finish_reason = getattr(choice, 'finish_reason', 'unknown')
+
+ if finish_reason == 'length':
+ logger.info("Empty content due to length limit")
+ # This is a truncation at the start - token limit too low
+ return UnifiedResponse(
+ content="",
+ finish_reason='length',
+ error_details={
+ 'error': 'Response truncated - increase max_completion_tokens',
+ 'finish_reason': 'length',
+ 'token_limit': params.get('max_completion_tokens') or params.get('max_tokens')
+ }
+ )
+ elif finish_reason == 'content_filter':
+ content = "[CONTENT BLOCKED BY OPENAI]"
+ else:
+ print(f"Empty content with finish_reason: {finish_reason}")
+ content = f"[EMPTY - Reason: {finish_reason}]"
+
+ # Get finish reason (with fallback) unless a handler above already set it
+ if finish_reason is None:
+ finish_reason = getattr(choice, 'finish_reason', 'stop')
+
+ # Normalize OpenAI finish reasons
+ if finish_reason == "max_tokens":
+ finish_reason = "length"
+
+ # Special handling for Gemini empty responses
+ if is_gemini_via_openai and content == "" and finish_reason == 'stop':
+ # Empty content with 'stop' from Gemini usually means safety filter
+ print("Empty Gemini response with finish_reason='stop' - likely safety filter")
+ content = "[BLOCKED BY GEMINI SAFETY FILTER]"
+ finish_reason = 'content_filter'
+
+ # Extract usage
+ usage = None
+ if hasattr(resp, 'usage') and resp.usage:
+ usage = {
+ 'prompt_tokens': resp.usage.prompt_tokens,
+ 'completion_tokens': resp.usage.completion_tokens,
+ 'total_tokens': resp.usage.total_tokens
+ }
+ logger.debug(f"Token usage: {usage}")
+
+ # Log successful response
+ logger.info(f"OpenAI response - Content length: {len(content) if content else 0}, Finish reason: {finish_reason}")
+
+ return UnifiedResponse(
+ content=content,
+ finish_reason=finish_reason,
+ usage=usage,
+ raw_response=resp
+ )
+
+ except OpenAIError as e:
+ error_str = str(e)
+ error_dict = None
+
+ # Try to extract error details
+ try:
+ if hasattr(e, 'response') and hasattr(e.response, 'json'):
+ error_dict = e.response.json()
+ print(f"OpenAI error details: {error_dict}")
+ except:
+ pass
+
+ # Check if we can fix the error and retry
+ should_retry = False
+
+ # Handle temperature constraints reactively
+ if not fixes_attempted['temperature'] and "temperature" in error_str and ("does not support" in error_str or "unsupported_value" in error_str):
+ # Extract what temperature the model wants
+ default_temp = 1 # Default fallback
+ if "Only the default (1)" in error_str:
+ default_temp = 1
+ elif error_dict and 'error' in error_dict:
+ # Try to parse the required temperature from error message
+ import re
+ temp_match = re.search(r'default \((\d+(?:\.\d+)?)\)', error_dict['error'].get('message', ''))
+ if temp_match:
+ default_temp = float(temp_match.group(1))
+
+ # Send message to GUI and log file
+ print(f"🔄 Model {self.model} requires temperature={default_temp}, retrying...")
+ logger.info(f"Model {self.model} requires temperature={default_temp}, will retry...")
+ fixes_attempted['temperature'] = True
+ fixes_attempted['temperature_override'] = default_temp
+ should_retry = True
+
+ # Handle system message constraints reactively
+ elif not fixes_attempted['system_message'] and "system" in error_str.lower() and ("not supported" in error_str or "unsupported" in error_str):
+ print(f"Model {self.model} doesn't support system messages, will convert and retry...")
+ fixes_attempted['system_message'] = True
+ should_retry = True
+
+ # Handle max_tokens vs max_completion_tokens reactively
+ elif not fixes_attempted['max_tokens_param'] and "max_tokens" in error_str and ("not supported" in error_str or "max_completion_tokens" in error_str):
+ print(f"Switching from max_tokens to max_completion_tokens for model {self.model}")
+ fixes_attempted['max_tokens_param'] = True
+ should_retry = True
+ time.sleep(api_delay)
+ continue
+
+ # Handle rate limits
+ elif "rate limit" in error_str.lower() or "429" in error_str:
+ # In multi-key mode, don't retry here - let outer handler rotate keys
+ if self._multi_key_mode:
+ print(f"OpenAI rate limit hit in multi-key mode - passing to key rotation")
+ raise UnifiedClientError(f"OpenAI rate limit: {e}", error_type="rate_limit")
+ elif attempt < max_retries - 1:
+ # Single key mode - wait and retry
+ wait_time = api_delay * 10
+ print(f"Rate limit hit, waiting {wait_time}s before retry")
+ time.sleep(wait_time)
+ continue
+
+ # If we identified a fix, retry immediately
+ if should_retry and attempt < max_retries - 1:
+ time.sleep(api_delay)
+ continue
+
+ # Other errors or no retries left
+ if attempt < max_retries - 1:
+ print(f"OpenAI error (attempt {attempt + 1}/{max_retries}): {e}")
+ time.sleep(api_delay)
+ continue
+
+ print(f"OpenAI error after all retries: {e}")
+ raise UnifiedClientError(f"OpenAI error: {e}", error_type="api_error")
+
+ except Exception as e:
+ if attempt < max_retries - 1:
+ print(f"Unexpected error (attempt {attempt + 1}/{max_retries}): {e}")
+ time.sleep(api_delay)
+ continue
+ raise UnifiedClientError(f"OpenAI error: {e}", error_type="unknown")
+
+ raise UnifiedClientError("OpenAI API failed after all retries")
+
+ def _build_openai_params(self, messages, temperature, max_tokens, max_completion_tokens=None):
+ """Build parameters for OpenAI API call"""
+ params = {
+ "model": self.model,
+ "messages": messages,
+ "temperature": temperature
+ }
+
+ # Determine which token parameter to use based on model
+ if self._is_o_series_model():
+ # o-series models use max_completion_tokens
+ # The manga translator passes the actual value as max_tokens for now
+ if max_completion_tokens is not None:
+ params["max_completion_tokens"] = max_completion_tokens
+ elif max_tokens is not None:
+ params["max_completion_tokens"] = max_tokens
+ logger.debug(f"Using max_completion_tokens={max_tokens} for o-series model {self.model}")
+ else:
+ # Regular models use max_tokens
+ if max_tokens is not None:
+ params["max_tokens"] = max_tokens
+
+ return params
+
+ def _supports_thinking(self) -> bool:
+ """Check if the current Gemini model supports thinking parameter"""
+ if not self.model:
+ return False
+
+ model_lower = self.model.lower()
+
+ # According to Google documentation, thinking is supported on:
+ # 1. All Gemini 2.5 series models (Pro, Flash, Flash-Lite)
+ # 2. Gemini 2.0 Flash Thinking Experimental model
+
+ # Check for Gemini 2.5 series
+ if 'gemini-2.5' in model_lower:
+ return True
+
+ # Check for Gemini 2.0 Flash Thinking model variants
+ thinking_models = [
+ 'gemini-2.0-flash-thinking-exp',
+ 'gemini-2.0-flash-thinking-experimental',
+ 'gemini-2.0-flash-thinking-exp-1219',
+ 'gemini-2.0-flash-thinking-exp-01-21',
+ ]
+
+ for thinking_model in thinking_models:
+ if thinking_model in model_lower:
+ return True
+
+ return False
+
+ def _get_thread_directory(self):
+ """Get thread-specific directory for payload storage"""
+ thread_name = threading.current_thread().name
+ # Prefer the client's explicit context if available
+ explicit = getattr(self, 'context', None)
+ if explicit in ('translation', 'glossary', 'summary'):
+ context = explicit
+ else:
+ if 'Translation' in thread_name:
+ context = 'translation'
+ elif 'Glossary' in thread_name:
+ context = 'glossary'
+ elif 'Summary' in thread_name:
+ context = 'summary'
+ else:
+ context = 'general'
+
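+ # Resulting path looks like Payloads/translation/TranslationWorker-1_140251 (illustrative)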
+ thread_dir = os.path.join("Payloads", context, f"{thread_name}_{threading.current_thread().ident}")
+ os.makedirs(thread_dir, exist_ok=True)
+ return thread_dir
+
+ def _save_gemini_safety_config(self, config_data: dict, response_name: str = None):
+ """Save Gemini safety configuration next to the current request payloads"""
+ if not os.getenv("SAVE_PAYLOAD", "1") == "1":
+ return
+
+ # Handle None or empty response_name
+ if not response_name:
+ response_name = f"safety_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
+
+ # Sanitize response_name to ensure it's filesystem-safe
+ # Remove or replace invalid characters
+ import re
+ response_name = re.sub(r'[<>:\"/\\|?*]', '_', str(response_name))
+
+ # Reuse the same payload directory as other saves
+ thread_dir = self._get_thread_directory()
+ os.makedirs(thread_dir, exist_ok=True)
+
+ # Create unique filename with timestamp
+ timestamp = datetime.now().strftime("%H%M%S")
+
+ # Ensure response_name doesn't already contain timestamp to avoid duplication
+ if timestamp not in response_name:
+ config_filename = f"gemini_safety_{timestamp}_{response_name}.json"
+ else:
+ config_filename = f"gemini_safety_{response_name}.json"
+
+ config_path = os.path.join(thread_dir, config_filename)
+
+ try:
+ with self._file_write_lock:
+ with open(config_path, 'w', encoding='utf-8') as f:
+ json.dump(config_data, f, indent=2, ensure_ascii=False)
+ # Only log if not stopping
+ if not self._is_stop_requested():
+ print(f"Saved Gemini safety status to: {config_path}")
+ except Exception as e:
+ # Only log errors if not stopping
+ if not self._is_stop_requested():
+ print(f"Failed to save Gemini safety config: {e}")
+
+ def _send_gemini(self, messages, temperature, max_tokens, response_name, image_base64=None) -> UnifiedResponse:
+ """Send request to Gemini API with support for both text and multi-image messages"""
+ response = None
+
+ # Check if we should use OpenAI-compatible endpoint
+ use_openai_endpoint = os.getenv("USE_GEMINI_OPENAI_ENDPOINT", "0") == "1"
+ gemini_endpoint = os.getenv("GEMINI_OPENAI_ENDPOINT", "")
+
+ # Import types at the top
+ from google.genai import types
+
+ # Check if this contains images
+ has_images = image_base64 is not None # Direct image parameter
+ if not has_images:
+ for msg in messages:
+ if isinstance(msg.get('content'), list):
+ for part in msg['content']:
+ if part.get('type') == 'image_url':
+ has_images = True
+ break
+ if has_images:
+ break
+
+ # Check if safety settings are disabled
+ disable_safety = os.getenv("DISABLE_GEMINI_SAFETY", "false").lower() == "true"
+
+ # Get thinking budget from environment
+ thinking_budget = int(os.getenv("THINKING_BUDGET", "-1"))
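+ # Budget convention: -1 = dynamic thinking, 0 = thinking disabled, >0 = explicit token budget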
+
+ # Check if this model supports thinking
+ supports_thinking = self._supports_thinking()
+
+ # Configure safety settings
+ safety_settings = None
+ if disable_safety:
+ # Set all safety categories to BLOCK_NONE (most permissive)
+ safety_settings = [
+ types.SafetySetting(
+ category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
+ threshold=types.HarmBlockThreshold.BLOCK_NONE
+ ),
+ types.SafetySetting(
+ category=types.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
+ threshold=types.HarmBlockThreshold.BLOCK_NONE
+ ),
+ types.SafetySetting(
+ category=types.HarmCategory.HARM_CATEGORY_HARASSMENT,
+ threshold=types.HarmBlockThreshold.BLOCK_NONE
+ ),
+ types.SafetySetting(
+ category=types.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
+ threshold=types.HarmBlockThreshold.BLOCK_NONE
+ ),
+ types.SafetySetting(
+ category=types.HarmCategory.HARM_CATEGORY_CIVIC_INTEGRITY,
+ threshold=types.HarmBlockThreshold.BLOCK_NONE
+ ),
+ ]
+ if not self._is_stop_requested():
+ logger.info("Gemini safety settings disabled - using BLOCK_NONE for all categories")
+ else:
+ if not self._is_stop_requested():
+ logger.info("Using default Gemini safety settings")
+
+ # Define retry attempts
+ attempts = self._get_max_retries()
+ attempt = 0
+ error_details = {}
+
+ # Prepare configuration data
+ readable_safety = "DEFAULT"
+ safety_status = "ENABLED - Using default Gemini safety settings"
+ if disable_safety:
+ safety_status = "DISABLED - All categories set to BLOCK_NONE"
+ readable_safety = {
+ "HATE_SPEECH": "BLOCK_NONE",
+ "SEXUALLY_EXPLICIT": "BLOCK_NONE",
+ "HARASSMENT": "BLOCK_NONE",
+ "DANGEROUS_CONTENT": "BLOCK_NONE",
+ "CIVIC_INTEGRITY": "BLOCK_NONE"
+ }
+
+ # Log to console with thinking status - only if not stopping
+ if not self._is_stop_requested():
+ endpoint_info = f" (via OpenAI endpoint: {gemini_endpoint})" if use_openai_endpoint else " (native API)"
+ print(f"🔒 Gemini Safety Status: {safety_status}{endpoint_info}")
+
+ thinking_status = ""
+ if supports_thinking:
+ if thinking_budget == 0:
+ thinking_status = " (thinking disabled)"
+ elif thinking_budget == -1:
+ thinking_status = " (dynamic thinking)"
+ elif thinking_budget > 0:
+ thinking_status = f" (thinking budget: {thinking_budget})"
+ else:
+ thinking_status = " (thinking not supported)"
+
+ print(f"🧠 Thinking Status: {thinking_status}")
+
+ # Save configuration to file
+ request_type = "IMAGE_REQUEST" if has_images else "TEXT_REQUEST"
+ if use_openai_endpoint:
+ request_type = "GEMINI_OPENAI_ENDPOINT_" + request_type
+ config_data = {
+ "type": request_type,
+ "model": self.model,
+ "endpoint": gemini_endpoint if use_openai_endpoint else "native",
+ "safety_enabled": not disable_safety,
+ "safety_settings": readable_safety,
+ "temperature": temperature,
+ "max_output_tokens": max_tokens,
+ "thinking_supported": supports_thinking,
+ "thinking_budget": thinking_budget if supports_thinking else None,
+ "timestamp": datetime.now().isoformat(),
+ }
+
+ # Save configuration to file with thread isolation
+ self._save_gemini_safety_config(config_data, response_name)
+
+ # Main attempt loop - SAME FOR BOTH ENDPOINTS
+ while attempt < attempts:
+ try:
+ if self._cancelled:
+ raise UnifiedClientError("Operation cancelled")
+
+ # Get user-configured anti-duplicate parameters
+ anti_dupe_params = self._get_anti_duplicate_params(temperature)
+
+ # Build generation config with anti-duplicate parameters
+ generation_config_params = {
+ "temperature": temperature,
+ "max_output_tokens": max_tokens,
+ **anti_dupe_params # Add user's custom parameters
+ }
+
+ # Log the request - only if not stopping
+ if not self._is_stop_requested():
+ print(f" 📊 Temperature: {temperature}, Max tokens: {max_tokens}")
+
+ # ========== MAKE THE API CALL - DIFFERENT FOR EACH ENDPOINT ==========
+ if use_openai_endpoint and gemini_endpoint:
+ # Ensure the endpoint ends with /openai/ for compatibility
+ if not gemini_endpoint.endswith('/openai/'):
+ if gemini_endpoint.endswith('/'):
+ gemini_endpoint = gemini_endpoint + 'openai/'
+ else:
+ gemini_endpoint = gemini_endpoint + '/openai/'
+
+ # Call OpenAI-compatible endpoint
+ response = self._send_openai_compatible(
+ messages=messages,
+ temperature=temperature,
+ max_tokens=max_tokens,
+ base_url=gemini_endpoint,
+ response_name=response_name,
+ provider="gemini-openai"
+ )
+
+ # For OpenAI endpoint, we already have a UnifiedResponse
+ # Extract any thinking tokens if available
+ thinking_tokens_displayed = False
+
+ if hasattr(response, 'raw_response'):
+ raw_resp = response.raw_response
+
+ # Check multiple possible locations for thinking tokens
+ thinking_tokens = 0
+
+ # Check in usage object
+ if hasattr(raw_resp, 'usage'):
+ usage = raw_resp.usage
+
+ # Try various field names that might contain thinking tokens
+ if hasattr(usage, 'thoughts_token_count'):
+ thinking_tokens = usage.thoughts_token_count or 0
+ elif hasattr(usage, 'thinking_tokens'):
+ thinking_tokens = usage.thinking_tokens or 0
+ elif hasattr(usage, 'reasoning_tokens'):
+ thinking_tokens = usage.reasoning_tokens or 0
+
+ # Also check if there's a breakdown in the usage
+ if hasattr(usage, 'completion_tokens_details'):
+ details = usage.completion_tokens_details
+ if hasattr(details, 'reasoning_tokens'):
+ thinking_tokens = details.reasoning_tokens or 0
+
+ # Check in the raw response itself
+ if thinking_tokens == 0 and hasattr(raw_resp, '__dict__'):
+ # Look for thinking-related fields in the response
+ for field_name in ['thoughts_token_count', 'thinking_tokens', 'reasoning_tokens']:
+ if field_name in raw_resp.__dict__:
+ thinking_tokens = raw_resp.__dict__[field_name] or 0
+ if thinking_tokens > 0:
+ break
+
+ # Display thinking tokens if found or if thinking was requested - only if not stopping
+ if supports_thinking and not self._is_stop_requested():
+ if thinking_tokens > 0:
+ print(f" 💭 Thinking tokens used: {thinking_tokens}")
+ thinking_tokens_displayed = True
+ elif thinking_budget == 0:
+ print(f" ✅ Thinking successfully disabled (0 thinking tokens)")
+ thinking_tokens_displayed = True
+ elif thinking_budget == -1:
+ # Dynamic thinking - might not be reported
+ print(f" 💭 Thinking: Dynamic mode (tokens may not be reported via OpenAI endpoint)")
+ thinking_tokens_displayed = True
+ elif thinking_budget > 0:
+ # Specific budget requested but not reported
+ print(f" ⚠️ Thinking budget set to {thinking_budget} but tokens not reported via OpenAI endpoint")
+ thinking_tokens_displayed = True
+
+ # If we haven't displayed thinking status yet and it's supported, show a message
+ if not thinking_tokens_displayed and supports_thinking:
+ logger.debug("Thinking tokens may have been used but are not reported via OpenAI endpoint")
+
+ # Check finish reason for prohibited content
+ if response.finish_reason == 'content_filter' or response.finish_reason == 'prohibited_content':
+ raise UnifiedClientError(
+ "Content blocked by Gemini OpenAI endpoint",
+ error_type="prohibited_content",
+ details={"endpoint": "openai", "finish_reason": response.finish_reason}
+ )
+
+ return response
+
+ else:
+ # Native Gemini API call
+ # Prepare content based on whether we have images
+ if has_images:
+ # Handle image content
+ contents = self._prepare_gemini_image_content(messages, image_base64)
+ else:
+ # text-only: use formatted prompt
+ formatted_prompt = self._format_prompt(messages, style='gemini')
+ contents = formatted_prompt
+ # Only add thinking_config if the model supports it
+ if supports_thinking:
+ # Create thinking config separately
+ thinking_config = types.ThinkingConfig(
+ thinking_budget=thinking_budget
+ )
+
+ # Create generation config with thinking_config as a parameter
+ generation_config = types.GenerateContentConfig(
+ thinking_config=thinking_config,
+ **generation_config_params
+ )
+ else:
+ # Create generation config without thinking_config
+ generation_config = types.GenerateContentConfig(
+ **generation_config_params
+ )
+
+ # Add safety settings to config if they exist
+ if safety_settings:
+ generation_config.safety_settings = safety_settings
+
+ # Make the native API call with proper error handling
+ try:
+ # Check if gemini_client exists and is not None
+ if not hasattr(self, 'gemini_client') or self.gemini_client is None:
+ print("⚠️ Gemini client is None. This typically happens when stop was requested.")
+ raise UnifiedClientError("Gemini client not initialized - operation may have been cancelled", error_type="cancelled")
+
+ response = self.gemini_client.models.generate_content(
+ model=self.model,
+ contents=contents,
+ config=generation_config
+ )
+ except AttributeError as e:
+ if "'NoneType' object has no attribute 'models'" in str(e):
+ print("⚠️ Gemini client is None or invalid. This typically happens when stop was requested.")
+ raise UnifiedClientError("Gemini client not initialized - operation may have been cancelled", error_type="cancelled")
+ else:
+ raise
+
+ # Check for blocked content in prompt_feedback
+ if hasattr(response, 'prompt_feedback'):
+ feedback = response.prompt_feedback
+ if hasattr(feedback, 'block_reason') and feedback.block_reason:
+ error_details['block_reason'] = str(feedback.block_reason)
+ if disable_safety:
+ print(f"Content blocked despite safety disabled: {feedback.block_reason}")
+ else:
+ print(f"Content blocked: {feedback.block_reason}")
+
+ # Raise as UnifiedClientError with prohibited_content type
+ raise UnifiedClientError(
+ f"Content blocked: {feedback.block_reason}",
+ error_type="prohibited_content",
+ details={"block_reason": str(feedback.block_reason)}
+ )
+
+ # Check if response has candidates with prohibited content finish reason
+ prohibited_detected = False
+ finish_reason = 'stop' # Default
+
+ if hasattr(response, 'candidates') and response.candidates:
+ for candidate in response.candidates:
+ if hasattr(candidate, 'finish_reason'):
+ finish_reason_str = str(candidate.finish_reason)
+ if 'PROHIBITED_CONTENT' in finish_reason_str:
+ prohibited_detected = True
+ finish_reason = 'prohibited_content'
+ print(f" 🚫 Candidate has prohibited content finish reason: {finish_reason_str}")
+ break
+ elif 'MAX_TOKENS' in finish_reason_str:
+ finish_reason = 'length'
+ elif 'SAFETY' in finish_reason_str:
+ finish_reason = 'safety'
+
+ # If prohibited content detected, raise error for retry logic
+ if prohibited_detected:
+ # Get thinking tokens if available for debugging
+ thinking_tokens_wasted = 0
+ if hasattr(response, 'usage_metadata') and hasattr(response.usage_metadata, 'thoughts_token_count'):
+ thinking_tokens_wasted = response.usage_metadata.thoughts_token_count or 0
+ if thinking_tokens_wasted > 0:
+ print(f" ⚠️ Wasted {thinking_tokens_wasted} thinking tokens on prohibited content")
+
+ raise UnifiedClientError(
+ "Content blocked: FinishReason.PROHIBITED_CONTENT",
+ error_type="prohibited_content",
+ details={
+ "finish_reason": "PROHIBITED_CONTENT",
+ "thinking_tokens_wasted": thinking_tokens_wasted
+ }
+ )
+
+ # Log thinking token usage if available - only if not stopping
+ if hasattr(response, 'usage_metadata') and not self._is_stop_requested():
+ usage = response.usage_metadata
+ if supports_thinking and hasattr(usage, 'thoughts_token_count'):
+ if usage.thoughts_token_count and usage.thoughts_token_count > 0:
+ print(f" 💭 Thinking tokens used: {usage.thoughts_token_count}")
+ else:
+ print(f" ✅ Thinking successfully disabled (0 thinking tokens)")
+
+ # Extract text from the Gemini response - FIXED LOGIC HERE
+ text_content = ""
+
+ # Try the simple .text property first (most common)
+ if hasattr(response, 'text'):
+ try:
+ text_content = response.text
+ if text_content:
+ print(f" ✅ Extracted {len(text_content)} chars from response.text")
+ except Exception as e:
+ print(f" ⚠️ Could not access response.text: {e}")
+
+ # If that didn't work or returned empty, try extracting from candidates
+ if not text_content:
+ # CRITICAL FIX: Check if candidates exists AND is not None before iterating
+ if hasattr(response, 'candidates') and response.candidates is not None:
+ print(f" 🔍 Extracting from candidates...")
+ try:
+ for candidate in response.candidates:
+ if hasattr(candidate, 'content') and candidate.content:
+ if hasattr(candidate.content, 'parts') and candidate.content.parts:
+ for part in candidate.content.parts:
+ if hasattr(part, 'text') and part.text:
+ text_content += part.text
+ elif hasattr(candidate.content, 'text') and candidate.content.text:
+ text_content += candidate.content.text
+
+ if text_content:
+ print(f" ✅ Extracted {len(text_content)} chars from candidates")
+ except TypeError as e:
+ print(f" ⚠️ Error iterating candidates: {e}")
+ print(f" 🔍 Candidates type: {type(response.candidates)}")
+ else:
+ print(f" ⚠️ No candidates found in response or candidates is None")
+
+ # Log if we still have no content
+ if not text_content:
+ print(f" ⚠️ Warning: No text content extracted from Gemini response")
+ print(f" 🔍 Response attributes: {list(response.__dict__.keys()) if hasattr(response, '__dict__') else 'No __dict__'}")
+
+ # Return with the actual content populated
+ return UnifiedResponse(
+ content=text_content, # Properly populated with the actual response text
+ finish_reason=finish_reason,
+ raw_response=response,
+ error_details=error_details if error_details else None
+ )
+ # ========== END OF API CALL SECTION ==========
+
+ except UnifiedClientError as e:
+ # Re-raise UnifiedClientErrors (including prohibited content)
+ # This will trigger main key retry in the outer send() method
+ raise
+
+ except Exception as e:
+ print(f"Gemini attempt {attempt+1} failed: {e}")
+ error_details[f'attempt_{attempt+1}'] = str(e)
+
+ # Check if this is a prohibited content error
+ error_str = str(e).lower()
+ if any(indicator in error_str for indicator in [
+ "content blocked", "prohibited_content", "blockedreason",
+ "content_filter", "safety filter", "harmful content"
+ ]):
+ # Re-raise as UnifiedClientError with proper type
+ raise UnifiedClientError(
+ str(e),
+ error_type="prohibited_content",
+ details=error_details
+ )
+
+ # Check if this is a rate limit error
+ if "429" in error_str or "rate limit" in error_str.lower():
+ # Re-raise for multi-key handling
+ raise UnifiedClientError(
+ f"Rate limit exceeded: {e}",
+ error_type="rate_limit",
+ http_status=429
+ )
+
+ # For other errors, we might want to retry
+ if attempt < attempts - 1:
+ attempt += 1
+ wait_time = min(2 ** attempt, 10) # Exponential backoff with max 10s
+ print(f"⏳ Retrying Gemini in {wait_time}s...")
+ time.sleep(wait_time)
+ continue
+ else:
+ # Final attempt failed, re-raise
+ raise
+
+ # If we exhausted all attempts without success
+ print(f"❌ All {attempts} Gemini attempts failed")
+
+ # Log the failure
+ self._log_truncation_failure(
+ messages=messages,
+ response_content="",
+ finish_reason='error',
+ context=self.context,
+ error_details={'error': 'all_retries_failed', 'provider': 'gemini', 'attempts': attempts, 'details': error_details}
+ )
+
+ # Return error response
+ return UnifiedResponse(
+ content="",
+ finish_reason='error',
+ raw_response=response,
+ error_details={'error': 'all_retries_failed', 'attempts': attempts, 'details': error_details}
+ )
+
+ def _format_prompt(self, messages, *, style: str) -> str:
+ """
+ Format messages into a single prompt string.
+ style:
+ - 'gemini': SYSTEM lines as 'INSTRUCTIONS: ...', others 'ROLE: ...'
+ - 'ai21': SYSTEM as 'Instructions: ...', USER as 'User: ...', ASSISTANT as 'Assistant: ...', ends with 'Assistant: '
+ - 'replicate': simple concatenation of SYSTEM, USER, ASSISTANT with labels, no trailing assistant line
+ """
+ formatted_parts = []
+ for msg in messages:
+ role = (msg.get('role') or 'user').upper()
+ content = msg.get('content', '')
+ if style == 'gemini':
+ if role == 'SYSTEM':
+ formatted_parts.append(f"INSTRUCTIONS: {content}")
+ else:
+ formatted_parts.append(f"{role}: {content}")
+ elif style in ('ai21', 'replicate'):
+ if role == 'SYSTEM':
+ label = 'Instructions' if style == 'ai21' else 'System'
+ formatted_parts.append(f"{label}: {content}")
+ elif role == 'USER':
+ formatted_parts.append(f"User: {content}")
+ elif role == 'ASSISTANT':
+ formatted_parts.append(f"Assistant: {content}")
+ else:
+ formatted_parts.append(f"{role.title()}: {content}")
+ else:
+ formatted_parts.append(str(content))
+ prompt = "\n\n".join(formatted_parts)
+ if style == 'ai21':
+ prompt += "\nAssistant: "
+ return prompt
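+ # Illustrative output (derived directly from the branches above):
+ #   msgs = [{"role": "system", "content": "Be brief."},
+ #           {"role": "user", "content": "Hi"}]
+ #   _format_prompt(msgs, style='gemini') -> "INSTRUCTIONS: Be brief.\n\nUSER: Hi"
+ #   _format_prompt(msgs, style='ai21')   -> "Instructions: Be brief.\n\nUser: Hi\nAssistant: "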
+
+ def _send_anthropic(self, messages, temperature, max_tokens, response_name) -> UnifiedResponse:
+ """Send request to Anthropic API"""
+ max_retries = self._get_max_retries()
+
+ headers = {
+ "X-API-Key": self.api_key,
+ "Content-Type": "application/json",
+ "anthropic-version": "2023-06-01"
+ }
+
+ # Format messages for Anthropic
+ system_message = None
+ formatted_messages = []
+ for msg in messages:
+ if msg['role'] == 'system':
+ system_message = msg['content']
+ else:
+ formatted_messages.append({
+ "role": msg['role'],
+ "content": msg['content']
+ })
+
+ # Get user-configured anti-duplicate parameters
+ anti_dupe_params = self._get_anti_duplicate_params(temperature)
+ data = self._build_anthropic_payload(formatted_messages, temperature, max_tokens, anti_dupe_params, system_message)
+
+ resp = self._http_request_with_retries(
+ method="POST",
+ url="https://api.anthropic.com/v1/messages",
+ headers=headers,
+ json=data,
+ expected_status=(200,),
+ max_retries=max_retries,
+ provider_name="Anthropic"
+ )
+ json_resp = resp.json()
+ content, finish_reason, usage = self._parse_anthropic_json(json_resp)
+ return UnifiedResponse(
+ content=content,
+ finish_reason=finish_reason,
+ usage=usage,
+ raw_response=json_resp
+ )
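+ # Mapping sketch: the system turn is lifted out of the message list and passed
+ # separately to _build_anthropic_payload (Anthropic expects a top-level `system`
+ # field rather than a system role inside `messages`), so
+ #   [{"role": "system", ...}, {"role": "user", ...}]
+ # becomes system="..." plus messages=[{"role": "user", ...}].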
+
+ def _send_mistral(self, messages, temperature, max_tokens, response_name) -> UnifiedResponse:
+ """Send request to Mistral API"""
+ max_retries = self._get_max_retries()
+ api_delay = self._get_send_interval()
+
+ if MistralClient and hasattr(self, 'mistral_client'):
+ # Use SDK if available
+ def _do():
+ chat_messages = []
+ for msg in messages:
+ chat_messages.append(ChatMessage(role=msg['role'], content=msg['content']))
+ response = self.mistral_client.chat(
+ model=self.model,
+ messages=chat_messages,
+ temperature=temperature,
+ max_tokens=max_tokens
+ )
+ content = response.choices[0].message.content if response.choices else ""
+ finish_reason = response.choices[0].finish_reason if response.choices else 'stop'
+ return UnifiedResponse(
+ content=content,
+ finish_reason=finish_reason,
+ raw_response=response
+ )
+ return self._with_sdk_retries("Mistral", max_retries, _do)
+ else:
+ # Use HTTP API
+ return self._send_openai_compatible(
+ messages, temperature, max_tokens,
+ base_url="https://api.mistral.ai/v1",
+ response_name=response_name,
+ provider="mistral"
+ )
+
+ def _send_cohere(self, messages, temperature, max_tokens, response_name) -> UnifiedResponse:
+ """Send request to Cohere API"""
+ max_retries = self._get_max_retries()
+ api_delay = self._get_send_interval()
+
+ if cohere and hasattr(self, 'cohere_client'):
+ # Use SDK with standardized retry wrapper
+ def _do():
+ # Format messages for Cohere
+ chat_history = []
+ message = ""
+ system_prefix = ""
+ for msg in messages:
+ if msg['role'] == 'user':
+ message = msg['content']
+ elif msg['role'] == 'assistant':
+ chat_history.append({"role": "CHATBOT", "message": msg['content']})
+ elif msg['role'] == 'system':
+ # Collect separately so a later user turn can't overwrite it
+ system_prefix = msg['content'] + "\n\n"
+ # After the loop, prepend any system prompt to the outgoing message
+ message = system_prefix + message
+ response = self.cohere_client.chat(
+ model=self.model,
+ message=message,
+ chat_history=chat_history,
+ temperature=temperature,
+ max_tokens=max_tokens
+ )
+ content = response.text
+ finish_reason = 'stop'
+ return UnifiedResponse(
+ content=content,
+ finish_reason=finish_reason,
+ raw_response=response
+ )
+ return self._with_sdk_retries("Cohere", max_retries, _do)
+ else:
+ # Use HTTP API with retry logic
+ headers = {
+ "Authorization": f"Bearer {self.api_key}",
+ "Content-Type": "application/json"
+ }
+
+ # Format for HTTP API
+ chat_history = []
+ message = ""
+ system_prefix = ""
+ 
+ for msg in messages:
+ if msg['role'] == 'user':
+ message = msg['content']
+ elif msg['role'] == 'assistant':
+ chat_history.append({"role": "CHATBOT", "message": msg['content']})
+ elif msg['role'] == 'system':
+ # Previously dropped in the HTTP path; mirror the SDK path instead
+ system_prefix = msg['content'] + "\n\n"
+ # After the loop, prepend any system prompt to the outgoing message
+ message = system_prefix + message
+
+ data = {
+ "model": self.model,
+ "message": message,
+ "chat_history": chat_history,
+ "temperature": temperature,
+ "max_tokens": max_tokens
+ }
+
+ resp = self._http_request_with_retries(
+ method="POST",
+ url="https://api.cohere.ai/v1/chat",
+ headers=headers,
+ json=data,
+ expected_status=(200,),
+ max_retries=max_retries,
+ provider_name="Cohere"
+ )
+ json_resp = resp.json()
+ content = json_resp.get("text", "")
+ return UnifiedResponse(
+ content=content,
+ finish_reason='stop',
+ raw_response=json_resp
+ )
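+ # Cohere mapping sketch: the last user turn becomes `message`, prior assistant
+ # turns become chat_history entries with role "CHATBOT", and any system prompt
+ # is prepended to `message` here rather than sent as a separate field.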
+
+ def _send_ai21(self, messages, temperature, max_tokens, response_name) -> UnifiedResponse:
+ """Send request to AI21 API"""
+ max_retries = self._get_max_retries()
+
+ headers = {
+ "Authorization": f"Bearer {self.api_key}",
+ "Content-Type": "application/json"
+ }
+
+ # Format messages for AI21
+ prompt = self._format_prompt(messages, style='ai21')
+
+ data = {
+ "prompt": prompt,
+ "temperature": temperature,
+ "maxTokens": max_tokens
+ }
+
+ resp = self._http_request_with_retries(
+ method="POST",
+ url=f"https://api.ai21.com/studio/v1/{self.model}/complete",
+ headers=headers,
+ json=data,
+ expected_status=(200,),
+ max_retries=max_retries,
+ provider_name="AI21"
+ )
+ json_resp = resp.json()
+ completions = json_resp.get("completions", [])
+ content = completions[0].get("data", {}).get("text", "") if completions else ""
+ return UnifiedResponse(
+ content=content,
+ finish_reason='stop',
+ raw_response=json_resp
+ )
+
+
+ def _send_replicate(self, messages, temperature, max_tokens, response_name) -> UnifiedResponse:
+ """Send request to Replicate API"""
+ max_retries = self._get_max_retries()
+ api_delay = self._get_send_interval()
+
+ headers = {
+ "Authorization": f"Token {self.api_key}",
+ "Content-Type": "application/json"
+ }
+
+ # Format messages as single prompt
+ prompt = self._format_prompt(messages, style='replicate')
+
+ # Replicate uses versioned models
+ data = {
+ "version": self.model, # Model should be the version ID
+ "input": {
+ "prompt": prompt,
+ "temperature": temperature,
+ "max_tokens": max_tokens
+ }
+ }
+
+ # Create prediction
+ resp = self._http_request_with_retries(
+ method="POST",
+ url="https://api.replicate.com/v1/predictions",
+ headers=headers,
+ json=data,
+ expected_status=(201,),
+ max_retries=max_retries,
+ provider_name="Replicate"
+ )
+ prediction = resp.json()
+ prediction_id = prediction['id']
+
+ # Poll for result with GUI delay between polls
+ poll_count = 0
+ max_polls = 300 # Maximum 5 minutes at 1 second intervals
+
+ while poll_count < max_polls:
+ if self._cancelled:
+ raise UnifiedClientError("Operation cancelled")
+
+ resp = requests.get(
+ f"https://api.replicate.com/v1/predictions/{prediction_id}",
+ headers=headers,
+ timeout=self.request_timeout # Use configured timeout
+ )
+
+ if resp.status_code != 200:
+ raise UnifiedClientError(f"Replicate polling error: {resp.status_code}")
+
+ result = resp.json()
+
+ if result['status'] == 'succeeded':
+ content = result.get('output', '')
+ if isinstance(content, list):
+ content = ''.join(content)
+ break
+ elif result['status'] == 'failed':
+ raise UnifiedClientError(f"Replicate prediction failed: {result.get('error')}")
+
+ # Use GUI delay for polling interval
+ time.sleep(min(api_delay, 1)) # Sleep at most 1 second between polls
+ poll_count += 1
+
+ if poll_count >= max_polls:
+ raise UnifiedClientError("Replicate prediction timed out")
+
+ return UnifiedResponse(
+ content=content,
+ finish_reason='stop',
+ raw_response=result
+ )
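+ # Protocol sketch (Replicate predictions API): the POST above returns a
+ # prediction id immediately; the loop then polls GET /v1/predictions/{id}
+ # until `status` reaches 'succeeded' or 'failed', capped at roughly 5 minutes.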
+
+ def _send_openai_compatible(self, messages, temperature, max_tokens, base_url,
+ response_name, provider="generic", headers=None, model_override=None) -> UnifiedResponse:
+ """Send request to OpenAI-compatible APIs with safety settings"""
+ max_retries = self._get_max_retries()
+ api_delay = self._get_send_interval()
+
+ # Determine effective model for this call (do not rely on shared self.model)
+ if model_override is not None:
+ effective_model = model_override
+ else:
+ # Read instance model under microsecond lock to avoid cross-thread contamination
+ with self._model_lock:
+ effective_model = self.model
+ # Provider-specific model normalization (transport-only)
+ if provider == 'openrouter':
+ for prefix in ('or/', 'openrouter/'):
+ if effective_model.startswith(prefix):
+ effective_model = effective_model[len(prefix):]
+ break
+ effective_model = effective_model.strip()
+ elif provider == 'fireworks':
+ if effective_model.startswith('fireworks/'):
+ effective_model = effective_model[len('fireworks/') :]
+ if not effective_model.startswith('accounts/'):
+ effective_model = f"accounts/fireworks/models/{effective_model}"
+ elif provider == 'chutes':
+ # Strip the 'chutes/' prefix from the model name if present
+ if effective_model.startswith('chutes/'):
+ effective_model = effective_model[7:] # Remove 'chutes/' prefix
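+ # Normalization examples (model names illustrative):
+ #   openrouter: 'or/google/gemma-2-9b'   -> 'google/gemma-2-9b'
+ #   fireworks:  'fireworks/llama-v3-8b'  -> 'accounts/fireworks/models/llama-v3-8b'
+ #   chutes:     'chutes/qwen-72b'        -> 'qwen-72b'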
+
+ # CUSTOM ENDPOINT OVERRIDE - Check if enabled and override base_url
+ use_custom_endpoint = os.getenv('USE_CUSTOM_OPENAI_ENDPOINT', '0') == '1'
+ actual_api_key = self.api_key
+
+ # Determine if this is a local endpoint that doesn't need a real API key
+ is_local_endpoint = False
+
+ # Never override OpenRouter base_url with custom endpoint
+ if use_custom_endpoint and provider not in ("gemini-openai", "openrouter"):
+ custom_base_url = os.getenv('OPENAI_CUSTOM_BASE_URL', '')
+ if custom_base_url:
+ # Check if it's Azure
+ if '.azure.com' in custom_base_url or '.cognitiveservices' in custom_base_url:
+ # Azure needs special client
+ from openai import AzureOpenAI
+
+ deployment = effective_model # Use override or instance model as deployment name
+ api_version = os.getenv('AZURE_API_VERSION', '2024-12-01-preview')
+
+ # Azure endpoint should be just the base URL
+ azure_endpoint = custom_base_url.split('/openai')[0] if '/openai' in custom_base_url else custom_base_url
+
+ print(f"🔷 Azure endpoint detected")
+ print(f" Endpoint: {azure_endpoint}")
+ print(f" Deployment: {deployment}")
+ print(f" API Version: {api_version}")
+
+ # Create Azure client
+ for attempt in range(max_retries):
+ try:
+ client = AzureOpenAI(
+ api_key=actual_api_key,
+ api_version=api_version,
+ azure_endpoint=azure_endpoint
+ )
+
+ # Build params with correct token parameter based on model
+ params = {
+ "model": deployment,
+ "messages": messages,
+ "temperature": temperature
+ }
+
+ # Normalize token parameter for Azure endpoint
+ norm_max_tokens, norm_max_completion_tokens = self._normalize_token_params(max_tokens, None)
+ if norm_max_completion_tokens is not None:
+ params["max_completion_tokens"] = norm_max_completion_tokens
+ elif norm_max_tokens is not None:
+ params["max_tokens"] = norm_max_tokens
+
+ # Use Idempotency-Key via headers for compatibility
+ idem_key = self._get_idempotency_key()
+ response = client.chat.completions.create(
+ **params,
+ extra_headers={"Idempotency-Key": idem_key}
+ )
+
+ # Extract response
+ content = response.choices[0].message.content if response.choices else ""
+ finish_reason = response.choices[0].finish_reason if response.choices else "stop"
+
+ return UnifiedResponse(
+ content=content,
+ finish_reason=finish_reason,
+ raw_response=response
+ )
+
+ except Exception as e:
+ error_str = str(e).lower()
+
+ # Check if this is a content filter error FIRST
+ if ("content_filter" in error_str or
+ "responsibleaipolicyviolation" in error_str or
+ "content management policy" in error_str or
+ "the response was filtered" in error_str):
+
+ # This is a content filter error - raise it immediately as prohibited_content
+ print(f"Azure content filter detected: {str(e)[:100]}")
+ raise UnifiedClientError(
+ f"Azure content blocked: {e}",
+ error_type="prohibited_content",
+ http_status=400,
+ details={"provider": "azure", "original_error": str(e)}
+ )
+
+ # Only retry for non-content-filter errors
+ if attempt < max_retries - 1:
+ print(f"Azure error (attempt {attempt + 1}): {e}")
+ time.sleep(api_delay)
+ continue
+
+ raise UnifiedClientError(f"Azure error: {e}")
+
+ # Not Azure, continue with regular custom endpoint
+ print(f"🔄 Custom endpoint enabled: Overriding {provider} endpoint")
+ print(f" Original: {base_url}")
+ print(f" Override: {custom_base_url}")
+ base_url = custom_base_url # assign after logging so 'Original' shows the real original
+
+ # Check if it's Azure
+ if '.azure.com' in custom_base_url or '.cognitiveservices' in custom_base_url:
+ # Azure needs special handling
+ deployment = effective_model # Use the normalized model as the deployment name
+ api_version = os.getenv('AZURE_API_VERSION', '2024-08-01-preview')
+
+ # Fix Azure URL format
+ if '/openai/deployments/' not in custom_base_url:
+ custom_base_url = f"{custom_base_url.rstrip('/')}/openai/deployments/{deployment}/chat/completions?api-version={api_version}"
+
+ # Azure uses different auth header
+ if headers is None:
+ headers = {}
+ headers['api-key'] = actual_api_key
+ headers.pop('Authorization', None) # Remove OpenAI auth
+
+ print(f"🔷 Azure endpoint detected: {custom_base_url}")
+
+ base_url = custom_base_url
+
+ # Check if it's a local endpoint
+ local_indicators = [
+ 'localhost', '127.0.0.1', '0.0.0.0',
+ '192.168.', '10.', '172.16.', '172.17.', '172.18.', '172.19.',
+ '172.20.', '172.21.', '172.22.', '172.23.', '172.24.', '172.25.',
+ '172.26.', '172.27.', '172.28.', '172.29.', '172.30.', '172.31.',
+ ':11434', # Ollama default port
+ ':8080', # Common local API port
+ ':5000', # Common local API port
+ ':8000', # Common local API port
+ ':1234', # LM Studio default port
+ 'host.docker.internal', # Docker host
+ ]
+
+ # Also check if user explicitly marked it as local
+ is_local_llm_env = os.getenv('IS_LOCAL_LLM', '0') == '1'
+
+ is_local_endpoint = is_local_llm_env or any(indicator in custom_base_url.lower() for indicator in local_indicators)
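+ # e.g. 'http://localhost:11434/v1' (Ollama) and 'http://127.0.0.1:1234/v1'
+ # (LM Studio) are treated as local; setting IS_LOCAL_LLM=1 forces it.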
+
+ if is_local_endpoint:
+ # Local endpoints don't validate keys, so any placeholder works
+ actual_api_key = "dummy-key-for-local-llm"
+ 
+ # All other providers, including gemini-openai, keep the real API key
+
+ # Check if safety settings are disabled via GUI toggle
+ disable_safety = os.getenv("DISABLE_GEMINI_SAFETY", "false").lower() == "true"
+
+ # Debug logging for ElectronHub
+ if provider == "electronhub":
+ logger.debug(f"ElectronHub API call - Messages structure:")
+ for i, msg in enumerate(messages):
+ logger.debug(f" Message {i}: role='{msg.get('role')}', content_length={len(msg.get('content', ''))}")
+ if msg.get('role') == 'system':
+ logger.debug(f" System prompt preview: {msg.get('content', '')[:100]}...")
+
+ # Use OpenAI SDK for providers known to work well with it
+ sdk_compatible = ['deepseek', 'together', 'mistral', 'yi', 'qwen', 'moonshot', 'groq',
+ 'electronhub', 'openrouter', 'fireworks', 'xai', 'gemini-openai', 'chutes']
+
+ # Allow forcing HTTP-only for OpenRouter via toggle (default: disabled)
+ openrouter_http_only = os.getenv('OPENROUTER_USE_HTTP_ONLY', '0') == '1'
+ if provider == 'openrouter' and openrouter_http_only:
+ print("OpenRouter HTTP-only mode enabled — using direct HTTP client")
+
+ if openai and provider in sdk_compatible and not (provider == 'openrouter' and openrouter_http_only):
+ # Use OpenAI SDK with custom base URL
+ for attempt in range(max_retries):
+ try:
+ if self._cancelled:
+ raise UnifiedClientError("Operation cancelled")
+
+ client = self._get_openai_client(base_url=base_url, api_key=actual_api_key)
+
+ # Check if this is Gemini via OpenAI endpoint
+ is_gemini_endpoint = provider == "gemini-openai" or effective_model.lower().startswith('gemini')
+
+ # Get user-configured anti-duplicate parameters
+ anti_dupe_params = self._get_anti_duplicate_params(temperature)
+
+ # Enforce fixed temperature for o-series (e.g., GPT-5) to avoid 400s
+ req_temperature = temperature
+ try:
+ if self._is_o_series_model():
+ req_temperature = 1.0
+ except Exception:
+ pass
+
+ norm_max_tokens, norm_max_completion_tokens = self._normalize_token_params(max_tokens, None)
+ # Targeted preflight for OpenRouter free Gemma variant only
+ try:
+ if provider == 'openrouter':
+ ml = (effective_model or '').lower().strip()
+ if ml == 'google/gemma-3-27b-it:free' and any(isinstance(m, dict) and m.get('role') == 'system' for m in messages):
+ messages = self._merge_system_into_user(messages)
+ print("🔁 Preflight: merged system prompt into user for google/gemma-3-27b-it:free (SDK)")
+ try:
+ payload_name, _ = self._get_file_names(messages, context=getattr(self, 'context', 'translation'))
+ self._save_payload(messages, payload_name, retry_reason="preflight_gemma_no_system")
+ except Exception:
+ pass
+ except Exception:
+ pass
+
+ params = {
+ "model": effective_model,
+ "messages": messages,
+ "temperature": req_temperature,
+ **anti_dupe_params
+ }
+ if norm_max_completion_tokens is not None:
+ params["max_completion_tokens"] = norm_max_completion_tokens
+ elif norm_max_tokens is not None:
+ params["max_tokens"] = norm_max_tokens
+
+ # Use extra_body for provider-specific fields the SDK doesn't type-accept
+ extra_body = {}
+
+ # Inject OpenRouter reasoning configuration (effort/max_tokens) via extra_body
+ if provider == 'openrouter':
+ try:
+ enable_gpt = os.getenv('ENABLE_GPT_THINKING', '0') == '1'
+ if enable_gpt:
+ reasoning = {"enabled": True, "exclude": True}
+ tokens_str = (os.getenv('GPT_REASONING_TOKENS', '') or '').strip()
+ if tokens_str.isdigit() and int(tokens_str) > 0:
+ reasoning.pop('effort', None)
+ reasoning["max_tokens"] = int(tokens_str)
+ else:
+ effort = (os.getenv('GPT_EFFORT', 'medium') or 'medium').lower()
+ if effort not in ('low', 'medium', 'high'):
+ effort = 'medium'
+ reasoning.pop('max_tokens', None)
+ reasoning["effort"] = effort
+ extra_body["reasoning"] = reasoning
+ except Exception:
+ pass
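+ # Resulting payload sketch: with ENABLE_GPT_THINKING=1 and GPT_REASONING_TOKENS=2048,
+ #   extra_body["reasoning"] == {"enabled": True, "exclude": True, "max_tokens": 2048}
+ # with no token budget and GPT_EFFORT=high,
+ #   extra_body["reasoning"] == {"enabled": True, "exclude": True, "effort": "high"}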
+
+ # Add safety parameters for providers that support them
+ # Note: Together AI doesn't support the 'moderation' parameter
+ if disable_safety and provider in ["groq", "fireworks"]:
+ extra_body["moderation"] = False
+ logger.info(f"🔓 Safety moderation disabled for {provider}")
+ elif disable_safety and provider == "together":
+ # Together AI handles safety differently - no moderation parameter
+ logger.info(f"🔓 Safety settings note: {provider} doesn't support moderation parameter")
+
+ # Use Idempotency-Key header to avoid unsupported kwarg on some endpoints
+ idem_key = self._get_idempotency_key()
+ extra_headers = {"Idempotency-Key": idem_key}
+ if provider == 'openrouter':
+ # OpenRouter requires Referer and Title; also request JSON explicitly
+ extra_headers.update({
+ "HTTP-Referer": os.getenv('OPENROUTER_REFERER', 'https://github.com/Shirochi-stack/Glossarion'),
+ "X-Title": os.getenv('OPENROUTER_APP_NAME', 'Glossarion Translation'),
+ "X-Proxy-TTL": "0",
+ "Accept": "application/json",
+ "Cache-Control": "no-cache",
+ })
+ if os.getenv('OPENROUTER_ACCEPT_IDENTITY', '0') == '1':
+ extra_headers["Accept-Encoding"] = "identity"
+
+ # Build call kwargs and include extra_body only when present
+ call_kwargs = {
+ **params,
+ "extra_headers": extra_headers,
+ }
+ if extra_body:
+ call_kwargs["extra_body"] = extra_body
+
+ resp = client.chat.completions.create(**call_kwargs)
+
+ # Enhanced extraction for Gemini endpoints
+ content = None
+ finish_reason = 'stop'
+
+ # Extract content with Gemini awareness
+ if hasattr(resp, 'choices') and resp.choices:
+ choice = resp.choices[0]
+
+ if hasattr(choice, 'finish_reason'):
+ finish_reason = choice.finish_reason or 'stop'
+
+ if hasattr(choice, 'message'):
+ message = choice.message
+ if message is None:
+ content = ""
+ if is_gemini_endpoint:
+ content = "[GEMINI RETURNED NULL MESSAGE]"
+ finish_reason = 'content_filter'
+ elif hasattr(message, 'content'):
+ content = message.content or ""
+ if content is None and is_gemini_endpoint:
+ content = "[BLOCKED BY GEMINI SAFETY FILTER]"
+ finish_reason = 'content_filter'
+ elif hasattr(message, 'text'):
+ content = message.text
+ elif isinstance(message, str):
+ content = message
+ else:
+ content = str(message) if message else ""
+ elif hasattr(choice, 'text'):
+ content = choice.text
+ else:
+ content = ""
+ else:
+ content = ""
+
+ # Normalize finish reasons
+ if finish_reason in ["max_tokens", "max_length"]:
+ finish_reason = "length"
+
+ usage = None
+ if hasattr(resp, 'usage') and resp.usage:  # usage may be None on some gateways
+ usage = {
+ 'prompt_tokens': resp.usage.prompt_tokens,
+ 'completion_tokens': resp.usage.completion_tokens,
+ 'total_tokens': resp.usage.total_tokens
+ }
+
+ self._save_response(content, response_name)
+
+ return UnifiedResponse(
+ content=content,
+ finish_reason=finish_reason,
+ usage=usage,
+ raw_response=resp
+ )
+
+ except Exception as e:
+ error_str = str(e).lower()
+ if "rate limit" in error_str or "429" in error_str or "quota" in error_str:
+ # Preserve the full error message from OpenRouter/ElectronHub
+ raise UnifiedClientError(str(e), error_type="rate_limit")
+ # Fallback: If SDK has trouble parsing OpenRouter response, retry via direct HTTP with full diagnostics
+ if provider == 'openrouter' and ("expecting value" in error_str or "json" in error_str):
+ try:
+ print("OpenRouter SDK parse error — falling back to HTTP path for this attempt")
+ # Save the SDK parse error to failed_requests with traceback
+ try:
+ self._save_failed_request(messages, e, getattr(self, 'context', 'general'))
+ except Exception:
+ pass
+ # Build headers
+ http_headers = self._build_openai_headers(provider, actual_api_key, headers)
+ http_headers['HTTP-Referer'] = os.getenv('OPENROUTER_REFERER', 'https://github.com/Shirochi-stack/Glossarion')
+ http_headers['X-Title'] = os.getenv('OPENROUTER_APP_NAME', 'Glossarion Translation')
+ http_headers['X-Proxy-TTL'] = '0'
+ http_headers['Accept'] = 'application/json'
+ http_headers['Cache-Control'] = 'no-cache'
+ if os.getenv('OPENROUTER_ACCEPT_IDENTITY', '0') == '1':
+ http_headers['Accept-Encoding'] = 'identity'
+ # Build body similar to HTTP branch
+ norm_max_tokens, norm_max_completion_tokens = self._normalize_token_params(max_tokens, None)
+ body = {
+ "model": effective_model,
+ "messages": messages,
+ "temperature": req_temperature,
+ }
+ if norm_max_completion_tokens is not None:
+ body["max_completion_tokens"] = norm_max_completion_tokens
+ elif norm_max_tokens is not None:
+ body["max_tokens"] = norm_max_tokens
+ # Reasoning (OpenRouter-only)
+ try:
+ enable_gpt = os.getenv('ENABLE_GPT_THINKING', '0') == '1'
+ if enable_gpt:
+ reasoning = {"enabled": True, "exclude": True}
+ tokens_str = (os.getenv('GPT_REASONING_TOKENS', '') or '').strip()
+ if tokens_str.isdigit() and int(tokens_str) > 0:
+ reasoning["max_tokens"] = int(tokens_str)
+ else:
+ effort = (os.getenv('GPT_EFFORT', 'medium') or 'medium').lower()
+ if effort not in ('low', 'medium', 'high'):
+ effort = 'medium'
+ reasoning["effort"] = effort
+ body["reasoning"] = reasoning
+ except Exception:
+ pass
+ # Make HTTP request
+ endpoint = "/chat/completions"
+ http_headers["Idempotency-Key"] = self._get_idempotency_key()
+ resp = self._http_request_with_retries(
+ method="POST",
+ url=f"{base_url}{endpoint}",
+ headers=http_headers,
+ json=body,
+ expected_status=(200,),
+ max_retries=1,
+ provider_name="OpenRouter (HTTP)",
+ use_session=True
+ )
+ json_resp = resp.json()
+ choices = json_resp.get("choices", [])
+ if not choices:
+ raise UnifiedClientError("OpenRouter (HTTP) returned no choices")
+ content, finish_reason, usage = self._extract_openai_json(json_resp)
+ return UnifiedResponse(
+ content=content,
+ finish_reason=finish_reason,
+ usage=usage,
+ raw_response=json_resp
+ )
+ except Exception as http_e:
+ # Surface detailed diagnostics
+ raise UnifiedClientError(
+ f"OpenRouter HTTP fallback failed: {http_e}",
+ error_type="parse_error"
+ )
+ if not self._multi_key_mode and attempt < max_retries - 1:
+ print(f"{provider} SDK error (attempt {attempt + 1}): {e}")
+ time.sleep(api_delay)
+ continue
+ elif self._multi_key_mode:
+ raise UnifiedClientError(f"{provider} error: {e}", error_type="api_error")
+ raise UnifiedClientError(f"{provider} SDK error: {e}")
+ else:
+ # Use HTTP API with retry logic
+ headers = self._build_openai_headers(provider, actual_api_key, headers)
+ # Provider-specific header tweaks
+ if provider == 'openrouter':
+ headers['HTTP-Referer'] = os.getenv('OPENROUTER_REFERER', 'https://github.com/Shirochi-stack/Glossarion')
+ headers['X-Title'] = os.getenv('OPENROUTER_APP_NAME', 'Glossarion Translation')
+ headers['X-Proxy-TTL'] = '0'
+ headers['Cache-Control'] = 'no-cache'
+ if os.getenv('OPENROUTER_ACCEPT_IDENTITY', '0') == '1':
+ headers['Accept-Encoding'] = 'identity'
+ elif provider == 'zhipu':
+ headers["Authorization"] = f"Bearer {actual_api_key}"
+ elif provider == 'baidu':
+ headers["Content-Type"] = "application/json"
+ # Normalize token parameter (o-series: max_completion_tokens; others: max_tokens)
+ norm_max_tokens, norm_max_completion_tokens = self._normalize_token_params(max_tokens, None)
+
+ # Enforce fixed temperature for o-series (e.g., GPT-5) to avoid 400s
+ req_temperature = temperature
+ try:
+ if provider == 'openai' and self._is_o_series_model():
+ req_temperature = 1.0
+ except Exception:
+ pass
+
+ # Targeted preflight for OpenRouter free Gemma variant only
+ try:
+ if provider == 'openrouter':
+ ml = (effective_model or '').lower().strip()
+ if ml == 'google/gemma-3-27b-it:free' and any(isinstance(m, dict) and m.get('role') == 'system' for m in messages):
+ messages = self._merge_system_into_user(messages)
+ print("🔁 Preflight (HTTP): merged system prompt into user for google/gemma-3-27b-it:free")
+ try:
+ payload_name, _ = self._get_file_names(messages, context=getattr(self, 'context', 'translation'))
+ self._save_payload(messages, payload_name, retry_reason="preflight_gemma_no_system")
+ except Exception:
+ pass
+ except Exception:
+ pass
+
+ data = {
+ "model": effective_model,
+ "messages": messages,
+ "temperature": req_temperature,
+ }
+ if norm_max_completion_tokens is not None:
+ data["max_completion_tokens"] = norm_max_completion_tokens
+ elif norm_max_tokens is not None:
+ data["max_tokens"] = norm_max_tokens
+
+ # Inject OpenRouter reasoning configuration (effort/max_tokens)
+ if provider == 'openrouter':
+ try:
+ enable_gpt = os.getenv('ENABLE_GPT_THINKING', '0') == '1'
+ if enable_gpt:
+ reasoning = {"enabled": True, "exclude": True}
+ tokens_str = (os.getenv('GPT_REASONING_TOKENS', '') or '').strip()
+ if tokens_str.isdigit() and int(tokens_str) > 0:
+ reasoning.pop('effort', None)
+ reasoning["max_tokens"] = int(tokens_str)
+ else:
+ effort = (os.getenv('GPT_EFFORT', 'medium') or 'medium').lower()
+ if effort not in ('low', 'medium', 'high'):
+ effort = 'medium'
+ reasoning.pop('max_tokens', None)
+ reasoning["effort"] = effort
+ data["reasoning"] = reasoning
+ except Exception:
+ pass
+
+ # Add Perplexity-specific options for Sonar models
+ if provider == 'perplexity' and 'sonar' in effective_model.lower():
+ data['search_domain_filter'] = ['perplexity.ai']
+ data['return_citations'] = True
+ data['search_recency_filter'] = 'month'
+
+ # Apply safety flags
+ self._apply_openai_safety(provider, disable_safety, data, headers)
+ # Save OpenRouter config if requested
+ if provider == 'openrouter' and os.getenv("SAVE_PAYLOAD", "1") == "1":
+ cfg = {
+ "provider": "openrouter",
+ "timestamp": datetime.now().isoformat(),
+ "model": effective_model,
+ "safety_disabled": disable_safety,
+ "temperature": temperature,
+ "max_tokens": max_tokens
+ }
+ # Persist reasoning config in saved debug file
+ try:
+ enable_gpt = os.getenv('ENABLE_GPT_THINKING', '0') == '1'
+ if enable_gpt:
+ reasoning = {"enabled": True, "exclude": True}
+ tokens_str = (os.getenv('GPT_REASONING_TOKENS', '') or '').strip()
+ if tokens_str.isdigit() and int(tokens_str) > 0:
+ reasoning.pop('effort', None)
+ reasoning["max_tokens"] = int(tokens_str)
+ else:
+ effort = (os.getenv('GPT_EFFORT', 'medium') or 'medium').lower()
+ if effort not in ('low', 'medium', 'high'):
+ effort = 'medium'
+ reasoning.pop('max_tokens', None)
+ reasoning["effort"] = effort
+ cfg["reasoning"] = reasoning
+ except Exception:
+ pass
+ self._save_openrouter_config(cfg, response_name)
+ # Endpoint and idempotency
+ endpoint = "/chat/completions"
+ headers["Idempotency-Key"] = self._get_idempotency_key()
+ resp = self._http_request_with_retries(
+ method="POST",
+ url=f"{base_url}{endpoint}",
+ headers=headers,
+ json=data,
+ expected_status=(200,),
+ max_retries=max_retries,
+ provider_name=provider,
+ use_session=True
+ )
+ # Safely parse JSON with diagnostics for non-JSON bodies
+ try:
+ ct = (resp.headers.get('content-type') or '').lower()
+ if 'application/json' not in ct:
+ snippet = resp.text[:1200] if hasattr(resp, 'text') else ''
+ # Log failed request snapshot
+ try:
+ self._save_failed_request(messages, f"non-JSON content-type: {ct}", getattr(self, 'context', 'general'), response=snippet)
+ except Exception:
+ pass
+ raise UnifiedClientError(
+ f"{provider} returned non-JSON content-type: {ct or 'unknown'} | snippet: {snippet}",
+ error_type="parse_error",
+ http_status=resp.status_code,
+ details={"content_type": ct, "snippet": snippet}
+ )
+ json_resp = resp.json()
+ except Exception as je:
+ # If this is a JSON decode error, surface a helpful message
+ import json as _json
+ if isinstance(je, UnifiedClientError):
+ raise
+ try:
+ # detect common JSON decode exceptions without importing vendor-specific types
+ if 'Expecting value' in str(je) or 'JSONDecodeError' in str(type(je)):
+ snippet = resp.text[:1200] if hasattr(resp, 'text') else ''
+ try:
+ self._save_failed_request(messages, f"json-parse-failed: {je}", getattr(self, 'context', 'general'), response=snippet)
+ except Exception:
+ pass
+ raise UnifiedClientError(
+ f"{provider} JSON parse failed: {je} | snippet: {snippet}",
+ error_type="parse_error",
+ http_status=resp.status_code,
+ details={"content_type": ct, "snippet": snippet}
+ )
+ except Exception:
+ pass
+ # Re-raise unknown parsing exceptions
+ raise
+
+ choices = json_resp.get("choices", [])
+ if not choices:
+ raise UnifiedClientError(f"{provider} API returned no choices")
+ content, finish_reason, usage = self._extract_openai_json(json_resp)
+ # ElectronHub truncation detection
+ if provider == "electronhub" and content:
+ if len(content) < 50 and "cannot" in content.lower():
+ finish_reason = "content_filter"
+ print(f"ElectronHub likely refused content: {content[:100]}")
+ elif finish_reason == "stop":
+ if self._detect_silent_truncation(content, messages, self.context):
+ finish_reason = "length"
+ print("ElectronHub reported 'stop' but content appears truncated")
+ print(f"🔍 ElectronHub: Detected silent truncation despite 'stop' status")
+ return UnifiedResponse(
+ content=content,
+ finish_reason=finish_reason,
+ usage=usage,
+ raw_response=json_resp
+ )
+
+ def _send_openai(self, messages, temperature, max_tokens, max_completion_tokens, response_name) -> UnifiedResponse:
+ """Send request to OpenAI API with proper token parameter handling"""
+ # CRITICAL: Check if individual endpoint is applied first
+ if (hasattr(self, '_individual_endpoint_applied') and self._individual_endpoint_applied and
+ hasattr(self, 'openai_client') and self.openai_client):
+ individual_base_url = getattr(self.openai_client, 'base_url', None)
+ if individual_base_url:
+ base_url = str(individual_base_url).rstrip('/')
+ else:
+ base_url = 'https://api.openai.com/v1'
+ else:
+ # Fallback to global custom endpoint logic
+ custom_base_url = os.getenv('OPENAI_CUSTOM_BASE_URL', '')
+ use_custom_endpoint = os.getenv('USE_CUSTOM_OPENAI_ENDPOINT', '0') == '1'
+
+ if custom_base_url and use_custom_endpoint:
+ base_url = custom_base_url
+ else:
+ base_url = 'https://api.openai.com/v1'
+
+ # For OpenAI, we need to handle max_completion_tokens properly
+ return self._send_openai_compatible(
+ messages=messages,
+ temperature=temperature,
+ max_tokens=max_completion_tokens or max_tokens,
+ base_url=base_url,
+ response_name=response_name,
+ provider="openai"
+ )
+
+ def _send_openai_provider_router(self, messages, temperature, max_tokens, response_name) -> UnifiedResponse:
+ """Generic router for many OpenAI-compatible providers to reduce wrapper duplication."""
+ provider = self._get_actual_provider()
+
+ # Provider URL mapping dictionary
+ provider_urls = {
+ 'yi': lambda: os.getenv("YI_API_BASE_URL", "https://api.01.ai/v1"),
+ 'qwen': "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
+ 'baichuan': "https://api.baichuan-ai.com/v1",
+ 'zhipu': "https://open.bigmodel.cn/api/paas/v4",
+ 'moonshot': "https://api.moonshot.cn/v1",
+ 'groq': lambda: os.getenv("GROQ_API_URL", "https://api.groq.com/openai/v1"),
+ 'baidu': "https://aip.baidubce.com/rpc/2.0/ai_custom/v1/wenxinworkshop",
+ 'tencent': "https://hunyuan.cloud.tencent.com/v1",
+ 'iflytek': "https://spark-api.xf-yun.com/v1",
+ 'bytedance': "https://maas-api.vercel.app/v1",
+ 'minimax': "https://api.minimax.chat/v1",
+ 'sensenova': "https://api.sensenova.cn/v1",
+ 'internlm': "https://api.internlm.org/v1",
+ 'tii': "https://api.tii.ae/v1",
+ 'microsoft': "https://api.microsoft.com/v1",
+ 'databricks': lambda: f"{os.getenv('DATABRICKS_API_URL', 'https://YOUR-WORKSPACE.databricks.com')}/serving/endpoints",
+ 'together': "https://api.together.xyz/v1",
+ 'openrouter': "https://openrouter.ai/api/v1",
+ 'fireworks': lambda: os.getenv("FIREWORKS_API_URL", "https://api.fireworks.ai/inference/v1"),
+ 'xai': lambda: os.getenv("XAI_API_URL", "https://api.x.ai/v1"),
+ 'deepseek': lambda: os.getenv("DEEPSEEK_API_URL", "https://api.deepseek.com/v1"),
+ 'perplexity': "https://api.perplexity.ai",
+ 'chutes': lambda: os.getenv("CHUTES_API_URL", "https://llm.chutes.ai/v1"),
+ 'salesforce': lambda: os.getenv("SALESFORCE_API_URL", "https://api.salesforce.com/v1"),
+ 'bigscience': "https://api.together.xyz/v1", # Together AI fallback
+ 'meta': "https://api.together.xyz/v1" # Together AI fallback
+ }
+
+ # Get base URL from mapping
+ url_spec = provider_urls.get(provider)
+ if url_spec:
+ base_url = url_spec() if callable(url_spec) else url_spec
+ else:
+ # Fallback to base OpenAI-compatible flow if unknown
+ base_url = os.getenv('OPENAI_CUSTOM_BASE_URL', 'https://api.openai.com/v1')
+
+ return self._send_openai_compatible(
+ messages=messages,
+ temperature=temperature,
+ max_tokens=max_tokens,
+ base_url=base_url,
+ response_name=response_name,
+ provider=provider
+ )
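+ # Resolution sketch: string entries are used verbatim, while callables defer the
+ # env lookup to call time, e.g. provider 'groq' resolves to
+ # os.getenv("GROQ_API_URL", "https://api.groq.com/openai/v1").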
+
+ def _send_azure(self, messages, temperature, max_tokens, response_name) -> UnifiedResponse:
+ """Send request to Azure OpenAI"""
+ # Prefer per-key (individual) endpoint/version when present, then fall back to env vars
+ endpoint = getattr(self, 'azure_endpoint', None) or \
+ getattr(self, 'current_key_azure_endpoint', None) or \
+ os.getenv("AZURE_OPENAI_ENDPOINT", "https://YOUR-RESOURCE.openai.azure.com")
+ api_version = getattr(self, 'azure_api_version', None) or \
+ getattr(self, 'current_key_azure_api_version', None) or \
+ os.getenv("AZURE_API_VERSION", "2024-02-01")
+
+ if endpoint and not endpoint.startswith(("http://", "https://")):
+ endpoint = "https://" + endpoint
+
+ headers = {
+ "api-key": self.api_key,
+ "Content-Type": "application/json"
+ }
+
+ # Azure uses a different URL structure
+ base_url = f"{endpoint.rstrip('/')}/openai/deployments/{self.model}"
+ url = f"{base_url}/chat/completions?api-version={api_version}"
+
+ data = {
+ "messages": messages,
+ "temperature": temperature
+ }
+
+ # Use _is_o_series_model to determine which token parameter to use
+ if self._is_o_series_model():
+ data["max_completion_tokens"] = max_tokens
+ else:
+ data["max_tokens"] = max_tokens
+
+ try:
+ resp = requests.post(
+ url,
+ headers=headers,
+ json=data,
+ timeout=self.request_timeout
+ )
+
+ if resp.status_code != 200:
+ # Treat all 400s as prohibited_content to trigger fallback keys cleanly
+ if resp.status_code == 400:
+ raise UnifiedClientError(
+ f"Azure OpenAI error: {resp.status_code} - {resp.text}",
+ error_type="prohibited_content",
+ http_status=400
+ )
+ # Other errors propagate normally with status code
+ raise UnifiedClientError(
+ f"Azure OpenAI error: {resp.status_code} - {resp.text}",
+ http_status=resp.status_code
+ )
+
+ json_resp = resp.json()
+ content, finish_reason, usage = self._extract_openai_json(json_resp)
+ return UnifiedResponse(
+ content=content,
+ finish_reason=finish_reason,
+ usage=usage,
+ raw_response=json_resp
+ )
+
+ except Exception as e:
+ print(f"Azure OpenAI error: {e}")
+ raise UnifiedClientError(f"Azure OpenAI error: {e}")
+
+ def _send_google_palm(self, messages, temperature, max_tokens, response_name) -> UnifiedResponse:
+ """Send request to Google PaLM API"""
+ # PaLM is being replaced by Gemini, but included for completeness
+ return self._send_gemini(messages, temperature, max_tokens, response_name)
+
+ def _send_alephalpha(self, messages, temperature, max_tokens, response_name) -> UnifiedResponse:
+ """Send request to Aleph Alpha API"""
+ headers = {
+ "Authorization": f"Bearer {self.api_key}",
+ "Content-Type": "application/json"
+ }
+
+ # Format messages for Aleph Alpha (simple concatenation)
+ prompt = self._format_prompt(messages, style='replicate')
+
+ data = {
+ "model": self.model,
+ "prompt": prompt,
+ "maximum_tokens": max_tokens,
+ "temperature": temperature
+ }
+
+ try:
+ resp = self._http_request_with_retries(
+ method="POST",
+ url="https://api.aleph-alpha.com/complete",
+ headers=headers,
+ json=data,
+ expected_status=(200,),
+ max_retries=3,
+ provider_name="AlephAlpha"
+ )
+ json_resp = resp.json()
+ completions = json_resp.get('completions') or [{}]
+ content = completions[0].get('completion', '')
+
+ return UnifiedResponse(
+ content=content,
+ finish_reason='stop',
+ raw_response=json_resp
+ )
+
+ except Exception as e:
+ print(f"Aleph Alpha error: {e}")
+ raise UnifiedClientError(f"Aleph Alpha error: {e}")
+
+ def _send_huggingface(self, messages, temperature, max_tokens, response_name) -> UnifiedResponse:
+ """Send request to HuggingFace Inference API"""
+ headers = {
+ "Authorization": f"Bearer {self.api_key}",
+ "Content-Type": "application/json"
+ }
+
+ # Format messages for HuggingFace (simple concatenation)
+ prompt = self._format_prompt(messages, style='replicate')
+
+ data = {
+ "inputs": prompt,
+ "parameters": {
+ "max_new_tokens": max_tokens,
+ "temperature": temperature,
+ "return_full_text": False
+ }
+ }
+
+ try:
+ resp = self._http_request_with_retries(
+ method="POST",
+ url=f"https://api-inference.huggingface.co/models/{self.model}",
+ headers=headers,
+ json=data,
+ expected_status=(200,),
+ max_retries=3,
+ provider_name="HuggingFace"
+ )
+ json_resp = resp.json()
+ content = ""
+ if isinstance(json_resp, list) and json_resp:
+ content = json_resp[0].get('generated_text', '')
+
+ return UnifiedResponse(
+ content=content,
+ finish_reason='stop',
+ raw_response=json_resp
+ )
+
+ except Exception as e:
+ print(f"HuggingFace error: {e}")
+ raise UnifiedClientError(f"HuggingFace error: {e}")
+
+ def _send_vertex_model_garden_image(self, messages, image_base64, temperature, max_tokens, response_name):
+ """Send image request to Vertex AI Model Garden"""
+ # For now, we can just call the regular send method since Vertex AI
+ # handles images in the message format
+
+ # Convert image to message format that Vertex AI expects
+ image_message = {
+ "role": "user",
+ "content": [
+ {"type": "text", "text": messages[-1]['content'] if messages else ""},
+ {"type": "image", "image": {"base64": image_base64}}
+ ]
+ }
+
+ # Replace last message with image message
+ messages_with_image = messages[:-1] + [image_message]
+
+ # Use the regular Vertex AI send method
+ return self._send_vertex_model_garden(messages_with_image, temperature, max_tokens, response_name=response_name)
+
+ def _is_o_series_model(self) -> bool:
+ """Check if the current model is an o-series model (o1, o3, o4, etc.) or GPT-5"""
+ if not self.model:
+ return False
+
+ model_lower = self.model.lower()
+
+ # Check for specific patterns
+ if 'o1-preview' in model_lower or 'o1-mini' in model_lower:
+ return True
+
+ # Check for o3 models
+ if 'o3-mini' in model_lower or 'o3-pro' in model_lower:
+ return True
+
+ # Check for o4 models
+ if 'o4-mini' in model_lower:
+ return True
+
+ # Check for GPT-5 models (including variants)
+ if 'gpt-5' in model_lower or 'gpt5' in model_lower:
+ return True
+
+ # Check if it starts with o followed by a digit
+ if len(model_lower) >= 2 and model_lower[0] == 'o' and model_lower[1].isdigit():
+ return True
+
+ return False
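+ # Examples: 'o1-mini', 'o3-pro', 'o4-mini', and 'gpt-5-turbo' return True;
+ # 'gpt-4o' and 'gpt-4o-mini' return False (no leading 'o<digit>' and no
+ # 'o4-mini' substring in '...4o-mini').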
+
+ def _prepare_gemini_image_content(self, messages, image_base64):
+ """Prepare image content for Gemini API - supports both single and multiple images"""
+
+ # Check if this is a multi-image request (messages contain content arrays)
+ is_multi_image = False
+ for msg in messages:
+ if isinstance(msg.get('content'), list):
+ for part in msg['content']:
+ if isinstance(part, dict) and part.get('type') == 'image_url':
+ is_multi_image = True
+ break
+
+ if is_multi_image:
+ # Handle multi-image format
+ contents = []
+
+ for msg in messages:
+ if msg['role'] == 'system':
+ contents.append({
+ "role": "user",
+ "parts": [{"text": f"Instructions: {msg['content']}"}]
+ })
+ elif msg['role'] == 'user':
+ if isinstance(msg['content'], str):
+ contents.append({
+ "role": "user",
+ "parts": [{"text": msg['content']}]
+ })
+ elif isinstance(msg['content'], list):
+ parts = []
+ for part in msg['content']:
+ if part['type'] == 'text':
+ parts.append({"text": part['text']})
+ elif part['type'] == 'image_url':
+ image_data = part['image_url']['url']
+ if image_data.startswith('data:'):
+ # Only inspect the data-URL header for the mime type; scanning the
+ # base64 payload can false-positive on substrings like 'jpg'
+ header, base64_data = image_data.split(',', 1)
+ else:
+ header = ''
+ base64_data = image_data
+ 
+ mime_type = "image/png"
+ if 'jpeg' in header or 'jpg' in header:
+ mime_type = "image/jpeg"
+ elif 'webp' in header:
+ mime_type = "image/webp"
+
+ parts.append({
+ "inline_data": {
+ "mime_type": mime_type,
+ "data": base64_data
+ }
+ })
+
+ contents.append({
+ "role": "user",
+ "parts": parts
+ })
+ else:
+ # Handle single image format (backward compatibility)
+ formatted_parts = []
+ for msg in messages:
+ if msg.get('role') == 'system':
+ formatted_parts.append(f"Instructions: {msg['content']}")
+ elif msg.get('role') == 'user':
+ formatted_parts.append(f"User: {msg['content']}")
+
+ text_prompt = "\n\n".join(formatted_parts)
+
+ contents = [
+ {
+ "role": "user",
+ "parts": [
+ {"text": text_prompt},
+ {"inline_data": {
+ "mime_type": "image/jpeg",
+ "data": image_base64
+ }}
+ ]
+ }
+ ]
+
+ return contents
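+ # Single-image result sketch:
+ #   [{"role": "user",
+ #     "parts": [{"text": "Instructions: ...\n\nUser: ..."},
+ #               {"inline_data": {"mime_type": "image/jpeg", "data": "<base64>"}}]}]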
+
+ # Removed: _send_openai_image
+ # OpenAI-compatible providers handle images within messages via _get_response and _send_openai_compatible
+
+ def _send_anthropic_image(self, messages, image_base64, temperature, max_tokens, response_name) -> UnifiedResponse:
+ """Send image request to Anthropic API"""
+ headers = {
+ "X-API-Key": self.api_key,
+ "Content-Type": "application/json",
+ "anthropic-version": "2023-06-01"
+ }
+
+ # Format messages with image
+ system_message = None
+ formatted_messages = []
+
+ for msg in messages:
+ if msg['role'] == 'system':
+ system_message = msg['content']
+ elif msg['role'] == 'user':
+ # Add image to user message
+ formatted_messages.append({
+ "role": "user",
+ "content": [
+ {
+ "type": "text",
+ "text": msg['content']
+ },
+ {
+ "type": "image",
+ "source": {
+ "type": "base64",
+ "media_type": "image/jpeg",
+ "data": image_base64
+ }
+ }
+ ]
+ })
+ else:
+ formatted_messages.append({
+ "role": msg['role'],
+ "content": msg['content']
+ })
+
+ data = {
+ "model": self.model,
+ "messages": formatted_messages,
+ "temperature": temperature,
+ "max_tokens": max_tokens
+ }
+
+ # Get user-configured anti-duplicate parameters
+ anti_dupe_params = self._get_anti_duplicate_params(temperature)
+ data.update(anti_dupe_params) # Add user's custom parameters
+
+ if system_message:
+ data["system"] = system_message
+
+ try:
+ resp = self._http_request_with_retries(
+ method="POST",
+ url="https://api.anthropic.com/v1/messages",
+ headers=headers,
+ json=data,
+ expected_status=(200,),
+ max_retries=3,
+ provider_name="Anthropic Image"
+ )
+ json_resp = resp.json()
+ content, finish_reason, usage = self._parse_anthropic_json(json_resp)
+ return UnifiedResponse(
+ content=content,
+ finish_reason=finish_reason,
+ usage=usage,
+ raw_response=json_resp
+ )
+
+ except Exception as e:
+ print(f"Anthropic Vision API error: {e}")
+ raise UnifiedClientError(f"Anthropic Vision API error: {e}")
+
+ # Removed: _send_electronhub_image (handled via _send_openai_compatible in _get_response)
+
+ def _send_poe_image(self, messages, image_base64, temperature, max_tokens, response_name) -> UnifiedResponse:
+ """Send image request using poe-api-wrapper"""
+ try:
+ from poe_api_wrapper import PoeApi
+ except ImportError:
+ raise UnifiedClientError(
+ "poe-api-wrapper not installed. Run: pip install poe-api-wrapper"
+ )
+
+ # Parse cookies using robust parser
+ tokens = self._parse_poe_tokens(self.api_key)
+ if 'p-b' not in tokens or not tokens['p-b']:
+ raise UnifiedClientError(
+ "POE tokens missing. Provide cookies as 'p-b:VALUE|p-lat:VALUE' or 'p-b=VALUE; p-lat=VALUE'",
+ error_type="auth_error"
+ )
+ if 'p-lat' not in tokens:
+ tokens['p-lat'] = ''
+ logger.info("No p-lat cookie provided; proceeding without it")
+
+ logger.info(f"Tokens being sent for image: p-b={len(tokens.get('p-b', ''))} chars, p-lat={len(tokens.get('p-lat', ''))} chars")
+
+ try:
+ # Create Poe client (try to pass proxy/headers if supported)
+ poe_kwargs = {}
+ ua = os.getenv("POE_USER_AGENT") or os.getenv("HTTP_USER_AGENT")
+ if ua:
+ poe_kwargs["headers"] = {"User-Agent": ua, "Referer": "https://poe.com/", "Origin": "https://poe.com"}
+ proxy = os.getenv("POE_PROXY") or os.getenv("HTTPS_PROXY") or os.getenv("HTTP_PROXY")
+ if proxy:
+ poe_kwargs["proxy"] = proxy
+ try:
+ poe_client = PoeApi(tokens=tokens, **poe_kwargs)
+ except TypeError:
+ poe_client = PoeApi(tokens=tokens)
+ try:
+ if ua and hasattr(poe_client, "session") and hasattr(poe_client.session, "headers"):
+ poe_client.session.headers.update({"User-Agent": ua, "Referer": "https://poe.com/", "Origin": "https://poe.com"})
+ except Exception:
+ pass
+
+ # Get bot name - use vision-capable bots
+ requested_model = self.model.replace('poe/', '', 1)
+ bot_map = {
+ # Vision-capable models
+ 'gpt-4-vision': 'GPT-4V',
+ 'gpt-4v': 'GPT-4V',
+ 'claude-3-opus': 'claude_3_opus', # Claude 3 models support vision
+ 'claude-3-sonnet': 'claude_3_sonnet',
+ 'claude-3-haiku': 'claude_3_haiku',
+ 'gemini-pro-vision': 'gemini_pro_vision',
+ 'gemini-2.5-flash': 'gemini_1_5_flash', # Gemini 1.5 supports vision
+ 'gemini-2.5-pro': 'gemini_1_5_pro',
+
+ # Fallback to regular models
+ 'gpt-4': 'beaver',
+ 'claude': 'a2',
+ 'assistant': 'assistant',
+ }
+ bot_name = bot_map.get(requested_model.lower(), requested_model)
+ logger.info(f"Using bot name for vision: {bot_name}")
+
+ # Convert messages to prompt
+ prompt = self._format_prompt(messages, style='replicate')
+
+ # Note: poe-api-wrapper's image support varies by version
+ # Some versions support file_path parameter, others need different approaches
+ full_response = ""
+
+ # POE file_path support is inconsistent; fall back to plain prompt
+ for chunk in poe_client.send_message(bot_name, prompt):
+ if 'response' in chunk:
+ full_response = chunk['response']
+
+ except Exception as img_error:
+ print(f"Image handling error: {img_error}")
+ # Fall back to text-only message
+ print("Falling back to text-only message due to image error")
+ for chunk in poe_client.send_message(bot_name, prompt):
+ if 'response' in chunk:
+ full_response = chunk['response']
+
+ # Get the final text
+ final_text = chunk.get('text', full_response) if 'chunk' in locals() else full_response
+
+ if not final_text:
+ raise UnifiedClientError(
+ "POE returned empty response for image. "
+ "The bot may not support image inputs or the image format is unsupported."
+ )
+
+ return UnifiedResponse(
+ content=final_text,
+ finish_reason="stop",
+ raw_response=chunk if 'chunk' in locals() else {"response": full_response}
+ )
+
+ except Exception as e:
+ print(f"Poe image API error details: {str(e)}")
+ error_str = str(e).lower()
+
+ if self._is_rate_limit_error(e):
+ raise UnifiedClientError(
+ "POE rate limit exceeded. Please wait before trying again.",
+ error_type="rate_limit"
+ )
+ elif "auth" in error_str or "unauthorized" in error_str:
+ raise UnifiedClientError(
+ "POE authentication failed. Your cookies may be expired.",
+ error_type="auth_error"
+ )
+ elif "not support" in error_str or "vision" in error_str:
+ raise UnifiedClientError(
+ f"The selected POE bot '{requested_model}' may not support image inputs. "
+ "Try using a vision-capable model like gpt-4-vision or claude-3-opus.",
+ error_type="capability_error"
+ )
+
+ raise UnifiedClientError(f"Poe image API error: {e}")
+
+ def _log_truncation_failure(self, messages, response_content, finish_reason, context=None, attempts=None, error_details=None):
+ """Log truncation failures for analysis - saves to CSV, TXT, and HTML in truncation_logs subfolder"""
+ try:
+ # Use output directory if provided, otherwise current directory
+ base_dir = self.output_dir if self.output_dir else "."
+
+ # Create truncation_logs subfolder inside the output directory
+ log_dir = os.path.join(base_dir, "truncation_logs")
+ os.makedirs(log_dir, exist_ok=True)
+
+ # Generate log filename with date
+ log_date = datetime.now().strftime("%Y%m")
+
+ # CSV log file (keeping for compatibility)
+ csv_log_file = os.path.join(log_dir, f"truncation_failures_{log_date}.csv")
+
+ # TXT log file (human-readable format)
+ txt_log_file = os.path.join(log_dir, f"truncation_failures_{log_date}.txt")
+
+ # HTML log file (web-viewable format)
+ html_log_file = os.path.join(log_dir, f"truncation_failures_{log_date}.html")
+
+ # Summary file to track truncated outputs
+ summary_file = os.path.join(log_dir, f"truncation_summary_{log_date}.json")
+
+ # Check if CSV file exists to determine if we need headers
+ csv_file_exists = os.path.exists(csv_log_file)
+
+ # Extract output filename - UPDATED LOGIC
+ output_filename = 'unknown'
+
+ # PRIORITY 1: Use the actual output filename if set via set_output_filename()
+ if hasattr(self, '_actual_output_filename') and self._actual_output_filename:
+ output_filename = self._actual_output_filename
+ # PRIORITY 2: Use current output file if available
+ elif hasattr(self, '_current_output_file') and self._current_output_file:
+ output_filename = self._current_output_file
+ # PRIORITY 3: Use tracked response filename from _save_response
+ elif hasattr(self, '_last_response_filename') and self._last_response_filename:
+ # Skip if it's a generic Payloads filename
+ if not self._last_response_filename.startswith(('response_', 'translation_')):
+ output_filename = self._last_response_filename
+
+ # FALLBACK: Try to extract from context/messages if no filename was set
+ if output_filename == 'unknown':
+ if context == 'translation':
+ # Try to extract chapter/response filename
+ chapter_match = re.search(r'Chapter (\d+)', str(messages))
+ if chapter_match:
+ chapter_num = chapter_match.group(1)
+ # Use the standard format that matches book output
+ safe_title = f"Chapter_{chapter_num}"
+ output_filename = f"response_{chapter_num.zfill(3)}_{safe_title}.html"
+ else:
+ # Try chunk pattern
+ chunk_match = re.search(r'Chunk (\d+)/(\d+).*Chapter (\d+)', str(messages))
+ if chunk_match:
+ chunk_num = chunk_match.group(1)
+ chapter_num = chunk_match.group(3)
+ safe_title = f"Chapter_{chapter_num}"
+ output_filename = f"response_{chapter_num.zfill(3)}_{safe_title}_chunk_{chunk_num}.html"
+ elif context == 'image_translation':
+ # Extract image filename if available
+ img_match = re.search(r'([\w\-]+\.(jpg|jpeg|png|gif|webp))', str(messages), re.IGNORECASE)
+ if img_match:
+ output_filename = f"image_{img_match.group(1)}"
+
+ # Load or create summary tracking
+ summary_data = {"truncated_files": set(), "total_truncations": 0, "by_type": {}}
+ if os.path.exists(summary_file):
+ try:
+ with open(summary_file, 'r', encoding='utf-8') as f:
+ loaded_data = json.load(f)
+ summary_data["truncated_files"] = set(loaded_data.get("truncated_files", []))
+ summary_data["total_truncations"] = loaded_data.get("total_truncations", 0)
+ summary_data["by_type"] = loaded_data.get("by_type", {})
+ except Exception:
+ pass
+
+ # Update summary
+ summary_data["truncated_files"].add(output_filename)
+ summary_data["total_truncations"] += 1
+ truncation_type_key = f"{finish_reason}_{context or 'unknown'}"
+ summary_data["by_type"][truncation_type_key] = summary_data["by_type"].get(truncation_type_key, 0) + 1
+
+ # Save summary
+ save_summary = {
+ "truncated_files": sorted(list(summary_data["truncated_files"])),
+ "total_truncations": summary_data["total_truncations"],
+ "by_type": summary_data["by_type"],
+ "last_updated": datetime.now().isoformat()
+ }
+ with self._file_write_lock:
+ with open(summary_file, 'w', encoding='utf-8') as f:
+ json.dump(save_summary, f, indent=2, ensure_ascii=False)
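+ # Example summary file contents (values illustrative):
+ #   {"truncated_files": ["response_003_Chapter_3.html"],
+ #    "total_truncations": 2,
+ #    "by_type": {"length_translation": 2},
+ #    "last_updated": "2024-06-01T12:00:00"}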
+
+ # Prepare log entry
+ # Compute safe input length and serialize error details safely
+ def _safe_content_len(c):
+ if isinstance(c, str):
+ return len(c)
+ if isinstance(c, list):
+ total = 0
+ for p in c:
+ if isinstance(p, dict):
+ if isinstance(p.get('text'), str):
+ total += len(p['text'])
+ elif isinstance(p.get('image_url'), dict):
+ url = p['image_url'].get('url')
+ if isinstance(url, str):
+ total += len(url)
+ elif isinstance(p, str):
+ total += len(p)
+ return total
+ return len(str(c)) if c is not None else 0
+
+ input_length_value = sum(_safe_content_len(msg.get('content')) for msg in (messages or []))
+
+ truncation_type_label = 'explicit' if finish_reason == 'length' else 'silent'
+
+ error_details_str = ''
+ if error_details is not None:
+ try:
+ error_details_str = json.dumps(error_details, ensure_ascii=False)
+ except Exception:
+ error_details_str = str(error_details)
+
+ log_entry = {
+ 'timestamp': datetime.now().isoformat(),
+ 'model': self.model,
+ 'provider': self.client_type,
+ 'context': context or 'unknown',
+ 'finish_reason': finish_reason,
+ 'attempts': attempts or 1,
+ 'input_length': input_length_value,
+ 'output_length': len(response_content) if response_content else 0,
+ 'truncation_type': truncation_type_label,
+ 'content_refused': 'yes' if finish_reason == 'content_filter' else 'no',
+ 'last_50_chars': response_content[-50:] if response_content else '',
+ 'error_details': error_details_str,
+ 'input_preview': self._get_safe_preview(messages),
+ 'output_preview': response_content[:200] if response_content else '',
+ 'output_filename': output_filename # Add output filename to log entry
+ }
+
+ # Write to CSV
+ with self._file_write_lock:
+ with open(csv_log_file, 'a', newline='', encoding='utf-8') as f:
+ fieldnames = [
+ 'timestamp', 'model', 'provider', 'context', 'finish_reason',
+ 'attempts', 'input_length', 'output_length', 'truncation_type',
+ 'content_refused', 'last_50_chars', 'error_details',
+ 'input_preview', 'output_preview', 'output_filename'
+ ]
+
+ writer = csv.DictWriter(f, fieldnames=fieldnames)
+
+ # Write header if new file
+ if not csv_file_exists:
+ writer.writeheader()
+
+ writer.writerow(log_entry)
+
+ # Write to TXT file with human-readable format
+ with self._file_write_lock:
+ with open(txt_log_file, 'a', encoding='utf-8') as f:
+ f.write(f"\n{'='*80}\n")
+ f.write(f"TRUNCATION LOG ENTRY - {log_entry['timestamp']}\n")
+ f.write(f"{'='*80}\n")
+ f.write(f"Output File: {log_entry['output_filename']}\n")
+ f.write(f"Model: {log_entry['model']}\n")
+ f.write(f"Provider: {log_entry['provider']}\n")
+ f.write(f"Context: {log_entry['context']}\n")
+ f.write(f"Finish Reason: {log_entry['finish_reason']}\n")
+ f.write(f"Attempts: {log_entry['attempts']}\n")
+ f.write(f"Input Length: {log_entry['input_length']} chars\n")
+ f.write(f"Output Length: {log_entry['output_length']} chars\n")
+ f.write(f"Truncation Type: {log_entry['truncation_type']}\n")
+ f.write(f"Content Refused: {log_entry['content_refused']}\n")
+
+ if log_entry['error_details']:
+ f.write(f"Error Details: {log_entry['error_details']}\n")
+
+ f.write(f"\n--- Input Preview ---\n")
+ f.write(f"{log_entry['input_preview']}\n")
+
+ f.write(f"\n--- Output Preview ---\n")
+ f.write(f"{log_entry['output_preview']}\n")
+
+ if log_entry['last_50_chars']:
+ f.write(f"\n--- Last 50 Characters ---\n")
+ f.write(f"{log_entry['last_50_chars']}\n")
+
+ f.write(f"\n{'='*80}\n")
+
+ # Write to HTML file with nice formatting
+ html_file_exists = os.path.exists(html_log_file)
+
+ # Create or update HTML file
+ if not html_file_exists:
+ # Create new HTML file with header
+ html_content = ('<!DOCTYPE html>\n'
+ '<html>\n'
+ '<head>\n'
+ '<meta charset="utf-8">\n'
+ '<title>Truncation Failures Log</title>\n'
+ '<style>\n'
+ 'body { font-family: sans-serif; margin: 20px; }\n'
+ '.entry { border: 1px solid #ccc; margin: 10px 0; padding: 10px; }\n'
+ '.entry-header { font-weight: bold; margin-bottom: 6px; }\n'
+ '.file-badge { background: #eee; border-radius: 3px; padding: 2px 6px; margin: 2px; display: inline-block; }\n'
+ 'pre.preview { background: #f7f7f7; padding: 8px; white-space: pre-wrap; }\n'
+ '</style>\n'
+ '</head>\n'
+ '<body>\n'
+ '<h1>Truncation Failures Log</h1>\n'
+ '<div class="summary-container">\n'
+ '</div>\n'
+ '<div class="entries-container">\n'
+ '</div>\n'
+ '</body>\n'
+ '</html>\n')
+ # Write initial HTML structure
+ with self._file_write_lock:
+ with open(html_log_file, 'w', encoding='utf-8') as f:
+ f.write(html_content)
+ # Make sure HTML is properly closed
+ if not html_content.rstrip().endswith('</html>'):
+ with self._file_write_lock:
+ with open(html_log_file, 'a', encoding='utf-8') as f:
+ f.write('\n</body>\n</html>\n')
+
+ # Read existing HTML content
+ with self._file_write_lock:
+ with open(html_log_file, 'r', encoding='utf-8') as f:
+ html_content = f.read()
+
+ # Generate summary HTML
+ summary_html = f"""
+ <div class="summary">
+ <h2>Summary</h2>
+ <div class="stats">
+ <div class="stat">
+ <div class="stat-label">Total Truncations</div>
+ <div class="stat-value">{summary_data['total_truncations']}</div>
+ </div>
+ <div class="stat">
+ <div class="stat-label">Affected Files</div>
+ <div class="stat-value">{len(summary_data['truncated_files'])}</div>
+ </div>
+ </div>
+ <div class="file-list">
+ <h3>Truncated Output Files:</h3>
+ """
+
+ # Add file badges
+ for filename in sorted(summary_data['truncated_files']):
+ summary_html += f'<span class="file-badge">{html.escape(filename)}</span>\n'
+
+ summary_html += """
+ </div>
+ </div>
+ """
+
+ # Update summary in HTML
+ if '<div class="summary-container">' in html_content:
+ # Replace existing summary between summary-container and entries-container
+ start_marker = '<div class="summary-container">'
+ end_marker = '<div class="entries-container">'
+ start = html_content.find(start_marker) + len(start_marker)
+ end = html_content.find(end_marker, start)
+ if end != -1:
+ html_content = html_content[:start] + '\n' + summary_html + '\n' + html_content[end:]
+ else:
+ # Fallback: insert before the closing </body> tag
+ tail_idx = html_content.rfind('</body>')
+ if tail_idx != -1:
+ html_content = html_content[:start] + '\n' + summary_html + '\n' + html_content[tail_idx:]
+
+ # Generate new log entry HTML
+ truncation_class = 'truncation-type-silent' if log_entry['truncation_type'] == 'silent' else 'truncation-type-explicit'
+
+ entry_html = f"""
\n
{log_entry["timestamp"]} - Output: {html.escape(output_filename)}
\n
\n Model:{html.escape(str(log_entry["model"]))}\n Provider:{html.escape(str(log_entry["provider"]))}\n Context:{html.escape(str(log_entry["context"]))}\n Finish Reason:{html.escape(str(log_entry["finish_reason"]))}\n Attempts:{log_entry["attempts"]}\n Input Length:{log_entry["input_length"]:,} chars\n Output Length:{log_entry["output_length"]:,} chars\n Truncation Type:{html.escape(str(log_entry["truncation_type"]))}\n Content Refused:{html.escape(str(log_entry["content_refused"]))}\n """
+
+ if log_entry['error_details']:
+ entry_html += f' Error Details:{html.escape(str(log_entry["error_details"]))}\n'
+
+ entry_html += f"""
+
Input Preview
+
{html.escape(str(log_entry["input_preview"]))}
+
Output Preview
+
{html.escape(str(log_entry["output_preview"]))}
+ """
+
+ if log_entry['last_50_chars']:
+ entry_html += f"""
Last 50 Characters
+
{html.escape(str(log_entry["last_50_chars"]))}
+ """
+
+ entry_html += """
"""
+
+ # Insert new entry
+ if '<div class="entries-container">' in html_content:
+ insert_pos = html_content.find('<div class="entries-container">') + len('<div class="entries-container">')
+ # Find the next newline after the container div
+ newline_pos = html_content.find('\n', insert_pos)
+ if newline_pos != -1:
+ insert_pos = newline_pos + 1
+ html_content = html_content[:insert_pos] + entry_html + html_content[insert_pos:]
+ else:
+ # Fallback: append before the closing body tag (or at the end if it is missing)
+ insert_pos = html_content.rfind('</body>')
+ if insert_pos == -1:
+ insert_pos = len(html_content)
+ html_content = html_content[:insert_pos] + entry_html + '\n' + html_content[insert_pos:]
+
+ # Write updated HTML
+ with self._file_write_lock:
+ with open(html_log_file, 'w', encoding='utf-8') as f:
+ f.write(html_content)
+
+ # Log to console with FULL PATH so user knows where to look
+ csv_log_path = os.path.abspath(csv_log_file)
+ txt_log_path = os.path.abspath(txt_log_file)
+ html_log_path = os.path.abspath(html_log_file)
+
+ if finish_reason == 'content_filter':
+ print(f"⛔ Content refused by {self.model}")
+ print(f" 📁 CSV log: {csv_log_path}")
+ print(f" 📁 TXT log: {txt_log_path}")
+ print(f" 📁 HTML log: {html_log_path}")
+ else:
+ print(f"✂️ Response truncated by {self.model}")
+ print(f" 📁 CSV log: {csv_log_path}")
+ print(f" 📁 TXT log: {txt_log_path}")
+ print(f" 📁 HTML log: {html_log_path}")
+
+ except Exception as e:
+ # Don't crash the translation just because logging failed
+ print(f"Failed to log truncation failure: {e}")
+
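The summary JSON gives a quick aggregate view without opening the CSV/TXT/HTML logs. A minimal consumption sketch, assuming the summary lands in `truncation_failures_summary.json` (the actual `summary_file` path is assigned earlier in this method and may differ):

```python
import json

# Hypothetical file name; match it to the summary_file the logger writes.
with open("truncation_failures_summary.json", encoding="utf-8") as f:
    summary = json.load(f)

print(f"{summary['total_truncations']} truncations across "
      f"{len(summary['truncated_files'])} files")
for key, count in sorted(summary["by_type"].items()):
    # Keys are "<finish_reason>_<context>", e.g. "length_translation"
    print(f"  {key}: {count}")
```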
+ def _get_safe_preview(self, messages: List[Dict], max_length: int = 100) -> str:
+ """Get a safe preview of the input messages for logging"""
+ try:
+ # Get the last user message
+ for msg in reversed(messages):
+ if msg.get('role') == 'user':
+ content = msg.get('content', '')
+ if not isinstance(content, str):
+ # Multimodal content (list of parts): fall back to a string repr
+ content = str(content)
+ if len(content) > max_length:
+ return content[:max_length] + "..."
+ return content
+ return "No user content found"
+ except Exception:
+ return "Error extracting preview"
+
+ def _send_deepl(self, messages, temperature=None, max_tokens=None, response_name=None) -> UnifiedResponse:
+ """
+ Send messages to DeepL API for translation
+
+ Args:
+ messages: List of message dicts
+ temperature: Not used by DeepL (included for signature compatibility)
+ max_tokens: Not used by DeepL (included for signature compatibility)
+ response_name: Name for saving response (for debugging/logging)
+
+ Returns:
+ UnifiedResponse object
+ """
+
+ if not DEEPL_AVAILABLE:
+ raise UnifiedClientError("DeepL library not installed. Run: pip install deepl")
+
+ try:
+ # Get DeepL API key
+ deepl_api_key = os.getenv('DEEPL_API_KEY') or self.api_key
+
+ if not deepl_api_key or deepl_api_key == 'dummy':
+ raise UnifiedClientError("DeepL API key not found. Set DEEPL_API_KEY environment variable or configure in settings.")
+
+ # Initialize DeepL translator
+ translator = deepl.Translator(deepl_api_key)
+
+ # Extract ONLY user content to translate - ignore AI system prompts
+ text_to_translate = ""
+ source_lang = None
+ target_lang = "EN-US" # Default to US English
+
+ # Extract only user messages, ignore system prompts completely
+ for msg in messages:
+ if msg['role'] == 'user':
+ text_to_translate = msg['content']
+ # Simple language detection from content patterns
+ if any(0xAC00 <= ord(char) <= 0xD7AF for char in text_to_translate[:100]):
+ source_lang = 'KO' # Korean (Hangul syllables)
+ elif any(0x3040 <= ord(char) <= 0x309F for char in text_to_translate[:100]) or \
+ any(0x30A0 <= ord(char) <= 0x30FF for char in text_to_translate[:100]):
+ source_lang = 'JA' # Japanese (Hiragana or Katakana)
+ elif any(0x4E00 <= ord(char) <= 0x9FFF for char in text_to_translate[:100]):
+ source_lang = 'ZH' # Chinese (CJK Unified Ideographs)
+ break # Take only the first user message
+
+ if not text_to_translate:
+ raise UnifiedClientError("No text to translate found in messages")
+
+ # Log the translation request
+ logger.info(f"DeepL: Translating {len(text_to_translate)} characters")
+ if source_lang:
+ logger.info(f"DeepL: Source language: {source_lang}")
+
+ # Perform translation
+ start_time = time.time()
+
+ # DeepL API call
+ if source_lang:
+ result = translator.translate_text(
+ text_to_translate,
+ source_lang=source_lang,
+ target_lang=target_lang,
+ preserve_formatting=True,
+ tag_handling='html' if '<' in text_to_translate else None
+ )
+ else:
+ result = translator.translate_text(
+ text_to_translate,
+ target_lang=target_lang,
+ preserve_formatting=True,
+ tag_handling='html' if '<' in text_to_translate else None
+ )
+
+ elapsed_time = time.time() - start_time
+
+ # Get the translated text
+ translated_text = result.text
+
+ # Create UnifiedResponse object
+ response = UnifiedResponse(
+ content=translated_text,
+ finish_reason='complete',
+ usage={
+ 'characters': len(text_to_translate),
+ 'detected_source_lang': result.detected_source_lang if hasattr(result, 'detected_source_lang') else source_lang
+ },
+ raw_response={'result': result}
+ )
+
+ logger.info(f"DeepL: Translation completed in {elapsed_time:.2f}s")
+
+ return response
+
+ except Exception as e:
+ error_msg = f"DeepL API error: {str(e)}"
+ logger.error(f"ERROR: {error_msg}")
+ raise UnifiedClientError(error_msg)
+
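For orientation, a hedged usage sketch of the DeepL path; `client` stands in for an already-configured instance of this class, and only the message shape and returned fields follow from the code above:

```python
# Requires DEEPL_API_KEY (or a configured api_key) and the deepl package.
messages = [
    {"role": "system", "content": "Ignored: DeepL only sees the user message."},
    {"role": "user", "content": "안녕하세요, 세계!"},
]
resp = client._send_deepl(messages)
print(resp.content)                        # translated English text
print(resp.usage["detected_source_lang"])  # e.g. "KO"
```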
+ def _send_google_translate(self, messages, temperature=None, max_tokens=None, response_name=None):
+ """Send messages to Google Translate API with markdown/HTML structure fixes"""
+
+ if not GOOGLE_TRANSLATE_AVAILABLE:
+ raise UnifiedClientError(
+ "Google Cloud Translate not installed. Run: pip install google-cloud-translate\n"
+ "Also ensure you have Google Cloud credentials configured."
+ )
+
+ # Import HTML output fixer for Google Translate's structured HTML
+ try:
+ from translate_output_fix import fix_google_translate_html
+ except ImportError:
+ # Fallback: create inline HTML structure fix
+ import re
+ def fix_google_translate_html(html_content):
+ """Simple fallback: fix HTML structure issues where everything is in one header tag"""
+ if not html_content:
+ return html_content
+
+ # Check if everything is wrapped in a single header tag
+ single_header = re.match(r'^<(h[1-6])>(.*?)</\1>$', html_content.strip(), re.DOTALL)
+ if single_header:
+ tag = single_header.group(1)
+ content = single_header.group(2).strip()
+
+ # Simple pattern: "Number. Title Name was/were..." -> "Number. Title" + "Name was/were..."
+ chapter_match = re.match(r'^(\d+\.\s+[^A-Z]*[A-Z][^A-Z]*?)\s+([A-Z][a-z]+\s+(?:was|were|had|did|is|are)\s+.*)$', content, re.DOTALL)
+ if chapter_match:
+ title = chapter_match.group(1).strip()
+ body = chapter_match.group(2).strip()
+ # Create properly structured HTML
+ paragraphs = re.split(r'\n\s*\n', body)
+ formatted_paragraphs = [f'<p>{p.strip()}</p>' for p in paragraphs if p.strip()]
+ return f'<{tag}>{title}</{tag}>\n\n' + '\n\n'.join(formatted_paragraphs)
+
+ return html_content
+
+ try:
+ # Check for Google Cloud credentials with better error messages
+ google_creds_path = None
+
+ # Try multiple possible locations for credentials
+ possible_paths = [
+ os.getenv('GOOGLE_APPLICATION_CREDENTIALS'),
+ os.getenv('GOOGLE_CLOUD_CREDENTIALS'),
+ self.config.get('google_cloud_credentials') if hasattr(self, 'config') else None,
+ self.config.get('google_vision_credentials') if hasattr(self, 'config') else None,
+ ]
+
+ for path in possible_paths:
+ if path and os.path.exists(path):
+ google_creds_path = path
+ break
+
+ if not google_creds_path:
+ raise UnifiedClientError(
+ "Google Cloud credentials not found.\n\n"
+ "To use Google Translate, you need to:\n"
+ "1. Create a Google Cloud service account\n"
+ "2. Download the JSON credentials file\n"
+ "3. Set it up in Glossarion:\n"
+ " - For GUI: Use the 'Set up Google Cloud Translate Credentials' button\n"
+ " - For CLI: Set GOOGLE_APPLICATION_CREDENTIALS environment variable\n\n"
+ "The same credentials work for both Google Translate and Cloud Vision (manga OCR)."
+ )
+
+ # Set the environment variable for the Google client library
+ os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = google_creds_path
+ logger.info(f"Using Google Cloud credentials: {os.path.basename(google_creds_path)}")
+
+ # Initialize the client
+ translate_client = google_translate.Client()
+
+ # Extract ONLY user content to translate - ignore AI system prompts
+ text_to_translate = ""
+ source_lang = None
+ target_lang = 'en' # Default to English
+
+ # Extract only user messages, ignore system prompts completely
+ for msg in messages:
+ if msg['role'] == 'user':
+ text_to_translate = msg['content']
+ # Simple language detection from content patterns
+ if any(0xAC00 <= ord(char) <= 0xD7AF for char in text_to_translate[:100]):
+ source_lang = 'ko' # Korean (Hangul syllables)
+ elif any(0x3040 <= ord(char) <= 0x309F for char in text_to_translate[:100]) or \
+ any(0x30A0 <= ord(char) <= 0x30FF for char in text_to_translate[:100]):
+ source_lang = 'ja' # Japanese (Hiragana or Katakana)
+ elif any(0x4E00 <= ord(char) <= 0x9FFF for char in text_to_translate[:100]):
+ source_lang = 'zh' # Chinese (CJK Unified Ideographs)
+ break # Take only the first user message
+
+ if not text_to_translate:
+ # Return empty response instead of error
+ return UnifiedResponse(
+ content="",
+ finish_reason='complete',
+ usage={'characters': 0},
+ raw_response={}
+ )
+
+ # Log the translation request
+ logger.info(f"Google Translate: Translating {len(text_to_translate)} characters")
+ if source_lang:
+ logger.info(f"Google Translate: Source language: {source_lang}")
+
+ # Perform translation
+ start_time = time.time()
+
+ # Google Translate API call - force text format for markdown content
+ # Detect if this is markdown from html2text (starts with #)
+ is_markdown = text_to_translate.strip().startswith('#')
+ translate_format = 'text' if is_markdown else ('html' if '<' in text_to_translate else 'text')
+
+ if source_lang:
+ result = translate_client.translate(
+ text_to_translate,
+ source_language=source_lang,
+ target_language=target_lang,
+ format_=translate_format
+ )
+ else:
+ # Auto-detect source language
+ result = translate_client.translate(
+ text_to_translate,
+ target_language=target_lang,
+ format_=translate_format
+ )
+
+ elapsed_time = time.time() - start_time
+
+ # Handle both single result and list of results
+ if isinstance(result, list):
+ result = result[0] if result else {}
+
+ translated_text = result.get('translatedText', '')
+ detected_lang = result.get('detectedSourceLanguage', source_lang)
+
+ # FIX: Convert literal \n characters to actual line breaks
+ if '\\n' in translated_text:
+ translated_text = translated_text.replace('\\n', '\n')
+ logger.debug("Converted literal \\n characters to actual line breaks")
+
+ # Also handle other escaped characters that might appear
+ if '\\r' in translated_text:
+ translated_text = translated_text.replace('\\r', '\r')
+ if '\\t' in translated_text:
+ translated_text = translated_text.replace('\\t', '\t')
+
+ import re
+
+ # Fix markdown structure issues in Google Translate text output
+ original_text = translated_text
+
+ if is_markdown and translate_format == 'text':
+ # Google Translate in text mode removes line breaks from markdown
+ # Need to restore proper markdown structure
+
+ # Pattern: "#6. Title Content goes here" -> "#6. Title\n\nContent goes here"
+ markdown_fix = re.match(r'^(#{1,6}[^\n]*?)([A-Z][^#]+)$', translated_text.strip(), re.DOTALL)
+ if markdown_fix:
+ header_part = markdown_fix.group(1).strip()
+ content_part = markdown_fix.group(2).strip()
+
+ # Try to split header from content intelligently
+ # Look for patterns like "6. Title Name was" -> "6. Title" + "Name was"
+ title_content_match = re.match(r'^(.*?)([A-Z][a-z]+\s+(?:was|were|had|did|is|are)\s+.*)$', content_part, re.DOTALL)
+ if title_content_match:
+ title_end = title_content_match.group(1).strip()
+ content_start = title_content_match.group(2).strip()
+
+ # Restore paragraph structure in the content
+ paragraphs = re.split(r'(?<=[.!?])\s+(?=[A-Z])', content_start)
+ formatted_content = '\n\n'.join(paragraphs)
+
+ translated_text = f"{header_part} {title_end}\n\n{formatted_content}"
+ else:
+ # Fallback: try to split at reasonable word boundary
+ words = content_part.split()
+ if len(words) > 3:
+ for i in range(2, min(6, len(words)-2)):
+ if words[i][0].isupper():
+ title_words = ' '.join(words[:i])
+ content_words = ' '.join(words[i:])
+
+ # Restore paragraph structure in the content
+ paragraphs = re.split(r'(?<=[.!?])\s+(?=[A-Z])', content_words)
+ formatted_content = '\n\n'.join(paragraphs)
+
+ translated_text = f"{header_part} {title_words}\n\n{formatted_content}"
+ break
+
+ if translate_format == 'html':
+ # Apply HTML structure fixes for HTML mode
+ translated_text = fix_google_translate_html(translated_text)
+
+ # Create UnifiedResponse object
+ response = UnifiedResponse(
+ content=translated_text,
+ finish_reason='complete',
+ usage={
+ 'characters': len(text_to_translate),
+ 'detected_source_lang': detected_lang
+ },
+ raw_response=result
+ )
+
+ logger.info(f"Google Translate: Translation completed in {elapsed_time:.2f}s")
+
+ return response
+
+ except UnifiedClientError:
+ # Re-raise our custom errors with helpful messages
+ raise
+ except Exception as e:
+ # Provide more helpful error messages for common issues
+ error_msg = str(e)
+
+ if "403" in error_msg or "permission" in error_msg.lower():
+ raise UnifiedClientError(
+ "Google Translate API permission denied.\n\n"
+ "Please ensure:\n"
+ "1. Cloud Translation API is enabled in your Google Cloud project\n"
+ "2. Your service account has the 'Cloud Translation API User' role\n"
+ "3. Billing is enabled for your project (required for Translation API)\n\n"
+ f"Original error: {error_msg}"
+ )
+ elif "billing" in error_msg.lower():
+ raise UnifiedClientError(
+ "Google Cloud billing not enabled.\n\n"
+ "The Translation API requires billing to be enabled on your project.\n"
+ "Visit: https://console.cloud.google.com/billing\n\n"
+ f"Original error: {error_msg}"
+ )
+ else:
+ raise UnifiedClientError(f"Google Translate API error: {error_msg}")
diff --git a/update_manager.py b/update_manager.py
new file mode 100644
index 0000000000000000000000000000000000000000..89115c10e0626efdfc01d2dac2dfb4c4c47c8ae9
--- /dev/null
+++ b/update_manager.py
@@ -0,0 +1,826 @@
+# update_manager.py - Auto-update functionality for Glossarion
+import os
+import sys
+import json
+import requests
+import threading
+import concurrent.futures
+import time
+import re
+from typing import Optional, Dict, Tuple, List
+from packaging import version
+import tkinter as tk
+from tkinter import ttk, messagebox, font
+import ttkbootstrap as tb
+from datetime import datetime
+
+class UpdateManager:
+ """Handles automatic update checking and installation for Glossarion"""
+
+ GITHUB_API_URL = "https://api.github.com/repos/Shirochi-stack/Glossarion/releases"
+ GITHUB_LATEST_URL = "https://api.github.com/repos/Shirochi-stack/Glossarion/releases/latest"
+
+ def __init__(self, main_gui, base_dir):
+ self.main_gui = main_gui
+ self.base_dir = base_dir
+ self.update_available = False
+ # Use shared executor from main GUI if available
+ try:
+ if hasattr(self.main_gui, '_ensure_executor'):
+ self.main_gui._ensure_executor()
+ self.executor = getattr(self.main_gui, 'executor', None)
+ except Exception:
+ self.executor = None
+ self.latest_release = None
+ self.all_releases = [] # Store all fetched releases
+ self.download_progress = 0
+ self.is_downloading = False
+ # Load persistent check time from config
+ self._last_check_time = self.main_gui.config.get('last_update_check_time', 0)
+ self._check_cache_duration = 1800 # Cache for 30 minutes
+ self.selected_asset = None # Store selected asset for download
+
+ # Get version from the main GUI's __version__ variable
+ if hasattr(main_gui, '__version__'):
+ self.CURRENT_VERSION = main_gui.__version__
+ else:
+ # Extract from window title as fallback
+ title = self.main_gui.master.title()
+ if 'v' in title:
+ self.CURRENT_VERSION = title.split('v')[-1].strip()
+ else:
+ self.CURRENT_VERSION = "0.0.0"
+
+ def fetch_multiple_releases(self, count=10) -> List[Dict]:
+ """Fetch multiple releases from GitHub
+
+ Args:
+ count: Number of releases to fetch
+
+ Returns:
+ List of release data dictionaries
+ """
+ try:
+ headers = {
+ 'Accept': 'application/vnd.github.v3+json',
+ 'User-Agent': 'Glossarion-Updater'
+ }
+
+ # Fetch multiple releases with retry logic
+ max_retries = 2
+ timeout = 10 # Reduced timeout
+
+ for attempt in range(max_retries + 1):
+ try:
+ response = requests.get(
+ f"{self.GITHUB_API_URL}?per_page={count}",
+ headers=headers,
+ timeout=timeout
+ )
+ response.raise_for_status()
+ break # Success
+ except (requests.Timeout, requests.ConnectionError) as e:
+ if attempt == max_retries:
+ raise # Re-raise after final attempt
+ time.sleep(1)
+
+ releases = response.json()
+
+ # Process each release's notes
+ for release in releases:
+ if 'body' in release and release['body']:
+ # Clean up but don't truncate for history viewing
+ body = release['body']
+ # Just clean up excessive newlines
+ body = re.sub(r'\n{3,}', '\n\n', body)
+ release['body'] = body
+
+ return releases
+
+ except Exception as e:
+ print(f"Error fetching releases: {e}")
+ return []
+
+ def check_for_updates_async(self, silent=True, force_show=False):
+ """Run check_for_updates in the background using the shared executor.
+ Returns a Future if an executor is available, else runs in a thread.
+ """
+ try:
+ # Ensure shared executor
+ if hasattr(self.main_gui, '_ensure_executor'):
+ self.main_gui._ensure_executor()
+ execu = getattr(self, 'executor', None) or getattr(self.main_gui, 'executor', None)
+ if execu:
+ future = execu.submit(self.check_for_updates, silent, force_show)
+ return future
+ except Exception:
+ pass
+
+ # Fallback to thread if executor not available
+ def _worker():
+ try:
+ self.check_for_updates(silent=silent, force_show=force_show)
+ except Exception:
+ pass
+ t = threading.Thread(target=_worker, daemon=True)
+ t.start()
+ return None
+
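A sketch of consuming the async check; `updater` is a hypothetical UpdateManager instance. Note the dialog is shown via `master.after()` inside `check_for_updates` itself, so polling the Future is optional:

```python
fut = updater.check_for_updates_async(silent=True)
if fut is not None:  # the shared executor was available
    update_available, release = fut.result(timeout=30)
    print("update available:", update_available)
```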
+ def check_for_updates(self, silent=True, force_show=False) -> Tuple[bool, Optional[Dict]]:
+ """Check GitHub for newer releases
+
+ Args:
+ silent: If True, don't show error messages
+ force_show: If True, show the dialog even when up to date
+
+ Returns:
+ Tuple of (update_available, release_info)
+ """
+ try:
+ # Check if we need to skip the check due to cache
+ current_time = time.time()
+ if not force_show and (current_time - self._last_check_time) < self._check_cache_duration:
+ print(f"[DEBUG] Skipping update check - cache still valid for {int(self._check_cache_duration - (current_time - self._last_check_time))} seconds")
+ return False, None
+
+ # Check if this version was previously skipped
+ skipped_versions = self.main_gui.config.get('skipped_versions', [])
+
+ headers = {
+ 'Accept': 'application/vnd.github.v3+json',
+ 'User-Agent': 'Glossarion-Updater'
+ }
+
+ # Try with shorter timeout and retry logic
+ max_retries = 2
+ timeout = 10 # Reduced from 30 seconds
+
+ for attempt in range(max_retries + 1):
+ try:
+ print(f"[DEBUG] Update check attempt {attempt + 1}/{max_retries + 1}")
+ response = requests.get(self.GITHUB_LATEST_URL, headers=headers, timeout=timeout)
+ response.raise_for_status()
+ break # Success, exit retry loop
+ except (requests.Timeout, requests.ConnectionError) as e:
+ if attempt == max_retries:
+ # Last attempt failed, save check time and re-raise
+ self._save_last_check_time()
+ raise
+ print(f"[DEBUG] Network error on attempt {attempt + 1}: {e}")
+ time.sleep(1) # Short delay before retry
+
+ release_data = response.json()
+ latest_version = release_data['tag_name'].lstrip('v')
+
+ # Save successful check time
+ self._save_last_check_time()
+
+ # Fetch all releases for history regardless
+ self.all_releases = self.fetch_multiple_releases(count=10)
+ self.latest_release = release_data
+
+ # Check if this version was skipped by user
+ if release_data['tag_name'] in skipped_versions and not force_show:
+ return False, None
+
+ # Compare versions
+ if version.parse(latest_version) > version.parse(self.CURRENT_VERSION):
+ self.update_available = True
+
+ # Show update dialog when update is available
+ print(f"[DEBUG] Showing update dialog for version {latest_version}")
+ self.main_gui.master.after(100, self.show_update_dialog)
+
+ return True, release_data
+ else:
+ # We're up to date
+ self.update_available = False
+
+ # Show dialog if explicitly requested (from menu)
+ if force_show or not silent:
+ self.main_gui.master.after(100, self.show_update_dialog)
+
+ return False, None
+
+ except requests.Timeout:
+ if not silent:
+ messagebox.showerror("Update Check Failed",
+ "Connection timed out while checking for updates.\n\n"
+ "This is usually due to network connectivity issues.\n"
+ "The next update check will be in 1 hour.")
+ return False, None
+
+ except requests.ConnectionError as e:
+ if not silent:
+ if 'api.github.com' in str(e):
+ messagebox.showerror("Update Check Failed",
+ "Cannot reach GitHub servers for update check.\n\n"
+ "This may be due to:\n"
+ "• Internet connectivity issues\n"
+ "• Firewall blocking GitHub API\n"
+ "• GitHub API temporarily unavailable\n\n"
+ "The next update check will be in 1 hour.")
+ else:
+ messagebox.showerror("Update Check Failed",
+ f"Network error: {str(e)}\n\n"
+ "The next update check will be in 1 hour.")
+ return False, None
+
+ except requests.HTTPError as e:
+ if not silent:
+ if e.response.status_code == 403:
+ messagebox.showerror("Update Check Failed",
+ "GitHub API rate limit exceeded. Please try again later.")
+ else:
+ messagebox.showerror("Update Check Failed",
+ f"GitHub returned error: {e.response.status_code}")
+ return False, None
+
+ except ValueError as e:
+ if not silent:
+ messagebox.showerror("Update Check Failed",
+ "Invalid response from GitHub. The update service may be temporarily unavailable.")
+ return False, None
+
+ except Exception as e:
+ if not silent:
+ messagebox.showerror("Update Check Failed",
+ f"An unexpected error occurred:\n{str(e)}")
+ return False, None
+
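The gate above relies on `packaging.version` rather than string comparison, which matters for multi-digit components and pre-releases:

```python
from packaging import version

# Numeric, not lexicographic: "3.10" is newer than "3.9".
assert version.parse("3.10.0") > version.parse("3.9.9")
# Pre-releases sort before their final release.
assert version.parse("4.0.0") > version.parse("4.0.0rc1")
```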
+ def check_for_updates_manual(self):
+ """Manual update check from menu - always shows dialog (async)"""
+ return self.check_for_updates_async(silent=False, force_show=True)
+
+ def _save_last_check_time(self):
+ """Save the last update check time to config"""
+ try:
+ current_time = time.time()
+ self._last_check_time = current_time
+ self.main_gui.config['last_update_check_time'] = current_time
+ # Save config without showing message
+ self.main_gui.save_config(show_message=False)
+ except Exception as e:
+ print(f"[DEBUG] Failed to save last check time: {e}")
+
+ def format_markdown_to_tkinter(self, text_widget, markdown_text):
+ """Convert GitHub markdown to formatted tkinter text - simplified version
+
+ Args:
+ text_widget: The Text widget to insert formatted text into
+ markdown_text: The markdown source text
+ """
+ # Configure minimal tags
+ text_widget.tag_config("heading", font=('TkDefaultFont', 12, 'bold'))
+ text_widget.tag_config("bold", font=('TkDefaultFont', 10, 'bold'))
+
+ # Process text line by line with minimal formatting
+ lines = markdown_text.split('\n')
+
+ for line in lines:
+ # Strip any weird unicode characters that might cause display issues
+ line = ''.join(char for char in line if ord(char) < 65536)
+
+ # Handle headings
+ if line.startswith('#'):
+ # Remove all # symbols and get the heading text
+ heading_text = line.lstrip('#').strip()
+ if heading_text:
+ text_widget.insert('end', heading_text + '\n', 'heading')
+
+ # Handle bullet points
+ elif line.strip().startswith(('- ', '* ')):
+ # Get the text after the bullet
+ bullet_text = line.strip()[2:].strip()
+ # Clean the text of markdown formatting
+ bullet_text = self._clean_markdown_text(bullet_text)
+ text_widget.insert('end', ' • ' + bullet_text + '\n')
+
+ # Handle numbered lists
+ elif re.match(r'^\s*\d+\.\s', line):
+ # Extract number and text
+ match = re.match(r'^(\s*)(\d+)\.\s(.+)', line)
+ if match:
+ indent, num, text = match.groups()
+ clean_text = self._clean_markdown_text(text.strip())
+ text_widget.insert('end', f' {num}. {clean_text}\n')
+
+ # Handle separator lines
+ elif line.strip() in ['---', '***', '___']:
+ text_widget.insert('end', '─' * 40 + '\n')
+
+ # Handle code blocks - just skip the markers
+ elif line.strip().startswith('```'):
+ continue # Skip code fence markers
+
+ # Regular text
+ elif line.strip():
+ # Clean and insert the line
+ clean_text = self._clean_markdown_text(line)
+ # Check if this looks like it should be bold (common pattern)
+ if clean_text.endswith(':') and len(clean_text) < 50:
+ text_widget.insert('end', clean_text + '\n', 'bold')
+ else:
+ text_widget.insert('end', clean_text + '\n')
+
+ # Empty lines
+ else:
+ text_widget.insert('end', '\n')
+
+ def _clean_markdown_text(self, text):
+ """Remove markdown formatting from text
+
+ Args:
+ text: Text with markdown formatting
+
+ Returns:
+ Clean text without markdown symbols
+ """
+ # Remove inline code backticks
+ text = re.sub(r'`([^`]+)`', r'\1', text)
+
+ # Remove bold markers
+ text = re.sub(r'\*\*([^*]+)\*\*', r'\1', text)
+ text = re.sub(r'__([^_]+)__', r'\1', text)
+
+ # Remove italic markers
+ text = re.sub(r'\*([^*]+)\*', r'\1', text)
+ text = re.sub(r'_([^_]+)_', r'\1', text)
+
+ # Remove links but keep link text
+ text = re.sub(r'\[([^\]]+)\]\([^)]+\)', r'\1', text)
+
+ # Remove any remaining special characters that might cause issues
+ text = text.replace('\u200b', '') # Remove zero-width spaces
+ text = text.replace('\ufeff', '') # Remove BOM
+
+ return text.strip()
+
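A quick check of the stripping rules above; the input string is invented and `um` stands for an UpdateManager instance:

```python
text = "**Fixed**: `parser` bug, see [the PR](https://example.com/pr/1)"
print(um._clean_markdown_text(text))
# -> Fixed: parser bug, see the PR
```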
+ def show_update_dialog(self):
+ """Show update dialog (for updates or version history)"""
+ if not self.latest_release and not self.all_releases:
+ # Try to fetch releases if we don't have them
+ self.all_releases = self.fetch_multiple_releases(count=10)
+ if self.all_releases:
+ self.latest_release = self.all_releases[0]
+ else:
+ messagebox.showerror("Error", "Unable to fetch version information from GitHub.")
+ return
+
+ # Set appropriate title
+ if self.update_available:
+ title = "Update Available"
+ else:
+ title = "Version History"
+
+ # Create dialog first without content
+ dialog, scrollable_frame, canvas = self.main_gui.wm.setup_scrollable(
+ self.main_gui.master,
+ title,
+ width=None,
+ height=None,
+ max_width_ratio=0.5,
+ max_height_ratio=0.8
+ )
+
+ # Show dialog immediately
+ dialog.update_idletasks()
+
+ # Then populate content
+ self.main_gui.master.after(10, lambda: self._populate_update_dialog(dialog, scrollable_frame, canvas))
+
+ def _populate_update_dialog(self, dialog, scrollable_frame, canvas):
+ """Populate the update dialog content"""
+ # Main container
+ main_frame = ttk.Frame(scrollable_frame)
+ main_frame.pack(fill='both', expand=True, padx=20, pady=20)
+
+ # Initialize selected_asset to None
+ self.selected_asset = None
+
+ # Version info
+ version_frame = ttk.LabelFrame(main_frame, text="Version Information", padding=10)
+ version_frame.pack(fill='x', pady=(0, 10))
+
+ ttk.Label(version_frame,
+ text=f"Current Version: {self.CURRENT_VERSION}").pack(anchor='w')
+
+ if self.latest_release:
+ latest_version = self.latest_release['tag_name']
+ if self.update_available:
+ ttk.Label(version_frame,
+ text=f"Latest Version: {latest_version}",
+ font=('TkDefaultFont', 10, 'bold')).pack(anchor='w')
+ else:
+ ttk.Label(version_frame,
+ text=f"Latest Version: {latest_version} ✓ You are up to date!",
+ foreground='green',
+ font=('TkDefaultFont', 10, 'bold')).pack(anchor='w')
+
+ # ALWAYS show asset selection when we have the first release data (current or latest)
+ release_to_check = self.all_releases[0] if self.all_releases else self.latest_release
+
+ if release_to_check:
+ # Get exe files from the first/latest release
+ exe_assets = [a for a in release_to_check.get('assets', [])
+ if a['name'].lower().endswith('.exe')]
+
+ print(f"[DEBUG] Found {len(exe_assets)} exe files in release {release_to_check.get('tag_name')}")
+
+ # Show selection UI if there are exe files
+ if exe_assets:
+ # Determine the title based on whether there are multiple variants
+ if len(exe_assets) > 1:
+ frame_title = "Select Version to Download"
+ else:
+ frame_title = "Available Download"
+
+ asset_frame = ttk.LabelFrame(main_frame, text=frame_title, padding=10)
+ asset_frame.pack(fill='x', pady=(0, 10))
+
+ if len(exe_assets) > 1:
+ # Multiple exe files - show radio buttons to choose
+ self.asset_var = tk.StringVar()
+ for i, asset in enumerate(exe_assets):
+ filename = asset['name']
+ size_mb = asset['size'] / (1024 * 1024)
+
+ # Try to identify variant type from filename
+ if 'full' in filename.lower():
+ variant_label = f"Full Version - {filename} ({size_mb:.1f} MB)"
+ else:
+ variant_label = f"Standard Version - {filename} ({size_mb:.1f} MB)"
+
+ rb = ttk.Radiobutton(asset_frame, text=variant_label,
+ variable=self.asset_var,
+ value=str(i))
+ rb.pack(anchor='w', pady=2)
+
+ # Select first option by default
+ if i == 0:
+ self.asset_var.set(str(i))
+ self.selected_asset = asset
+
+ # Add listener for selection changes
+ def on_asset_change(*args):
+ idx = int(self.asset_var.get())
+ self.selected_asset = exe_assets[idx]
+
+ self.asset_var.trace_add('write', on_asset_change)
+ else:
+ # Only one exe file - just show it and set it as selected
+ self.selected_asset = exe_assets[0]
+ filename = exe_assets[0]['name']
+ size_mb = exe_assets[0]['size'] / (1024 * 1024)
+ ttk.Label(asset_frame,
+ text=f"{filename} ({size_mb:.1f} MB)").pack(anchor='w')
+
+ # Create notebook for version history
+ notebook = ttk.Notebook(main_frame)
+ notebook.pack(fill='both', expand=True, pady=(0, 10))
+
+ # Add tabs for different versions
+ if self.all_releases:
+ for i, release in enumerate(self.all_releases[:5]): # Show up to 5 versions
+ version_tag = release['tag_name']
+ version_num = version_tag.lstrip('v')
+ is_current = version_num == self.CURRENT_VERSION
+ is_latest = i == 0
+
+ # Create tab label
+ tab_label = version_tag
+ if is_current and is_latest:
+ tab_label += " (Current, Latest)"
+ elif is_current:
+ tab_label += " (Current)"
+ elif is_latest:
+ tab_label += " (Latest)"
+
+ # Create frame for this version
+ tab_frame = ttk.Frame(notebook)
+ notebook.add(tab_frame, text=tab_label)
+
+ # Add release date
+ if 'published_at' in release:
+ date_str = release['published_at'][:10] # Get YYYY-MM-DD
+ date_label = ttk.Label(tab_frame, text=f"Released: {date_str}",
+ font=('TkDefaultFont', 9, 'italic'))
+ date_label.pack(anchor='w', padx=10, pady=(10, 5))
+
+ # Create text widget for release notes
+ text_frame = ttk.Frame(tab_frame)
+ text_frame.pack(fill='both', expand=True, padx=10, pady=(0, 10))
+
+ notes_text = tk.Text(text_frame, height=12, wrap='word', width=60)
+ notes_scroll = ttk.Scrollbar(text_frame, command=notes_text.yview)
+ notes_text.config(yscrollcommand=notes_scroll.set)
+
+ notes_text.pack(side='left', fill='both', expand=True)
+ notes_scroll.pack(side='right', fill='y')
+
+ # Format and insert release notes with markdown support
+ release_notes = release.get('body', 'No release notes available')
+ self.format_markdown_to_tkinter(notes_text, release_notes)
+
+ notes_text.config(state='disabled') # Make read-only
+
+ # Don't set background color as it causes rendering artifacts
+ else:
+ # Fallback to simple display if no releases fetched
+ notes_frame = ttk.LabelFrame(main_frame, text="Release Notes", padding=10)
+ notes_frame.pack(fill='both', expand=True, pady=(0, 10))
+
+ notes_text = tk.Text(notes_frame, height=10, wrap='word')
+ notes_scroll = ttk.Scrollbar(notes_frame, command=notes_text.yview)
+ notes_text.config(yscrollcommand=notes_scroll.set)
+
+ notes_text.pack(side='left', fill='both', expand=True)
+ notes_scroll.pack(side='right', fill='y')
+
+ if self.latest_release:
+ release_notes = self.latest_release.get('body', 'No release notes available')
+ self.format_markdown_to_tkinter(notes_text, release_notes)
+ else:
+ notes_text.insert('1.0', 'Unable to fetch release notes.')
+
+ notes_text.config(state='disabled')
+
+ # Download progress (initially hidden)
+ self.progress_frame = ttk.Frame(main_frame)
+ self.progress_label = ttk.Label(self.progress_frame, text="Downloading update...")
+ self.progress_label.pack(anchor='w')
+ self.progress_bar = ttk.Progressbar(self.progress_frame, mode='determinate', length=400)
+ self.progress_bar.pack(fill='x', pady=5)
+
+ # Add status label for download details
+ self.status_label = ttk.Label(self.progress_frame, text="", font=('TkDefaultFont', 8))
+ self.status_label.pack(anchor='w')
+
+ # Buttons
+ button_frame = ttk.Frame(main_frame)
+ button_frame.pack(fill='x', pady=(10, 0))
+
+ def start_download():
+ if not self.selected_asset:
+ messagebox.showerror("No File Selected",
+ "Please select a version to download.")
+ return
+
+ self.progress_frame.pack(fill='x', pady=(0, 10), before=button_frame)
+ download_btn.config(state='disabled')
+ if 'remind_btn' in locals():
+ remind_btn.config(state='disabled')
+ if 'skip_btn' in locals():
+ skip_btn.config(state='disabled')
+ if 'close_btn' in locals():
+ close_btn.config(state='disabled')
+
+ # Reset progress
+ self.progress_bar['value'] = 0
+ self.download_progress = 0
+
+ # Start download using shared executor if available
+ try:
+ if hasattr(self.main_gui, '_ensure_executor'):
+ self.main_gui._ensure_executor()
+ execu = getattr(self, 'executor', None) or getattr(self.main_gui, 'executor', None)
+ if execu:
+ execu.submit(self.download_update, dialog)
+ else:
+ thread = threading.Thread(target=self.download_update, args=(dialog,), daemon=True)
+ thread.start()
+ except Exception:
+ thread = threading.Thread(target=self.download_update, args=(dialog,), daemon=True)
+ thread.start()
+
+ # Always show download button if we have exe files
+ has_exe_files = self.selected_asset is not None
+
+ if self.update_available:
+ # Show update-specific buttons
+ download_btn = tb.Button(button_frame, text="Download Update",
+ command=start_download, bootstyle="success")
+ download_btn.pack(side='left', padx=(0, 5))
+
+ remind_btn = tb.Button(button_frame, text="Remind Me Later",
+ command=dialog.destroy, bootstyle="secondary")
+ remind_btn.pack(side='left', padx=5)
+
+ skip_btn = tb.Button(button_frame, text="Skip This Version",
+ command=lambda: self.skip_version(dialog),
+ bootstyle="link")
+ skip_btn.pack(side='left', padx=5)
+ elif has_exe_files:
+ # We're up to date but have downloadable files
+ # Check if there are multiple exe files
+ release_to_check = self.all_releases[0] if self.all_releases else self.latest_release
+ exe_count = 0
+ if release_to_check:
+ exe_count = len([a for a in release_to_check.get('assets', [])
+ if a['name'].lower().endswith('.exe')])
+
+ if exe_count > 1:
+ # Multiple versions available
+ download_btn = tb.Button(button_frame, text="Download Different Path",
+ command=start_download, bootstyle="info")
+ else:
+ # Single version available
+ download_btn = tb.Button(button_frame, text="Re-download",
+ command=start_download, bootstyle="secondary")
+ download_btn.pack(side='left', padx=(0, 5))
+
+ close_btn = tb.Button(button_frame, text="Close",
+ command=dialog.destroy,
+ bootstyle="secondary")
+ close_btn.pack(side='left', padx=(0, 5))
+ else:
+ # No downloadable files
+ close_btn = tb.Button(button_frame, text="Close",
+ command=dialog.destroy,
+ bootstyle="primary")
+ close_btn.pack(side='left', padx=(0, 5))
+
+ # Add "View All Releases" link button
+ def open_releases_page():
+ import webbrowser
+ webbrowser.open("https://github.com/Shirochi-stack/Glossarion/releases")
+
+ tb.Button(button_frame, text="View All Releases",
+ command=open_releases_page,
+ bootstyle="link").pack(side='right', padx=5)
+
+ # Auto-resize at the end
+ dialog.after(100, lambda: self.main_gui.wm.auto_resize_dialog(dialog, canvas, max_width_ratio=0.5, max_height_ratio=0.8))
+
+ # Handle window close
+ dialog.protocol("WM_DELETE_WINDOW", lambda: [dialog._cleanup_scrolling(), dialog.destroy()])
+
+ def skip_version(self, dialog):
+ """Mark this version as skipped and close dialog"""
+ if not self.latest_release:
+ dialog.destroy()
+ return
+
+ # Get current skipped versions list
+ if 'skipped_versions' not in self.main_gui.config:
+ self.main_gui.config['skipped_versions'] = []
+
+ # Add this version to skipped list
+ version_tag = self.latest_release['tag_name']
+ if version_tag not in self.main_gui.config['skipped_versions']:
+ self.main_gui.config['skipped_versions'].append(version_tag)
+
+ # Save config
+ self.main_gui.save_config(show_message=False)
+
+ # Close dialog
+ dialog.destroy()
+
+ # Show confirmation
+ messagebox.showinfo("Version Skipped",
+ f"Version {version_tag} will be skipped in future update checks.\n"
+ "You can manually check for updates from the Help menu.")
+
+ def download_update(self, dialog):
+ """Download the update file"""
+ try:
+ # Use the selected asset
+ asset = self.selected_asset
+
+ if not asset:
+ dialog.after(0, lambda: messagebox.showerror("Download Error",
+ "No file selected for download."))
+ return
+
+ # Get the current executable path
+ if getattr(sys, 'frozen', False):
+ # Running as compiled executable
+ current_exe = sys.executable
+ download_dir = os.path.dirname(current_exe)
+ else:
+ # Running as script
+ current_exe = None
+ download_dir = self.base_dir
+
+ # Use the exact filename from GitHub
+ original_filename = asset['name'] # e.g., "Glossarion v3.1.3.exe"
+ new_exe_path = os.path.join(download_dir, original_filename)
+
+ # If new file would overwrite current executable, download to temp name first
+ if current_exe and os.path.normpath(new_exe_path) == os.path.normpath(current_exe):
+ temp_path = new_exe_path + ".new"
+ download_path = temp_path
+ else:
+ download_path = new_exe_path
+
+ # Download with progress tracking and shorter timeout
+ response = requests.get(asset['browser_download_url'], stream=True, timeout=15)
+ response.raise_for_status()
+ total_size = int(response.headers.get('content-length', 0))
+
+ downloaded = 0
+ chunk_size = 8192
+
+ with open(download_path, 'wb') as f:
+ for chunk in response.iter_content(chunk_size=chunk_size):
+ if chunk:
+ f.write(chunk)
+ downloaded += len(chunk)
+
+ # Update progress bar
+ if total_size > 0:
+ progress = int((downloaded / total_size) * 100)
+ size_mb = downloaded / (1024 * 1024)
+ total_mb = total_size / (1024 * 1024)
+
+ # Use after_idle for smoother updates
+ def update_progress(p=progress, d=size_mb, t=total_mb):
+ try:
+ self.progress_bar['value'] = p
+ self.progress_label.config(text=f"Downloading update... {p}%")
+ self.status_label.config(text=f"{d:.1f} MB / {t:.1f} MB")
+ except Exception:
+ pass # Dialog might have been closed
+
+ dialog.after_idle(update_progress)
+
+ # Download complete
+ dialog.after(0, lambda: self.download_complete(dialog, download_path))
+
+ except Exception as e:
+ # Capture the error message immediately
+ error_msg = str(e)
+ dialog.after(0, lambda: messagebox.showerror("Download Failed", error_msg))
+
+ def download_complete(self, dialog, file_path):
+ """Handle completed download"""
+ dialog.destroy()
+
+ result = messagebox.askyesno(
+ "Download Complete",
+ "Update downloaded successfully.\n\n"
+ "Would you like to install it now?\n"
+ "(The application will need to restart)"
+ )
+
+ if result:
+ self.install_update(file_path)
+
+ def install_update(self, update_file):
+ """Launch the update installer and exit current app"""
+ try:
+ # Save current state/config if needed
+ self.main_gui.save_config(show_message=False)
+
+ # Get current executable path
+ if getattr(sys, 'frozen', False):
+ current_exe = sys.executable
+ current_dir = os.path.dirname(current_exe)
+
+ # Create a batch file to handle the update
+ batch_content = f"""@echo off
+echo Updating Glossarion...
+echo Waiting for current version to close...
+timeout /t 3 /nobreak > nul
+
+:: Delete the old executable
+echo Deleting old version...
+if exist "{current_exe}" (
+ del /f /q "{current_exe}"
+ if exist "{current_exe}" (
+ echo Failed to delete old version, retrying...
+ timeout /t 2 /nobreak > nul
+ del /f /q "{current_exe}"
+ )
+)
+
+:: Start the new version
+echo Starting new version...
+start "" "{update_file}"
+
+:: Clean up this batch file
+del "%~f0"
+"""
+ batch_path = os.path.join(current_dir, "update_glossarion.bat")
+ with open(batch_path, 'w') as f:
+ f.write(batch_content)
+
+ # Run the batch file
+ import subprocess
+ subprocess.Popen([batch_path], shell=True, creationflags=subprocess.CREATE_NO_WINDOW)
+
+ print(f"[DEBUG] Update batch file created: {batch_path}")
+ print(f"[DEBUG] Will delete: {current_exe}")
+ print(f"[DEBUG] Will start: {update_file}")
+ else:
+ # Running as script, just start the new exe
+ import subprocess
+ subprocess.Popen([update_file], shell=True)
+
+ # Exit current application
+ print("[DEBUG] Closing application for update...")
+ self.main_gui.master.quit()
+ sys.exit(0)
+
+ except Exception as e:
+ messagebox.showerror("Installation Error",
+ f"Could not start update process:\n{str(e)}")
diff --git a/wait_and_open.ps1 b/wait_and_open.ps1
new file mode 100644
index 0000000000000000000000000000000000000000..bff14e00948120107dc8dd8eb1e3e1213f2fdf9f
--- /dev/null
+++ b/wait_and_open.ps1
@@ -0,0 +1,31 @@
+# Wait for Gradio server to be ready and then open browser
+param(
+ [string]$url = "http://127.0.0.1:7860",
+ [int]$maxWaitSeconds = 60
+)
+
+Write-Host "Waiting for server to be ready at $url..." -ForegroundColor Cyan
+
+$startTime = Get-Date
+$ready = $false
+
+while (-not $ready -and ((Get-Date) - $startTime).TotalSeconds -lt $maxWaitSeconds) {
+ try {
+ $response = Invoke-WebRequest -Uri $url -Method Head -TimeoutSec 2 -UseBasicParsing -ErrorAction SilentlyContinue
+ if ($response.StatusCode -eq 200) {
+ $ready = $true
+ Write-Host "Server is ready!" -ForegroundColor Green
+ }
+ }
+ catch {
+ # Server not ready yet, wait a bit
+ Start-Sleep -Milliseconds 500
+ }
+}
+
+if ($ready) {
+ Write-Host "Opening browser..." -ForegroundColor Green
+ Start-Process $url
+} else {
+ Write-Host "Timeout waiting for server. Please open $url manually." -ForegroundColor Yellow
+}
\ No newline at end of file
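Keeping with Python for examples: a launcher might invoke this script as below, assuming `powershell` is on PATH and the script sits next to the launcher:

```python
import subprocess

# Fire-and-forget: the script blocks until the Gradio server answers,
# then opens the default browser itself.
subprocess.Popen([
    "powershell", "-NoProfile", "-ExecutionPolicy", "Bypass",
    "-File", "wait_and_open.ps1",
    "-url", "http://127.0.0.1:7860",
    "-maxWaitSeconds", "60",
])
```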