Files changed (1)
  1. app.py +802 -305
app.py CHANGED
@@ -1,333 +1,830 @@
 
  import gradio as gr
- import numpy as np
- import random
- import torch
  import spaces

- from PIL import Image
- from diffusers import QwenImageEditPlusPipeline

- import os
- import base64
- import json
-
- SYSTEM_PROMPT = '''
- # Edit Instruction Rewriter
- You are a professional edit instruction rewriter. Your task is to generate a precise, concise, and visually achievable professional-level edit instruction based on the user-provided instruction and the image to be edited.
-
- Please strictly follow the rewriting rules below:
-
- ## 1. General Principles
- - Keep the rewritten prompt **concise and comprehensive**. Avoid overly long sentences and unnecessary descriptive language.
- - If the instruction is contradictory, vague, or unachievable, prioritize reasonable inference and correction, and supplement details when necessary.
- - Keep the main part of the original instruction unchanged, only enhancing its clarity, rationality, and visual feasibility.
- - All added objects or modifications must align with the logic and style of the scene in the input images.
- - If multiple sub-images are to be generated, describe the content of each sub-image individually.
-
- ## 2. Task-Type Handling Rules
-
- ### 1. Add, Delete, Replace Tasks
- - If the instruction is clear (already includes task type, target entity, position, quantity, attributes), preserve the original intent and only refine the grammar.
- - If the description is vague, supplement with minimal but sufficient details (category, color, size, orientation, position, etc.). For example:
-     > Original: "Add an animal"
-     > Rewritten: "Add a light-gray cat in the bottom-right corner, sitting and facing the camera"
- - Remove meaningless instructions: e.g., "Add 0 objects" should be ignored or flagged as invalid.
- - For replacement tasks, specify "Replace Y with X" and briefly describe the key visual features of X.
-
- ### 2. Text Editing Tasks
- - All text content must be enclosed in English double quotes `" "`. Keep the original language of the text, and keep the capitalization.
- - Both adding new text and replacing existing text are text replacement tasks, For example:
-     - Replace "xx" to "yy"
-     - Replace the mask / bounding box to "yy"
-     - Replace the visual object to "yy"
- - Specify text position, color, and layout only if user has required.
- - If font is specified, keep the original language of the font.
-
- ### 3. Human Editing Tasks
- - Make the smallest changes to the given user's prompt.
- - If changes to background, action, expression, camera shot, or ambient lighting are required, please list each modification individually.
- - **Edits to makeup or facial features / expression must be subtle, not exaggerated, and must preserve the subject’s identity consistency.**
-     > Original: "Add eyebrows to the face"
-     > Rewritten: "Slightly thicken the person’s eyebrows with little change, look natural."
-
- ### 4. Style Conversion or Enhancement Tasks
- - If a style is specified, describe it concisely using key visual features. For example:
-     > Original: "Disco style"
-     > Rewritten: "1970s disco style: flashing lights, disco ball, mirrored walls, vibrant colors"
- - For style reference, analyze the original image and extract key characteristics (color, composition, texture, lighting, artistic style, etc.), integrating them into the instruction.
- - **Colorization tasks (including old photo restoration) must use the fixed template:**
-     "Restore and colorize the old photo."
- - Clearly specify the object to be modified. For example:
-     > Original: Modify the subject in Picture 1 to match the style of Picture 2.
-     > Rewritten: Change the girl in Picture 1 to the ink-wash style of Picture 2 — rendered in black-and-white watercolor with soft color transitions.
-
- ### 5. Material Replacement
- - Clearly specify the object and the material. For example: "Change the material of the apple to papercut style."
- - For text material replacement, use the fixed template:
-     "Change the material of text "xxxx" to laser style"
-
- ### 6. Logo/Pattern Editing
- - Material replacement should preserve the original shape and structure as much as possible. For example:
-     > Original: "Convert to sapphire material"
-     > Rewritten: "Convert the main subject in the image to sapphire material, preserving similar shape and structure"
- - When migrating logos/patterns to new scenes, ensure shape and structure consistency. For example:
-     > Original: "Migrate the logo in the image to a new scene"
-     > Rewritten: "Migrate the logo in the image to a new scene, preserving similar shape and structure"
-
- ### 7. Multi-Image Tasks
- - Rewritten prompts must clearly point out which image’s element is being modified. For example:
-     > Original: "Replace the subject of picture 1 with the subject of picture 2"
-     > Rewritten: "Replace the girl of picture 1 with the boy of picture 2, keeping picture 2’s background unchanged"
- - For stylization tasks, describe the reference image’s style in the rewritten prompt, while preserving the visual content of the source image.
-
- ## 3. Rationale and Logic Check
- - Resolve contradictory instructions: e.g., “Remove all trees but keep all trees” requires logical correction.
- - Supplement missing critical information: e.g., if position is unspecified, choose a reasonable area based on composition (near subject, blank space, center/edge, etc.).
-
- # Output Format Example
- ```json
- {
-     "Rewritten": "..."
- }
- '''

- def polish_prompt(prompt, img):
-     prompt = f"{SYSTEM_PROMPT}\n\nUser Input: {prompt}\n\nRewritten Prompt:"
-     success=False
-     while not success:
          try:
-             result = api(prompt, [img])
-             # print(f"Result: {result}")
-             # print(f"Polished Prompt: {polished_prompt}")
-             if isinstance(result, str):
-                 result = result.replace('```json','')
-                 result = result.replace('```','')
-                 result = json.loads(result)
-             else:
-                 result = json.loads(result)
-
-             polished_prompt = result['Rewritten']
-             polished_prompt = polished_prompt.strip()
-             polished_prompt = polished_prompt.replace("\n", " ")
-             success = True
          except Exception as e:
-             print(f"[Warning] Error during API call: {e}")
-     return polished_prompt
-
-
- def encode_image(pil_image):
-     import io
-     buffered = io.BytesIO()
-     pil_image.save(buffered, format="PNG")
-     return base64.b64encode(buffered.getvalue()).decode("utf-8")
-
-
-
-
- def api(prompt, img_list, model="qwen-vl-max-latest", kwargs={}):
-     import dashscope
-     api_key = os.environ.get('DASH_API_KEY')
-     if not api_key:
-         raise EnvironmentError("DASH_API_KEY is not set")
-     assert model in ["qwen-vl-max-latest"], f"Not implemented model {model}"
-     sys_promot = "you are a helpful assistant, you should provide useful answers to users."
-     messages = [
-         {"role": "system", "content": sys_promot},
-         {"role": "user", "content": []}]
-     for img in img_list:
-         messages[1]["content"].append(
-             {"image": f"data:image/png;base64,{encode_image(img)}"})
-     messages[1]["content"].append({"text": f"{prompt}"})
-
-     response_format = kwargs.get('response_format', None)
-
-     response = dashscope.MultiModalConversation.call(
-         api_key=api_key,
-         model=model, # For example, use qwen-plus here. You can change the model name as needed. Model list: https://help.aliyun.com/zh/model-studio/getting-started/models
-         messages=messages,
-         result_format='message',
-         response_format=response_format,
      )

-     if response.status_code == 200:
-         return response.output.choices[0].message.content[0]['text']
-     else:
-         raise Exception(f'Failed to post: {response}')
-
- # --- Model Loading ---
- dtype = torch.bfloat16
- device = "cuda" if torch.cuda.is_available() else "cpu"
-
- # Load the model pipeline
- pipe = QwenImageEditPlusPipeline.from_pretrained("Qwen/Qwen-Image-Edit-2509", torch_dtype=dtype).to(device)
-
- # --- UI Constants and Helpers ---
- MAX_SEED = np.iinfo(np.int32).max
-
- # --- Main Inference Function (with hardcoded negative prompt) ---
- @spaces.GPU(duration=300)
- def infer(
-     images,
-     prompt,
-     seed=42,
-     randomize_seed=False,
-     true_guidance_scale=1.0,
-     num_inference_steps=50,
-     height=None,
-     width=None,
-     rewrite_prompt=True,
-     num_images_per_prompt=1,
-     progress=gr.Progress(track_tqdm=True),
  ):
-     """
-     Generates an image using the local Qwen-Image diffusers pipeline.
-     """
-     # Hardcode the negative prompt as requested
-     negative_prompt = " "
-
-     if randomize_seed:
-         seed = random.randint(0, MAX_SEED)

-     # Set up the generator for reproducibility
-     generator = torch.Generator(device=device).manual_seed(seed)
-
-     # Load input images into PIL Images
-     pil_images = []
-     if images is not None:
-         for item in images:
-             try:
-                 if isinstance(item[0], Image.Image):
-                     pil_images.append(item[0].convert("RGB"))
-                 elif isinstance(item[0], str):
-                     pil_images.append(Image.open(item[0]).convert("RGB"))
-                 elif hasattr(item, "name"):
-                     pil_images.append(Image.open(item.name).convert("RGB"))
-             except Exception:
-                 continue

-     if height==256 and width==256:
-         height, width = None, None
-     print(f"Calling pipeline with prompt: '{prompt}'")
-     print(f"Negative Prompt: '{negative_prompt}'")
-     print(f"Seed: {seed}, Steps: {num_inference_steps}, Guidance: {true_guidance_scale}, Size: {width}x{height}")
-     if rewrite_prompt and len(pil_images) > 0:
-         prompt = polish_prompt(prompt, pil_images[0])
-         print(f"Rewritten Prompt: {prompt}")
-

-     # Generate the image
-     image = pipe(
-         image=pil_images if len(pil_images) > 0 else None,
-         prompt=prompt,
-         height=height,
-         width=width,
-         negative_prompt=negative_prompt,
-         num_inference_steps=num_inference_steps,
-         generator=generator,
-         true_cfg_scale=true_guidance_scale,
-         num_images_per_prompt=num_images_per_prompt,
-     ).images
-
-     return image, seed
-
- # --- Examples and UI Layout ---
- examples = []
-
- css = """
- #col-container {
-     margin: 0 auto;
-     max-width: 1024px;
  }
- #edit_text{margin-top: -62px !important}
  """

- with gr.Blocks(css=css) as demo:
-     with gr.Column(elem_id="col-container"):
-         gr.HTML('<img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/qwen_image_edit_logo.png" alt="Qwen-Image Logo" width="400" style="display: block; margin: 0 auto;">')
-         gr.Markdown("[Learn more](https://github.com/QwenLM/Qwen-Image) about the Qwen-Image series. Try on [Qwen Chat](https://chat.qwen.ai/), or [download model](https://huggingface.co/Qwen/Qwen-Image-Edit) to run locally with ComfyUI or diffusers.")
-         with gr.Row():
-             with gr.Column():
-                 input_images = gr.Gallery(label="Input Images", show_label=False, type="pil", interactive=True)
-
-             # result = gr.Image(label="Result", show_label=False, type="pil")
-             result = gr.Gallery(label="Result", show_label=False, type="pil")
-         with gr.Row():
-             prompt = gr.Text(
-                 label="Prompt",
-                 show_label=False,
-                 placeholder="describe the edit instruction",
-                 container=False,
              )
-             run_button = gr.Button("Edit!", variant="primary")
-
-         with gr.Accordion("Advanced Settings", open=False):
-             # Negative prompt UI element is removed here

-             seed = gr.Slider(
-                 label="Seed",
-                 minimum=0,
-                 maximum=MAX_SEED,
-                 step=1,
-                 value=0,
              )

-             randomize_seed = gr.Checkbox(label="Randomize seed", value=True)
-
              with gr.Row():

-                 true_guidance_scale = gr.Slider(
-                     label="True guidance scale",
-                     minimum=1.0,
-                     maximum=10.0,
-                     step=0.1,
-                     value=4.0
-                 )

-                 num_inference_steps = gr.Slider(
-                     label="Number of inference steps",
-                     minimum=1,
-                     maximum=50,
-                     step=1,
-                     value=40,
-                 )
-
-                 height = gr.Slider(
-                     label="Height",
-                     minimum=256,
-                     maximum=2048,
-                     step=8,
-                     value=None,
-                 )

-                 width = gr.Slider(
-                     label="Width",
-                     minimum=256,
-                     maximum=2048,
-                     step=8,
-                     value=None,
-                 )


-             rewrite_prompt = gr.Checkbox(label="Rewrite prompt", value=False)
-
-         # gr.Examples(examples=examples, inputs=[prompt], outputs=[result, seed], fn=infer, cache_examples=False)
-
-         gr.on(
-             triggers=[run_button.click, prompt.submit],
-             fn=infer,
-             inputs=[
-                 input_images,
-                 prompt,
-                 seed,
-                 randomize_seed,
-                 true_guidance_scale,
-                 num_inference_steps,
-                 height,
-                 width,
-                 rewrite_prompt,
-             ],
-             outputs=[result, seed],
-         )

  if __name__ == "__main__":
-     demo.launch()
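For reference, the code removed above drove Qwen-Image-Edit through diffusers. A minimal sketch of that call path, using only names that appear in the removed lines (the model id, `true_cfg_scale`, and the `.images` result); treat it as illustrative rather than the exact app:

```python
import torch
from PIL import Image
from diffusers import QwenImageEditPlusPipeline

# Load the pipeline the removed code used, on GPU if available.
device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = QwenImageEditPlusPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit-2509", torch_dtype=torch.bfloat16
).to(device)

# One edit pass: a list of input images plus an instruction prompt.
result = pipe(
    image=[Image.open("input.png").convert("RGB")],  # hypothetical input file
    prompt="Restore and colorize the old photo.",
    num_inference_steps=40,
    true_cfg_scale=4.0,
).images[0]
result.save("edited.png")
```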
+ import os
  import gradio as gr
  import spaces
+ from infer_rvc_python import BaseLoader
+ import random
+ import logging
+ import time
+ import soundfile as sf
+ from infer_rvc_python.main import download_manager, load_hu_bert, Config
+ import zipfile
+ import edge_tts
+ import asyncio
+ import librosa
+ import traceback
+ from pedalboard import Pedalboard, Reverb, Compressor, HighpassFilter
+ from pedalboard.io import AudioFile
+ from pydub import AudioSegment
+ import noisereduce as nr
+ import numpy as np
+ import urllib.request
+ import shutil
+ import threading
+ import argparse
+ import sys
+
+ parser = argparse.ArgumentParser(description="Run the app with optional sharing")
+ parser.add_argument(
+     '--share',
+     action='store_true',
+     help='Enable sharing mode'
+ )
+ parser.add_argument(
+     '--theme',
+     type=str,
+     default="aliabid94/new-theme",
+     help='Set the theme (default: aliabid94/new-theme)'
+ )
+ args = parser.parse_args()
+
+ IS_COLAB = True if ('google.colab' in sys.modules or args.share) else False
+ IS_ZERO_GPU = os.getenv("SPACES_ZERO_GPU")
+
+ logging.getLogger("infer_rvc_python").setLevel(logging.ERROR)
+
+ converter = BaseLoader(only_cpu=False, hubert_path=None, rmvpe_path=None)
+ converter.hu_bert_model = load_hu_bert(Config(only_cpu=False), converter.hubert_path)
+
+ # Default model (optional)
+ test_model = "https://huggingface.co/sail-rvc/Aldeano_Minecraft__RVC_V2_-_500_Epochs_/resolve/main/model.pth?download=true, https://huggingface.co/sail-rvc/Aldeano_Minecraft__RVC_V2_-_500_Epochs_/resolve/main/model.index?download=true"
+ test_names = ["model.pth", "model.index"]
+
+ for url, filename in zip(test_model.split(", "), test_names):
+     try:
+         download_manager(
+             url=url,
+             path=".",
+             extension="",
+             overwrite=False,
+             progress=True,
+         )
+         if not os.path.isfile(filename):
+             raise FileNotFoundError
+     except Exception:
+         with open(filename, "wb") as f:
+             pass
+
+ title = "<center><strong><font size='7'>RVC⚡ZERO - High Quality Voice Conversion</font></strong></center>"
+ description = "Upload your own model (.pth) and audio files for voice conversion." if IS_ZERO_GPU else ""
+ RESOURCES = """
+ 📌 <strong>Tips for Best Quality:</strong>
+ - Use models trained for 200+ epochs.
+ - Always upload .index file & set Index Influence to 0.9.
+ - Choose "rmvpe+" as Pitch Algorithm.
+ - Output format: WAV (lossless).
+ - Disable noise reduction unless necessary.
+ - Keep Resample SR = 0 (automatic).
+ """
+ theme = args.theme
+ delete_cache_time = (3200, 3200) if IS_ZERO_GPU else (86400, 86400)
+
+ PITCH_ALGO_OPT = [
+     "pm",
+     "harvest",
+     "crepe",
+     "rmvpe",
+     "rmvpe+",
+ ]
+
+
+ async def get_voices_list(proxy=None):
+     """Return a table of all available voices."""
+     from edge_tts import list_voices
+     voices = await list_voices(proxy=proxy)
+     voices = sorted(voices, key=lambda voice: voice.get("ShortName", ""))
+
+     table = [
+         {
+             "ShortName": voice.get("ShortName", "Unknown"),
+             "Gender": voice.get("Gender", "Unknown"),
+             "ContentCategories": ", ".join(voice.get("VoiceTag", {}).get("ContentCategories", [])),
+             "VoicePersonalities": ", ".join(voice.get("VoiceTag", {}).get("VoicePersonalities", [])),
+             "FriendlyName": voice.get("FriendlyName", voice.get("Name", "Unknown Voice")),
+         }
+         for voice in voices
+     ]
+
+     return table
+
+
+ def find_files(directory):
+     file_paths = []
+     for filename in os.listdir(directory):
+         if filename.endswith('.pth') or filename.endswith('.zip') or filename.endswith('.index'):
+             file_paths.append(os.path.join(directory, filename))
+     return file_paths
+
+
+ def unzip_in_folder(my_zip, my_dir):
+     with zipfile.ZipFile(my_zip) as zip:
+         for zip_info in zip.infolist():
+             if zip_info.is_dir():
+                 continue
+             zip_info.filename = os.path.basename(zip_info.filename)
+             zip.extract(zip_info, my_dir)
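`unzip_in_folder` above deliberately flattens archives: directory components are stripped so every member lands directly in the target folder. A self-contained sketch of that behavior, with hypothetical file names:

```python
import os
import zipfile

# Build a zip whose member sits in a nested directory.
with zipfile.ZipFile("demo.zip", "w") as zf:
    zf.writestr("nested/dir/model.pth", b"weights")

# Extract it flattened, the same way unzip_in_folder does.
os.makedirs("flat", exist_ok=True)
with zipfile.ZipFile("demo.zip") as zf:
    for info in zf.infolist():
        if info.is_dir():
            continue
        info.filename = os.path.basename(info.filename)  # drop "nested/dir/"
        zf.extract(info, "flat")

print(os.listdir("flat"))  # ['model.pth']
```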

+ def find_my_model(a_, b_):
+     if a_ is None or a_.endswith(".pth"):
+         return a_, b_
+
+     txt_files = []
+     for base_file in [a_, b_]:
+         if base_file is not None and base_file.endswith(".txt"):
+             txt_files.append(base_file)
+
+     directory = os.path.dirname(a_)
+
+     for txt in txt_files:
+         with open(txt, 'r') as file:
+             first_line = file.readline()
+
+         download_manager(
+             url=first_line.strip(),
+             path=directory,
+             extension="",
+         )
+
+     for f in find_files(directory):
+         if f.endswith(".zip"):
+             unzip_in_folder(f, directory)
+
+     model = None
+     index = None
+     end_files = find_files(directory)
+
+     for ff in end_files:
+         if ff.endswith(".pth"):
+             model = os.path.join(directory, ff)
+             gr.Info(f"Model found: {ff}")
+         if ff.endswith(".index"):
+             index = os.path.join(directory, ff)
+             gr.Info(f"Index found: {ff}")
+
+     if not model:
+         gr.Error(f"Model not found in: {end_files}")
+
+     if not index:
+         gr.Warning("Index not found")
+
+     return model, index
+
+
+ def ensure_valid_file(url):
+     if "huggingface" not in url:
+         raise ValueError("Only downloads from Hugging Face are allowed")
+
+     try:
+         request = urllib.request.Request(url, method="HEAD")
+         with urllib.request.urlopen(request) as response:
+             content_length = response.headers.get("Content-Length")
+
+             if content_length is None:
+                 raise ValueError("No Content-Length header found")
+
+             file_size = int(content_length)
+             if file_size > 900000000 and IS_ZERO_GPU:
+                 raise ValueError("The file is too large. Max allowed is 900 MB.")
+
+             return file_size
+
+     except Exception as e:
+         raise e
+
+
+ def clear_files(directory):
+     time.sleep(15)
+     print(f"Clearing files: {directory}.")
+     shutil.rmtree(directory)
+
+
+ # Fully corrected function, with no syntax errors
+ def get_my_model(url_data, progress=gr.Progress(track_tqdm=True)):
+     if not url_data:  # fixed: url_data with the trailing colon
+         return None, None
+
+     if "," in url_data:  # fixed: url_data with the trailing colon
+         a_, b_ = url_data.split(",")
+         a_, b_ = a_.strip().replace("/blob/", "/resolve/"), b_.strip().replace("/blob/", "/resolve/")
+     else:
+         a_, b_ = url_data.strip().replace("/blob/", "/resolve/"), None
+
+     out_dir = "downloads"
+     folder_download = str(random.randint(1000, 9999))
+     directory = os.path.join(out_dir, folder_download)
+     os.makedirs(directory, exist_ok=True)
+
+     try:
+         valid_url = [a_] if not b_ else [a_, b_]
+         for link in valid_url:
+             ensure_valid_file(link)
+             download_manager(
+                 url=link,
+                 path=directory,
+                 extension="",
+             )
+
+         for f in find_files(directory):
+             if f.endswith(".zip"):
+                 unzip_in_folder(f, directory)
+
+         model = None
+         index = None
+         end_files = find_files(directory)
+
+         for ff in end_files:
+             if ff.endswith(".pth"):
+                 model = ff
+                 gr.Info(f"Model found: {ff}")
+             if ff.endswith(".index"):
+                 index = ff
+                 gr.Info(f"Index found: {ff}")
+
+         if not model:
+             raise ValueError(f"Model not found in: {end_files}")
+
+         if not index:
+             gr.Warning("Index not found")
+         else:
+             index = os.path.abspath(index)
+
+         return os.path.abspath(model), index
+
+     except Exception as e:
+         raise e
+     finally:
+         t = threading.Thread(target=clear_files, args=(directory,))
+         t.start()
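`get_my_model` normalizes Hugging Face links (`/blob/` pages become `/resolve/` file URLs) and `ensure_valid_file` HEAD-checks the size before anything is downloaded. The two checks in isolation, as a minimal sketch:

```python
import urllib.request

def check_hf_link(url, max_bytes=900000000):
    if "huggingface" not in url:
        raise ValueError("Only downloads from Hugging Face are allowed")
    # /blob/ URLs point at HTML pages; /resolve/ serves the raw file.
    url = url.replace("/blob/", "/resolve/")
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        size = int(response.headers.get("Content-Length", 0))
    if size > max_bytes:
        raise ValueError("The file is too large. Max allowed is 900 MB.")
    return size

# check_hf_link("https://huggingface.co/user/repo/blob/main/model.pth")  # hypothetical URL
```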
+
+ def add_audio_effects(audio_list, type_output):
+     print("Audio effects")
+
+     result = []
+     for audio_path in audio_list:
          try:
+             output_path = f'{os.path.splitext(audio_path)[0]}_effects.{type_output}'
+
+             board = Pedalboard(
+                 [
+                     HighpassFilter(),
+                     Compressor(ratio=4, threshold_db=-15),
+                     Reverb(room_size=0.10, dry_level=0.8, wet_level=0.2, damping=0.7)
+                 ]
+             )
+
+             temp_wav = f'{os.path.splitext(audio_path)[0]}_temp.wav'
+
+             with AudioFile(audio_path) as f:
+                 with AudioFile(temp_wav, 'w', f.samplerate, f.num_channels) as o:
+                     while f.tell() < f.frames:
+                         chunk = f.read(int(f.samplerate))
+                         effected = board(chunk, f.samplerate, reset=False)
+                         o.write(effected)
+
+             audio_seg = AudioSegment.from_file(temp_wav, format="wav")
+             audio_seg.export(output_path, format=type_output, bitrate=("320k" if type_output == "mp3" else None))
+
+             os.remove(temp_wav)
+
+             result.append(output_path)
          except Exception as e:
+             traceback.print_exc()
+             print(f"Error audio effects: {str(e)}")
+             result.append(audio_path)
+
+     return result
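The effect chain in `add_audio_effects` (high-pass filter, then compression, then a light reverb) can be tried on its own; a minimal pedalboard sketch on a synthetic one-second tone:

```python
import numpy as np
from pedalboard import Pedalboard, Compressor, HighpassFilter, Reverb

sr = 44100
tone = (0.1 * np.sin(2 * np.pi * 220 * np.arange(sr) / sr)).astype(np.float32)

board = Pedalboard([
    HighpassFilter(),                        # trim low-end rumble
    Compressor(ratio=4, threshold_db=-15),   # even out dynamics
    Reverb(room_size=0.10, dry_level=0.8, wet_level=0.2, damping=0.7),
])
processed = board(tone, sr)  # same settings the function above uses
print(processed.shape)
```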
+
+
+ def apply_noisereduce(audio_list, type_output):
+     print("Noise reduce")
+
+     result = []
+     for audio_path in audio_list:
+         out_path = f"{os.path.splitext(audio_path)[0]}_noisereduce.{type_output}"
+
+         try:
+             audio = AudioSegment.from_file(audio_path)
+             samples = np.array(audio.get_array_of_samples())
+             reduced_noise = nr.reduce_noise(samples, sr=audio.frame_rate, prop_decrease=0.6)
+
+             reduced_audio = AudioSegment(
+                 reduced_noise.tobytes(),
+                 frame_rate=audio.frame_rate,
+                 sample_width=audio.sample_width,
+                 channels=audio.channels
+             )
+
+             reduced_audio.export(out_path, format=type_output, bitrate=("320k" if type_output == "mp3" else None))
+             result.append(out_path)
+
+         except Exception as e:
+             traceback.print_exc()
+             print(f"Error noisereduce: {str(e)}")
+             result.append(audio_path)
+
+     return result
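`apply_noisereduce` round-trips through pydub, but the core call is a single `noisereduce.reduce_noise`. A minimal sketch on a synthetic noisy signal, with the same `prop_decrease=0.6` (attenuate noise by 60% rather than fully, which sounds less processed):

```python
import numpy as np
import noisereduce as nr

sr = 44100
t = np.arange(sr) / sr
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
noisy = (clean + 0.05 * np.random.randn(sr)).astype(np.float32)

denoised = nr.reduce_noise(y=noisy, sr=sr, prop_decrease=0.6)
```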
+
+
+ @spaces.GPU()
+ def convert_now(audio_files, random_tag, converter, type_output, steps):
+     for step in range(steps):
+         audio_files = converter(
+             audio_files,
+             random_tag,
+             overwrite=False,
+             parallel_workers=(2 if IS_COLAB else 8),
+             type_output=type_output,
          )

+     return audio_files
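`@spaces.GPU()` is the ZeroGPU pattern: the decorated function is granted a GPU only for the duration of each call, which is why the heavy conversion loop lives in its own function. A minimal sketch of the decorator, meaningful only inside a Hugging Face Space:

```python
import spaces
import torch

@spaces.GPU(duration=60)  # assumption: request a GPU slice for up to 60 s per call
def device_name():
    return torch.cuda.get_device_name(0) if torch.cuda.is_available() else "cpu"
```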
+
+
+ def run(
+     audio_files,
+     file_m,
+     pitch_alg,
+     pitch_lvl,
+     file_index,
+     index_inf,
+     r_m_f,
+     e_r,
+     c_b_p,
+     active_noise_reduce,
+     audio_effects,
+     type_output,
+     steps,
  ):
+     if not audio_files:
+         raise ValueError("Please upload audio files")
+
+     if isinstance(audio_files, str):
+         audio_files = [audio_files]
+
+     try:
+         duration_base = librosa.get_duration(filename=audio_files[0])
+         print("Duration:", duration_base)
+     except Exception as e:
+         print(e)
+
+     if file_m is not None and file_m.endswith(".txt"):
+         file_m, file_index = find_my_model(file_m, file_index)
+         print(file_m, file_index)
+
+     random_tag = "USER_"+str(random.randint(10000000, 99999999))
+
+     converter.apply_conf(
+         tag=random_tag,
+         file_model=file_m,
+         pitch_algo=pitch_alg,
+         pitch_lvl=pitch_lvl,
+         file_index=file_index,
+         index_influence=index_inf,
+         respiration_median_filtering=r_m_f,
+         envelope_ratio=e_r,
+         consonant_breath_protection=c_b_p,
+         resample_sr=0,  # important: no resampling, for higher quality
+     )
+     time.sleep(0.1)

+     result = convert_now(audio_files, random_tag, converter, type_output, steps)
+
+     if active_noise_reduce:
+         result = apply_noisereduce(result, type_output)
+
+     if audio_effects:
+         result = add_audio_effects(result, type_output)
+
+     return result
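Outside the UI, the same conversion can be scripted directly against infer_rvc_python with the settings this Space recommends (rmvpe+, index influence 0.9, no resampling). A sketch assuming the README-style `BaseLoader` API and local model files:

```python
from infer_rvc_python import BaseLoader

converter = BaseLoader(only_cpu=False, hubert_path=None, rmvpe_path=None)
converter.apply_conf(
    tag="demo_voice",
    file_model="model.pth",      # hypothetical local files
    pitch_algo="rmvpe+",
    pitch_lvl=0,
    file_index="model.index",
    index_influence=0.9,
    respiration_median_filtering=3,
    envelope_ratio=0.5,
    consonant_breath_protection=0.3,
    resample_sr=0,
)
result = converter(["speech.wav"], "demo_voice", overwrite=False, parallel_workers=2)
print(result)
```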
+
+
+ def audio_conf():
+     return gr.File(
+         label="Upload Audio Files (wav, mp3, ogg, flac)",
+         file_count="multiple",
+         type="filepath",
+         file_types=[".wav", ".mp3", ".ogg", ".flac", ".m4a"],
+         container=True,
+     )
+
+
+ def model_conf():
+     return gr.File(
+         label="Upload Model File (.pth)",
+         type="filepath",
+         file_types=[".pth"],
+         height=130,
+     )
+
+
+ def pitch_algo_conf():
+     return gr.Dropdown(
+         PITCH_ALGO_OPT,
+         value="rmvpe+",  # best algorithm for quality
+         label="Pitch Algorithm (rmvpe+ recommended)",
+         visible=True,
+         interactive=True,
+     )
+
+
+ def pitch_lvl_conf():
+     return gr.Slider(
+         label="Pitch Shift (raise or deepen the voice)",
+         minimum=-24,
+         maximum=24,
+         step=1,
+         value=0,
+         visible=True,
+         interactive=True,
+         info="🔹 Positive = higher (cartoon-like) | Negative = deeper (giant-like)"
+     )
+
+
+ def index_conf():
+     return gr.File(
+         label="Upload Index File (.index) - Optional (Recommended for Quality!)",
+         type="filepath",
+         file_types=[".index"],
+         height=130,
+     )


+ def index_inf_conf():
+     return gr.Slider(
+         minimum=0,
+         maximum=1,
+         label="Index Influence (Higher = More Detail)",
+         value=0.9,  # optimal for quality
+     )
+
+
+ def respiration_filter_conf():
+     return gr.Slider(
+         minimum=0,
+         maximum=7,
+         label="Respiration Median Filtering",
+         value=3,
+         step=1,
+         interactive=True,
+     )
+
+
+ def envelope_ratio_conf():
+     return gr.Slider(
+         minimum=0,
+         maximum=1,
+         label="Envelope Ratio (Controls Dynamics)",
+         value=0.5,  # optimal for a natural sound
+         interactive=True,
+     )
+
+
+ def consonant_protec_conf():
+     return gr.Slider(
+         minimum=0,
+         maximum=0.5,
+         label="Consonant Breath Protection",
+         value=0.3,  # lowered to avoid an artificial sound
+         interactive=True,
+     )
+
+
+ def button_conf():
+     return gr.Button(
+         "Convert Voice (High Quality Mode)",
+         variant="primary",
+         size="lg",
+     )
+
+
+ def output_conf():
+     return gr.File(
+         label="Converted Audio (High Quality Output)",
+         file_count="multiple",
+         interactive=False,
+     )
+
+
+ def active_tts_conf():
+     return gr.Checkbox(
+         False,
+         label="Use Text-to-Speech",
+         container=False,
+     )
+
+
+ def tts_voice_conf():
+     return gr.Dropdown(
+         label="TTS Voice",
+         choices=[],  # Will be populated later
+         visible=False,
+         value=None,
+     )
+
+
+ def tts_text_conf():
+     return gr.Textbox(
+         value="",
+         placeholder="Enter text to convert to speech...",
+         label="Text",
+         visible=False,
+         lines=3,
+     )
+
+
+ def tts_button_conf():
+     return gr.Button(
+         "Generate Speech",
+         variant="secondary",
+         visible=False,
+     )
+
+
+ def tts_play_conf():
+     return gr.Checkbox(
+         False,
+         label="Auto-play generated audio",
+         container=False,
+         visible=False,
+     )
+
+
+ def sound_gui():
+     return gr.Audio(
+         value=None,
+         type="filepath",
+         autoplay=True,
+         visible=True,
+         interactive=False,
+         elem_id="audio_tts",
+     )
+
+
+ def steps_conf():
+     return gr.Slider(
+         minimum=1,
+         maximum=3,
+         label="Conversion Steps (1 recommended for speed & quality)",
+         value=1,
+         step=1,
+         interactive=True,
+     )
+
+
+ def format_output_gui():
+     return gr.Dropdown(
+         label="Output Format (WAV for Best Quality)",
+         choices=["wav", "flac", "mp3"],
+         value="wav",  # uncompressed format
+     )
+
+
+ def denoise_conf():
+     return gr.Checkbox(
+         False,  # disabled by default; enable only if needed
+         label="Apply Noise Reduction (May reduce quality)",
+         container=False,
+         visible=True,
+     )
+
+
+ def effects_conf():
+     return gr.Checkbox(
+         False,  # disabled by default
+         label="Apply Audio Effects (Reverb) (May reduce clarity)",
+         container=False,
+         visible=True,
+     )
+
+
+ def infer_tts_audio(tts_voice, tts_text, play_tts):
+     out_dir = "output"
+     folder_tts = "USER_"+str(random.randint(10000, 99999))
+
+     os.makedirs(out_dir, exist_ok=True)
+     os.makedirs(os.path.join(out_dir, folder_tts), exist_ok=True)
+     out_path = os.path.join(out_dir, folder_tts, "tts.mp3")
+
+     # Extract ShortName from combined value (e.g., "en-US-EmmaMultilingualNeural-Female")
+     if tts_voice:
+         short_name = "-".join(tts_voice.split('-')[:-1])
+     else:
+         short_name = "en-US-EmmaMultilingualNeural"
+
+     asyncio.run(edge_tts.Communicate(tts_text, short_name).save(out_path))
+     if play_tts:
+         return [out_path], out_path
+     return [out_path], None
+
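`infer_tts_audio` strips the trailing `-Gender` suffix to recover the Edge TTS `ShortName`, then synthesizes with edge-tts. The synthesis step on its own:

```python
import asyncio
import edge_tts

async def synth(text, voice, out_path):
    await edge_tts.Communicate(text, voice).save(out_path)

asyncio.run(synth("Hello from Edge TTS.", "en-US-EmmaMultilingualNeural", "tts.mp3"))
```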
+
+ def show_components_tts(value_active):
+     return gr.update(
+         visible=value_active
+     ), gr.update(
+         visible=value_active
+     ), gr.update(
+         visible=value_active
+     ), gr.update(
+         visible=value_active
+     )
+
+
+ def down_active_conf():
+     return gr.Checkbox(
+         False,
+         label="Download from URL",
+         container=False,
+     )
+
+
+ def down_url_conf():
+     return gr.Textbox(
+         value="",
+         placeholder="Hugging Face model URL...",
+         label="Model URL",
+         visible=False,
+         lines=1,
+     )
+
+
+ def down_button_conf():
+     return gr.Button(
+         "Download Model",
+         variant="secondary",
+         visible=False,
+     )
+
+
+ def show_components_down(value_active):
+     return gr.update(
+         visible=value_active
+     ), gr.update(
+         visible=value_active
+     ), gr.update(
+         visible=value_active
+     )
+
+ CSS = """
+ #audio_tts {
+     visibility: hidden;
+     height: 0px;
+     width: 0px;
+     max-width: 0px;
+     max-height: 0px;
  }
  """

+ def get_gui(theme):
+     with gr.Blocks(theme=theme, css=CSS, fill_width=True, fill_height=False, delete_cache=delete_cache_time) as app:
+         gr.Markdown(title)
+         gr.Markdown(description)
+
+         with gr.Tab("Voice Conversion"):
+             # Audio file upload section
+             gr.Markdown("### 📤 Upload Audio Files")
+             aud = audio_conf()
+
+             # TTS section
+             active_tts = active_tts_conf()
+             with gr.Row(visible=False) as tts_row:
+                 with gr.Column(scale=1):
+                     tts_text = tts_text_conf()
+                 with gr.Column(scale=2):
+                     with gr.Row():
+                         with gr.Column():
+                             with gr.Row():
+                                 tts_voice = tts_voice_conf()
+                                 tts_active_play = tts_play_conf()
+                             tts_button = tts_button_conf()
+                             tts_play = sound_gui()
+
+             active_tts.change(
+                 fn=show_components_tts,
+                 inputs=[active_tts],
+                 outputs=[tts_voice, tts_text, tts_button, tts_active_play],
              )

+             tts_button.click(
+                 fn=infer_tts_audio,
+                 inputs=[tts_voice, tts_text, tts_active_play],
+                 outputs=[aud, tts_play],
              )

+             # Model section
+             gr.Markdown("### 🎯 Model Selection")
+
              with gr.Row():
+                 with gr.Column(scale=1):
+                     model = model_conf()
+                     gr.Markdown("*Upload your .pth model file*")
+                 with gr.Column(scale=1):
+                     indx = index_conf()
+                     gr.Markdown("*Upload .index file for best quality!*")
+
+             # Download-from-URL section
+             down_active_gui = down_active_conf()
+             down_info = gr.Markdown(
+                 "Download models from Hugging Face URLs",
+                 visible=False
+             )
+             with gr.Row(visible=False) as url_row:
+                 with gr.Column(scale=3):
+                     down_url_gui = down_url_conf()
+                 with gr.Column(scale=1):
+                     down_button_gui = down_button_conf()
+
+             down_active_gui.change(
+                 show_components_down,
+                 [down_active_gui],
+                 [down_info, down_url_gui, down_button_gui]
+             )

+             down_button_gui.click(
+                 get_my_model,
+                 [down_url_gui],
+                 [model, indx]
+             )

+             # Advanced settings
+             with gr.Accordion(label="⚙️ Advanced Settings (Optimized for Quality)", open=True):
+                 with gr.Row():
+                     algo = pitch_algo_conf()
+                     algo_lvl = pitch_lvl_conf()

+                 with gr.Row():
+                     indx_inf = index_inf_conf()
+                     steps_gui = steps_conf()

+                 with gr.Row():
+                     res_fc = respiration_filter_conf()
+                     envel_r = envelope_ratio_conf()
+                     const = consonant_protec_conf()

+                 with gr.Row():
+                     format_out = format_output_gui()
+                     denoise_gui = denoise_conf()
+                     effects_gui = effects_conf()
+
+             # Convert button
+             button_base = button_conf()
+
+             # Result
+             gr.Markdown("### 🎵 Output (High Quality)")
+             output_base = output_conf()
+
+             button_base.click(
+                 run,
+                 inputs=[
+                     aud,
+                     model,
+                     algo,
+                     algo_lvl,
+                     indx,
+                     indx_inf,
+                     res_fc,
+                     envel_r,
+                     const,
+                     denoise_gui,
+                     effects_gui,
+                     format_out,
+                     steps_gui,
+                 ],
+                 outputs=[output_base],
+             )
+
+         gr.Markdown(RESOURCES)
+
+     return app
+

  if __name__ == "__main__":
+     # Get voice list safely
+     tts_voice_list = asyncio.new_event_loop().run_until_complete(get_voices_list(proxy=None))
+
+     # Build voice dropdown options with safe .get() access
+     voices = sorted([
+         (
+             " - ".join(
+                 reversed(
+                     voice.get("FriendlyName", voice.get("Name", "Unknown Voice")).split("-")
+                 )
+             ).replace("Microsoft ", "").replace("Online (Natural)", f"({voice.get('Gender', 'Unknown')})").strip(),
+             f"{voice.get('ShortName', 'Unknown')}-{voice.get('Gender', 'Unknown')}"
+         )
+         for voice in tts_voice_list
+     ])
+
+     # Initialize GUI
+     app = get_gui(theme)
+     app.queue(default_concurrency_limit=40)
+
+     # Launch app
+     app.launch(
+         max_threads=40,
+         share=IS_COLAB,
+         show_error=True,
+         quiet=False,
+         debug=IS_COLAB,
+         ssr_mode=False,
+     )
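Note that the `voices` pairs built here are never handed to `tts_voice_conf()`, whose `choices=[]` stays empty ("Will be populated later"). Gradio dropdowns accept `(label, value)` tuples, so a hypothetical wiring would look like:

```python
import gradio as gr

# Shape of one entry produced by the list comprehension above.
voices = [("Emma (Female) - en-US", "en-US-EmmaMultilingualNeural-Female")]
tts_voice = gr.Dropdown(label="TTS Voice", choices=voices, value=voices[0][1])
```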