llm-web-scrapper / llm_inference_service.py

frkhan

-- Introduced Guardrail against searching harmful content.

f4efd15 8 days ago


	"""
	This module provides the service for interacting with Large Language Models (LLMs).

	It is responsible for initializing the Langfuse callback handler for tracing,
	constructing the appropriate prompt for information extraction, initializing the
	selected chat model, and invoking the model to get a response.
	"""

	from langchain.chat_models import init_chat_model
	from langfuse.langchain import CallbackHandler
	from langfuse import Langfuse

	from config import LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST

	# Initialize Langfuse client
	# This block sets up the Langfuse callback handler for LangChain.
	# It initializes the Langfuse client and creates a CallbackHandler instance
	# only if the required API keys are available. The handler is then added to
	# a list of callbacks that can be passed to LLM invocations for tracing.
	langfuse_callback_handler = None
	callbacks = []

	if LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY:
	    Langfuse(
	        public_key=LANGFUSE_PUBLIC_KEY,
	        secret_key=LANGFUSE_SECRET_KEY,
	        host=LANGFUSE_HOST,
	    )
	    langfuse_callback_handler = CallbackHandler()
	    callbacks.append(langfuse_callback_handler)
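
	# The conditional setup above could also be wrapped in a small factory, which
	# makes the "tracing is optional" behaviour easy to test in isolation. The
	# sketch below is illustrative only: build_callbacks is a hypothetical helper,
	# and reading the keys from environment variables is an assumption (this
	# module imports them from config instead). The Langfuse imports are deferred
	# so the function still works when the package is not installed.

```python
import os
from typing import Any, List, Mapping


def build_callbacks(env: Mapping[str, str] = os.environ) -> List[Any]:
    """Return a list with one Langfuse handler, or an empty list if tracing is off."""
    public_key = env.get("LANGFUSE_PUBLIC_KEY")
    secret_key = env.get("LANGFUSE_SECRET_KEY")

    # Without both keys tracing is disabled, mirroring the check in this module.
    if not (public_key and secret_key):
        return []

    # Deferred import (an assumption of this sketch): skip tracing if the
    # langfuse package is missing rather than failing at import time.
    try:
        from langfuse import Langfuse
        from langfuse.langchain import CallbackHandler
    except ImportError:
        return []

    # Same calls as in the module above, fed from the mapping instead of config.
    Langfuse(
        public_key=public_key,
        secret_key=secret_key,
        host=env.get("LANGFUSE_HOST"),
    )
    return [CallbackHandler()]
```

	# With such a helper, the list can be passed straight through as
	# config={"callbacks": build_callbacks()}; an empty list simply means no tracing.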



	def extract_page_info_by_llm(
	    user_query: str,
	    scraped_markdown_content: str,
	    model_name: str,
	    model_provider: str,
	) -> str:
	    """
	    Extracts information from scraped content using a specified Large Language Model.

	    This function constructs a detailed prompt, initializes the selected chat model,
	    and invokes it with the scraped content and user query. If Langfuse is configured,
	    it uses a callback handler to trace the LLM interaction.

	    Args:
	        user_query (str): The user's query specifying what information to extract.
	        scraped_markdown_content (str): The markdown content from the scraped web page.
	        model_name (str): The name of the LLM to use for extraction.
	        model_provider (str): The provider of the LLM (e.g., 'google_genai', 'nvidia').

	    Returns:
	        str: The content of the LLM's response.
	    """

	    # Nothing was scraped, so skip the model call entirely.
	    if not scraped_markdown_content:
	        return "No relevant information found to answer your question."

	    context = scraped_markdown_content

	    prompt = f"""
	You are an expert assistant who can extract useful information from the content provided to you. Most of the time,
	the content will be from e-commerce websites, and users will ask you to extract product information like product name, price, rating, etc.

	Safety Guardrails:
	You have a strict policy against processing harmful content. If the user's query or the provided context involves any of the following topics, you must strictly refuse to answer: adult content, NSFW, sexual topics (including nude or semi-nude magazines/websites), gambling, dark web, child assault, sex trafficking, or any other illegal activities. Instead, you must respond with only this exact message: "Warning: The requested content is inappropriate and violates the safety guidelines. This tool cannot be used for such purposes." Do not provide any other information.

	Please provide your identity (model name and provider if applicable) at the beginning of your answer.

	Use the following context to answer the user's question. Provide the final answer in a markdown table format if you are asked to extract product information.
	If you can't extract anything useful, provide the answer in plain markdown format.

	If the user asks for JSON format, please provide the answer in JSON format only.

	The user will mostly request you to extract product information but can also ask you to extract other information from the content.
	So always read the user query carefully and extract information accordingly.

	If you do not find or know the answer, do not hallucinate, do not try to generate fake answers.
	If no Context is given or you can't find or generate any relevant information to answer the question, simply state "No relevant information found to answer your question.
	If you think scraping was not done properly, please select a different scraper (FireCrawl or Crawl4AI) from the Select Scraper Dropdown and try again."

	Please do not respond with an empty string / answer.

	Context:
	{context}

	Question:
	{user_query}

	Your Identity:

	Answer:

	"""

	    # Initialize the requested model and invoke it; callbacks is empty when Langfuse is not configured.
	    llm = init_chat_model(model_name, model_provider=model_provider)
	    response = llm.invoke(prompt, config={"callbacks": callbacks})
	    return response.content
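
	# The function above is annotated as returning str, but a LangChain message's
	# content can be either a plain string or a list of content blocks, depending
	# on the provider and model. The sketch below shows one way the result could be
	# normalized before it is returned. content_to_text is a hypothetical helper
	# that is not part of this module, and keeping only text blocks is an
	# assumption about what the caller wants.

```python
from typing import Any


def content_to_text(content: Any) -> str:
    """Flatten message content (a str, or a list of str/dict blocks) into one string."""
    if isinstance(content, str):
        return content

    parts = []
    for block in content:
        if isinstance(block, str):
            # Some blocks are bare strings.
            parts.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            # Text blocks carry their payload under the "text" key.
            parts.append(block.get("text", ""))
        # Any other block type (for example an image) is skipped in this sketch.
    return "".join(parts)
```

	# Under that assumption, the last line of extract_page_info_by_llm could read
	# return content_to_text(response.content), so the declared return type holds
	# for either shape of content.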