Test cases for the Weave MCP
Traces
- by default, hide "weave" attributes (see the sketch below)
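A minimal sketch of that default filtering, assuming trace attributes arrive as a plain dict and that internal keys are prefixed with "weave" (the example attribute names are hypothetical):

```python
def hide_weave_attributes(attributes: dict) -> dict:
    """Drop internal attributes whose keys start with "weave"."""
    return {k: v for k, v in attributes.items() if not k.startswith("weave")}


# Hypothetical attributes: only the user-set key should survive.
attrs = {"weave": {"client_version": "0.51.0"}, "env": "prod"}
assert hide_weave_attributes(attrs) == {"env": "prod"}
```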
### Data retrieval
- test for op name containing "VectorStore.retrieve" and check that inputs["query_texts"] contains 15 elements and that the trace `output` contains 16 elements (see the sketch at the end of this section)
- how many calls were logged to the project on February 27th, 2025?
- 258
- Status check (ignore for now)
- error: 3
- pending: 45
- successful: 210
- how many parent traces with exceptions were there?
- Answer: 3
- how many guardrails were triggered? - see results in the call_llm op
- Check Scores column and search for "passed == False"
- Answer: 1
- what guardrails were triggered?
- Check Scores column and search for "passed == False"
- Get the names of the scorers that failed (passed == False)
- Get the inputs and outputs of the 2 calls that share the same display name and input query
- check for display_name == Chat-acall and inputs.chat_request.question == "Can I download a model from W&B artifact without a W&B API key?"
- get all inputs, outputs, attributes, usage, costs, code data, scores, and feedback for call id: 019546df-4784-7e61-862e-304564865852
- how did the openai system prompt evolve in the "how_to_catch_a_pirate" app?
- traverse the tree to find the openai calls and retrieve the unique system/developer prompts.
- prompt1:
"Generate a joke based on the following theme 'how to catch a pirate' plus a user-submitted theme."
- prompt2:
"Generate a hilarious joke based on the following theme 'how to catch a pirate' plus a user-submitted theme.make it wildly creative and artistic. take inspiration from 1980s comedians."
- get annotations
- get annotations for generate_joke
- scorer "Joke is funny"
- 1/3 are True
- get token usage
- get token usage from generate_joke where model == "o3-mini"
- output tokens == 1131
- get costs
- return all costs from generate_joke
- get attributes
- get preview of data...
- summary calls for previews ....
- All inputs and outputs for display name
- display_name == Chat-acall
- inputs.chat_request.question: "example of login and authentication with sagemaker estimator train step"
- inputs.chat_request.language: "en"
- len(outputs.response_synthesis_llm_messages) == 6
- outputs.start_time = datetime.datetime(2025, 2, 27, 10, 6, 32, 836545, tzinfo=datetime.timezone.utc)
- outputs.system_prompt: """
You are Wandbot - a support expert in Weights & Biases, wandb and weave.
Your goal to help users with questions related to Weight & Biases, wandb, and the visualization library weave
As a trustworthy expert, you must provide truthful answers to questions using only the provided documentation snippets, not prior knowledge.
Here are guidelines you must follow when responding to user questions:
Purpose and Functionality
- Answer questions related to the Weights & Biases Platform.
- Provide clear and concise explanations, relevant code snippets, and guidance depending on the user's question and intent.
- Ensure users succeed in effectively understand and using various Weights & Biases features.
- Provide accurate and context-citable responses to the user's questions.
Language Adaptability
- The user's question language is detected as the ISO code of the language.
- Always respond in the detected question language.
Specificity
- Be specific and provide details only when required.
- Where necessary, ask clarifying questions to better understand the user's question.
- Provide accurate and context-specific code excerpts with clear explanations.
- Ensure the code snippets are syntactically correct, functional, and run without errors.
- For code troubleshooting-related questions, focus on the code snippet and clearly explain the issue and how to resolve it.
- Avoid boilerplate code such as imports, installs, etc.
Reliability
- Your responses must rely only on the provided context, not prior knowledge.
- If the provided context doesn't help answer the question, just say you don't know.
- When providing code snippets, ensure the functions, classes, or methods are derived only from the context and not prior knowledge.
- Where the provided context is insufficient to respond faithfully, admit uncertainty.
- Remind the user of your specialization in Weights & Biases Platform support when a question is outside your domain of expertise.
- Redirect the user to the appropriate support channels - Weights & Biases support or community forums when the question is outside your capabilities or you do not have enough context to answer the question.
Citation
- Always cite the source from the provided context.
- The user will not be able to see the provided context, so do not refer to it in your response. For instance, don't say "As mentioned in the context...".
- Prioritize faithfulness and ensure your citations allow the user to verify your response.
- When the provided context doesn't provide have the necessary information,and add a footnote admitting your uncertaininty.
- Remember, you must return both an answer and citations.
Response Style
- Use clear, concise, professional language suitable for technical support
- Do not refer to the context in the response (e.g., "As mentioned in the context...") instead, provide the information directly in the response and cite the source.
Response Formatting
- Always communicate with the user in Markdown.
- Do not use headers in your output as it will be rendered in slack.
- Always use a list of footnotes to add the citation sources to your answer.
Example:
The correct answer to the user's query
Steps to solve the problem:
Here's a code snippet^3
# Code example
...
Explanation:
Sources:
"""
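A pytest-style sketch of the "VectorStore.retrieve" check from the top of this section, assuming the MCP tool returns each trace as a dict with `op_name`, `inputs`, and `output` fields (both the field names and the payload shape are assumptions, not the actual MCP response format):

```python
def check_vectorstore_retrieve(traces: list[dict]) -> None:
    """Assert the expected element counts for the VectorStore.retrieve call."""
    matches = [t for t in traces if "VectorStore.retrieve" in t["op_name"]]
    assert matches, "expected at least one VectorStore.retrieve call"
    trace = matches[0]
    # Expected counts from the test case above.
    assert len(trace["inputs"]["query_texts"]) == 15
    assert len(trace["output"]) == 16
```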
Evaluations
- how many trials
- Look at failed examples
- count them
- identify common errors
- get the F1 score for the last 10 results (see the metric sketch after this list)
- get the precision for the eval called XX
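A sketch of the metric checks above, assuming each evaluation result row carries boolean `predicted` and `expected` fields (the field names are placeholders for whatever the evaluation actually logs):

```python
def precision_and_f1(results: list[dict]) -> tuple[float, float]:
    """Compute precision and F1 from boolean predicted/expected pairs."""
    tp = sum(1 for r in results if r["predicted"] and r["expected"])
    fp = sum(1 for r in results if r["predicted"] and not r["expected"])
    fn = sum(1 for r in results if not r["predicted"] and r["expected"])
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, f1


# "F1 score for the last 10 results" would then be:
# precision_and_f1(results[-10:])[1]
```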
Datasets
- query size and stats
- Is there a sample like xxx in my dataset
- add to dataset (see the sketch after this list)
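A sketch of the dataset items above using the public Weave Python client (`weave.init`, `weave.ref`, `weave.Dataset`, `weave.publish`); the project name, dataset name, and row fields are placeholders, and republishing a new version is an assumption about how "add to dataset" would work:

```python
import weave

weave.init("my-entity/my-project")  # placeholder project

# "query size and stats": fetch the published dataset and inspect its rows.
dataset = weave.ref("support-questions").get()  # placeholder dataset name
rows = list(dataset.rows)
print(f"{len(rows)} rows")

# "Is there a sample like xxx in my dataset": crude substring match over each row.
hits = [r for r in rows if "xxx" in str(r)]
print(f"{len(hits)} candidate matches")

# "add to dataset": publish a new dataset version that includes the extra row.
new_rows = rows + [{"question": "new sample", "answer": "..."}]
weave.publish(weave.Dataset(name="support-questions", rows=new_rows))
```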
TODOs
Images
Prompts
- ask about prompts
- push new prompt? (see the sketch at the end of this section)
- "attach from MCP" - pull in prompts