
# Test cases for the Weave MCP

## Traces

- by default, hide "weave" attributes

### Data retrieval
- test for op name containing "VectorStore.retrieve" and check that `inputs["query_texts"]` contains 15 elements and the trace `output` contains 16 elements.
- how many calls were logged to the project on February 27th, 2025?
    - 258
- Status check (ignore for now)
    - error: 3
    - pending: 45
    - successful: 210
- how many parent traces with exceptions were there?
    - Answer: 3
- how many guardrails were triggered? See results in the `call_llm` op
    - Check Scores column and search for "passed == False"
    - Answer: 1
- what guardrails were triggered?
    - Check Scores column and search for "passed == False"
    - Get the names of the scorers that failed (passed == False)
- Get the 2 inputs and outputs for the same display name and input query
    - check for display_name Chat-acall and inputs.chat_request.question == "Can I download a model from W&B artifact without a W&B API key?"
- get all inputs, outputs, attributes, usage, costs, code data, scores, and feedback for call id: 019546df-4784-7e61-862e-304564865852 (see the sketch below)
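
A minimal sketch of how these retrieval checks could be driven from the Weave Python client. The project slug is a placeholder, and `get_calls`/`get_call` are assumed from recent weave releases; verify the exact signatures against the installed version.

```python
import weave

# Placeholder project slug - substitute the project under test.
client = weave.init("my-entity/my-project")

# Op-name check: VectorStore.retrieve input/output sizes.
for call in client.get_calls():
    if "VectorStore.retrieve" in call.op_name:
        assert len(call.inputs["query_texts"]) == 15
        assert len(call.output) == 16

# Full dump for a single call id (inputs, outputs, attributes, summary).
call = client.get_call("019546df-4784-7e61-862e-304564865852")
print(call.inputs, call.output, call.attributes, call.summary)
```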

- how did the openai system prompt evolve in the "how_to_catch_a_pirate" app?
    - traverse the tree to find the OpenAI calls and collect the unique system/developer prompts (see the sketch after this list)
    - prompt1:
        "Generate a joke based on the following theme 'how to catch a pirate' plus a user-submitted theme."
    - prompt2:
        "Generate a hilarious joke based on the following theme 'how to catch a pirate' plus a user-submitted theme.make it wildly creative and artistic. take inspiration from 1980s comedians."
- get annotations
    - get annotations for generate_joke
    - scorer "Joke is funny"
    - 1/3 are True
- get token usage
    - get token usage from generate_joke where model == "o3-mini"
    - output tokens == 1131
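
A sketch of the tree traversal for the prompt-evolution check. `call.children()` and the OpenAI-style `messages` layout are assumptions based on the weave Call API; adjust to the actual input schema.

```python
def collect_system_prompts(call, seen=None):
    """Recursively walk a call tree, collecting unique system/developer prompts."""
    if seen is None:
        seen = []
    if "openai" in call.op_name:
        for msg in call.inputs.get("messages", []):
            if msg.get("role") in ("system", "developer") and msg["content"] not in seen:
                seen.append(msg["content"])
    for child in call.children():  # assumed child-call accessor
        collect_system_prompts(child, seen)
    return seen
```

Running this over the root calls of the how_to_catch_a_pirate app should yield the two prompts listed above, in order.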

- get costs
    - return all costs from generate_joke (see the usage/cost sketch below)
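
A hedged sketch for the token-usage and cost checks, assuming OpenAI-style usage entries under `call.summary` (key names vary by provider and weave version, and costs may require `include_costs=True` on `get_calls` in recent releases):

```python
import weave

client = weave.init("my-entity/my-project")  # placeholder project slug

output_tokens = 0
for call in client.get_calls():
    if "generate_joke" not in call.op_name:
        continue
    usage = (call.summary or {}).get("usage", {})  # assumed summary layout
    for model, stats in usage.items():
        if model == "o3-mini":
            output_tokens += stats.get("completion_tokens", 0)

assert output_tokens == 1131  # expected value from the test case above
```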

- get attributes

- get preview of data...

- summary calls for previews ....

- All inputs and outputs for display name
    - display_name == Chat-acall
    - inputs.chat_request.question: "example of login and authentication with sagemaker estimator train step"
    - inputs.chat_request.language: "en"
    - len(outputs.response_synthesis_llm_messages) == 6
    - outputs.start_time == datetime.datetime(2025, 2, 27, 10, 6, 32, 836545, tzinfo=datetime.timezone.utc)
    - outputs.system_prompt: """
    You are Wandbot - a support expert in Weights & Biases, wandb and weave. 

Your goal to help users with questions related to Weight & Biases, wandb, and the visualization library weave As a trustworthy expert, you must provide truthful answers to questions using only the provided documentation snippets, not prior knowledge. Here are guidelines you must follow when responding to user questions:

Purpose and Functionality

  • Answer questions related to the Weights & Biases Platform.
  • Provide clear and concise explanations, relevant code snippets, and guidance depending on the user's question and intent.
  • Ensure users succeed in effectively understand and using various Weights & Biases features.
  • Provide accurate and context-citable responses to the user's questions.

Language Adaptability

  • The user's question language is detected as the ISO code of the language.
  • Always respond in the detected question language.

Specificity

  • Be specific and provide details only when required.
  • Where necessary, ask clarifying questions to better understand the user's question.
  • Provide accurate and context-specific code excerpts with clear explanations.
  • Ensure the code snippets are syntactically correct, functional, and run without errors.
  • For code troubleshooting-related questions, focus on the code snippet and clearly explain the issue and how to resolve it.
  • Avoid boilerplate code such as imports, installs, etc.

Reliability

  • Your responses must rely only on the provided context, not prior knowledge.
  • If the provided context doesn't help answer the question, just say you don't know.
  • When providing code snippets, ensure the functions, classes, or methods are derived only from the context and not prior knowledge.
  • Where the provided context is insufficient to respond faithfully, admit uncertainty.
  • Remind the user of your specialization in Weights & Biases Platform support when a question is outside your domain of expertise.
  • Redirect the user to the appropriate support channels - Weights & Biases support or community forums when the question is outside your capabilities or you do not have enough context to answer the question.

Citation

  • Always cite the source from the provided context.
  • The user will not be able to see the provided context, so do not refer to it in your response. For instance, don't say "As mentioned in the context...".
  • Prioritize faithfulness and ensure your citations allow the user to verify your response.
  • When the provided context doesn't provide have the necessary information,and add a footnote admitting your uncertaininty.
  • Remember, you must return both an answer and citations.

Response Style

  • Use clear, concise, professional language suitable for technical support
  • Do not refer to the context in the response (e.g., "As mentioned in the context...") instead, provide the information directly in the response and cite the source.

Response Formatting

  • Always communicate with the user in Markdown.
  • Do not use headers in your output as it will be rendered in slack.
  • Always use a list of footnotes to add the citation sources to your answer.

Example:

The correct answer to the user's query

Steps to solve the problem:

  • Step 1: ...^1, ^2
  • Step 2: ...^1 ...

Here's a code snippet^3

# Code example
...

Explanation:

  • Point 1^2
  • Point 2^3

Sources:

  • ... """

### Write data

- add feedback
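
A sketch of the feedback write path, following the documented `Call.feedback` helpers (the call id reuses the one from the retrieval tests):

```python
import weave

client = weave.init("my-entity/my-project")  # placeholder project slug

call = client.get_call("019546df-4784-7e61-862e-304564865852")
call.feedback.add_reaction("👍")                  # emoji reaction
call.feedback.add_note("verified by MCP test")    # free-text note
call.feedback.add("correctness", {"value": 5})    # custom structured feedback
```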

### Data stats

- how many traces in the project?
- trace counts by name?
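
Both stats can be computed client-side without committing to a server-side filter schema; `parent_id is None` marking root traces is an assumption:

```python
from collections import Counter

import weave

client = weave.init("my-entity/my-project")  # placeholder project slug

calls = list(client.get_calls())
roots = [c for c in calls if c.parent_id is None]  # assumed root-trace marker
print("traces in project:", len(roots))
print("trace counts by name:", Counter(c.op_name for c in roots).most_common())
```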

## Evaluations

- how many trials
- Look at failed examples
    - count them
    - identify common errors
- get the F1 score for the last 10 results (see the helper below)
- get the precision for the eval called XX
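
Since F1 is fully determined by precision and recall, a small helper can cross-check whatever the eval endpoint returns (the result shape itself is not assumed here):

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# e.g. precision 0.8, recall 0.5 -> F1 = 0.8 / 1.3
assert abs(f1(0.8, 0.5) - 0.6153846153846154) < 1e-12
```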

## Datasets

- query size and stats
- Is there a sample like xxx in my dataset?
- add to dataset (see the sketch below)
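
Dataset reads and writes could go through `weave.Dataset` and `weave.publish`; the dataset name and row schema below are placeholders, and republishing under the same name creates a new version:

```python
import weave

weave.init("my-entity/my-project")  # placeholder project slug

dataset = weave.ref("support-questions").get()  # placeholder dataset name
print("size:", len(dataset.rows))

# Append a row and publish a new version.
rows = list(dataset.rows) + [{"question": "How do I log a table?", "answer": "..."}]
weave.publish(weave.Dataset(name="support-questions", rows=rows))
```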

## TODOs

### Images

### Prompts

- ask about prompts
- push new prompt?
- "attach from MCP" - pull in prompts