Spaces: Runtime error

Commit 85bfd70
Parent: 049ff35
chore: fix citations and response format

Files changed: rag/rag.py (+14 -11)

rag/rag.py CHANGED
@@ -29,11 +29,7 @@ Here are the relevant snippets from the Llama 3 405B model research paper:
 {context_str}
 </snippets>
 
-<question>
-{query_str}
-</question>
-
-To answer this question:
+To answer the question:
 
 1. Carefully read and analyze the provided snippets.
 2. Identify information that is directly relevant to the user's question.
@@ -50,11 +46,14 @@ Guidelines for your answer:
 6. Cite the relevant sentences from the snippets and their page numbers to support your answer.
 7. Answer in MFAQ format (Minimal Facts Answerable Question), providing the most concise and accurate response possible.
 8. Use Markdown to format your response and include citations to indicate the snippets and the page number used to derive your answer.
+9. Your answer must only have two headings: 'Answer' and 'Citations'.
 
 Here's an example of a question and an answer. You must use this as a template to format your response:
 
 <example>
-
+<question>
+What was the main mix of the training data ? How much data was used to train the model ?
+</question>
 
 ### Answer
 The main mix of the training data for the Llama 3 405 billion parameter model is as follows:
@@ -66,16 +65,20 @@ The main mix of the training data for the Llama 3 405 billion parameter model is
 
 Regarding the amount of data used to train the model, the snippets do not provide a specific total volume of data in terms of tokens or bytes. However, they do mention that the model was pre-trained on a large dataset containing knowledge until the end of 2023[^2^]. Additionally, the training process involved pre-training on 2.87 trillion tokens before further adjustments[^3^].
 
-###
+### Citations
 
-[^1^]: "Scaling Laws for Data Mix," page 6.
-[^2^]: "Pre-Training Data," page 4.
-[^3^]: "Initial Pre-Training," page 14.
+- [^1^]: "Scaling Laws for Data Mix," page 6.
+- [^2^]: "Pre-Training Data," page 4.
+- [^3^]: "Initial Pre-Training," page 14.
 
 </example>
 
 Remember, your role is to accurately convey the information from the research paper snippets, not to speculate or provide information from other sources.
 
+<question>
+{query_str}
+</question>
+
 Answer:
 """
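Taken together, the three hunks above rework the prompt template: the `<question>` block moves from the top of the prompt to just before the closing `Answer:` cue, a ninth guideline pins the response to exactly two headings, and the few-shot example gains an explicit question plus a `### Citations` section rendered as a bulleted footnote list. A minimal sketch of how such a template gets filled, assuming plain `str.format` placeholders (the actual rag.py may route this through llama_index's `PromptTemplate`; `build_prompt` is a hypothetical helper, and the template body is abridged):

```python
# Abridged, hypothetical version of the template after this commit;
# only the two placeholders matter for the sketch.
TEMPLATE = """Here are the relevant snippets from the Llama 3 405B model research paper:
<snippets>
{context_str}
</snippets>

To answer the question:
... guidelines and <example> block omitted ...

<question>
{query_str}
</question>

Answer:
"""

def build_prompt(context_str: str, query_str: str) -> str:
    # str.format fills both placeholders; any literal braces elsewhere in
    # the template would need to be escaped as {{ and }}.
    return TEMPLATE.format(context_str=context_str, query_str=query_str)

print(build_prompt("...retrieved snippets...", "How much data was used to train the model?"))
```

Placing the user's question after the guidelines and the worked example keeps it closest to the model's generation point, a common prompt-layout choice that stops a long instruction block from burying the actual query.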
@@ -113,7 +116,7 @@ class SimpleRAGPipeline(weave.Model):
             nodes,
             embed_model=self._get_embedding_model(),
             show_progress=True,
-            insert_batch_size=
+            insert_batch_size=512,
         )
 
         return index
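The final hunk supplies the value missing from `insert_batch_size=`; as committed before this fix, the dangling keyword argument is a Python syntax error, which may explain the Space's runtime-error status. A hedged reconstruction of the call site, assuming the object being built is llama_index's `VectorStoreIndex` (which accepts all four arguments shown in the hunk) and wrapping it in a hypothetical `build_index` helper:

```python
from llama_index.core import VectorStoreIndex
from llama_index.core.schema import TextNode

def build_index(nodes: list[TextNode], embed_model) -> VectorStoreIndex:
    # insert_batch_size caps how many nodes are embedded and inserted into
    # the vector store per batch (llama_index defaults to 2048); 512 keeps
    # each batch's memory footprint smaller at some cost in throughput.
    return VectorStoreIndex(
        nodes,
        embed_model=embed_model,
        show_progress=True,
        insert_batch_size=512,
    )
```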