refactor: update readme
- .github/workflows/ci.yml +1 -0
- README.md +29 -0
- mkdocs.yml +9 -0
.github/workflows/ci.yml
CHANGED
@@ -63,6 +63,7 @@ jobs:
       - name: Build documentation
         run: |
           poetry add mkdocs mkdocs-material --group dev
+          cp README.md docs/index.md
           poetry run mkdocs build
         if: matrix.python-version == '3.12'
 
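The added step copies the README into the MkDocs source directory so that it can serve as the site's home page. The following is a minimal Python sketch of the same operation, for illustration only. The names `stage_readme` and `strip_docs_prefix` are hypothetical and not part of this commit. One point worth checking: links written relative to the repository root, such as `docs/adding_new_parser.md`, are resolved relative to `docs/` once the file lives at `docs/index.md`, so they may need rewriting. The second helper shows one possible way to do that, under that assumption.

```python
import re
import shutil
import tempfile
from pathlib import Path


def stage_readme(repo_root: Path, docs_dir: str = "docs") -> Path:
    """Copy README.md to <docs_dir>/index.md, like `cp README.md docs/index.md`."""
    target_dir = repo_root / docs_dir
    # Unlike plain `cp`, create the directory if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "index.md"
    shutil.copyfile(repo_root / "README.md", target)
    return target


def strip_docs_prefix(markdown: str, docs_dir: str = "docs") -> str:
    """Rewrite Markdown links such as (docs/page.md) to (page.md).

    Hypothetical extra step, not part of the commit.
    """
    return re.sub(r"\]\(" + re.escape(docs_dir) + r"/", "](", markdown)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "README.md").write_text("See [guide](docs/adding_new_parser.md).\n")
        index = stage_readme(root)
        # Optional: adjust links that pointed into docs/ from the repo root.
        index.write_text(strip_docs_prefix(index.read_text()))
        print(index.read_text())
```

Only links that start with `docs/` directly after `](` are rewritten; absolute URLs that merely contain `/docs/` are left untouched.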
README.md
CHANGED
@@ -62,6 +62,35 @@ Poetry manages the virtual environment and dependencies automatically, so you do
 - **TWLegalDatasetParser**
 - **TMLUDatasetParser**
 
+## Quick Start Guide
+
+Here's a simple example demonstrating how to use the library:
+
+```python
+from llmdataparser import ParserRegistry
+# list all available parsers
+ParserRegistry.list_parsers()
+# get a parser
+parser = ParserRegistry.get_parser("mmlu")
+# load the parser
+parser.load() # optional: task_name, split
+# parse the parser
+parser.parse() # optional: split_names
+
+print(parser.task_names)
+print(parser.split_names)
+print(parser.get_dataset_description)
+print(parser.get_huggingface_link)
+print(parser.total_tasks)
+data = parser.get_parsed_data
+```
+
+We also provide a Gradio demo for interactive testing:
+
+```bash
+python app.py
+```
+
 ## Adding New Dataset Parsers
 
 To add support for a new dataset, please refer to our detailed guide in [docs/adding_new_parser.md](docs/adding_new_parser.md). The guide includes:
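The Quick Start relies on a registry that maps a short name such as `"mmlu"` to a parser, followed by a load-then-parse sequence. The sketch below shows one common way such a registry can be built. It illustrates the pattern only and is not llmdataparser's actual implementation: `DemoRegistry`, `DemoParser`, the `register` decorator, and the sample rows are all hypothetical. Exposing `get_parsed_data` as a property mirrors the attribute-style access in the Quick Start; whether the real library does the same is an assumption.

```python
from typing import Callable, Dict, List, Type


class DemoParser:
    """Hypothetical stand-in for a dataset parser."""

    def __init__(self) -> None:
        self._raw: List[str] = []
        self._parsed: List[dict] = []

    def load(self) -> None:
        # A real parser would fetch a dataset; this uses two fixed rows.
        self._raw = ["2+2=?|4", "capital of France?|Paris"]

    def parse(self) -> None:
        if not self._raw:
            raise RuntimeError("call load() before parse()")
        self._parsed = [
            dict(zip(("question", "answer"), row.split("|"))) for row in self._raw
        ]

    @property
    def get_parsed_data(self) -> List[dict]:
        return self._parsed


class DemoRegistry:
    """Maps case-insensitive names to parser classes."""

    _parsers: Dict[str, Type[DemoParser]] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[Type[DemoParser]], Type[DemoParser]]:
        def decorator(parser_cls: Type[DemoParser]) -> Type[DemoParser]:
            cls._parsers[name.lower()] = parser_cls
            return parser_cls
        return decorator

    @classmethod
    def list_parsers(cls) -> List[str]:
        return sorted(cls._parsers)

    @classmethod
    def get_parser(cls, name: str) -> DemoParser:
        try:
            return cls._parsers[name.lower()]()
        except KeyError:
            raise ValueError(f"unknown parser: {name!r}") from None


@DemoRegistry.register("demo")
class _Demo(DemoParser):
    pass


parser = DemoRegistry.get_parser("demo")
parser.load()
parser.parse()
print(parser.get_parsed_data)
```

Registering through a decorator keeps each parser's name next to its class definition, and calling `parse()` before `load()` fails with a clear error instead of silently returning nothing.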
mkdocs.yml
ADDED
@@ -0,0 +1,9 @@
+site_name: LLMDataParser
+theme:
+  name: material
+
+nav:
+  - Home: index.md
+
+plugins:
+  - search
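MkDocs resolves `nav` entries relative to its docs directory (`docs` by default), so `Home: index.md` expects `docs/index.md` to exist at build time. The `cp README.md docs/index.md` step in the CI workflow supplies that file. Below is a small sketch of a pre-build check along those lines. `missing_nav_targets` is a hypothetical helper, it handles only a flat nav like the one in this config, and the nav is passed in as plain Python data rather than parsed from YAML so the example needs no third-party packages.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def missing_nav_targets(nav: List[Dict[str, str]], docs_dir: Path) -> List[str]:
    """Return nav targets that do not exist as files under docs_dir."""
    missing = []
    for entry in nav:
        for _title, target in entry.items():
            if not (docs_dir / target).is_file():
                missing.append(target)
    return missing


if __name__ == "__main__":
    nav = [{"Home": "index.md"}]  # mirrors the nav section of mkdocs.yml
    with tempfile.TemporaryDirectory() as tmp:
        docs = Path(tmp) / "docs"
        docs.mkdir()
        # Before the README has been copied, index.md is reported as missing.
        print(missing_nav_targets(nav, docs))
        (docs / "index.md").write_text("# LLMDataParser\n")
        # After staging the file, nothing is missing.
        print(missing_nav_targets(nav, docs))
```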