	Update README.md
README.md CHANGED

@@ -11,44 +11,45 @@ license: mit

| Old | New | Change | Content |
|---|---|---|---|
| 11 | 11 |  | `short_description: See if you can predict the masked tokens / next token!` |
| 12 | 12 |  | `---` |
| 13 | 13 |  |  |
| 14 |  | - |  |
| 15 |  | - | `This Hugging Face Gradio space tests users on two fundamental NLP tasks:` |
| 16 |  | - |  |
| 17 |  | - | `Masked Language Modeling (MLM) - Guess the masked words in a text` |
| 18 |  | - | `Next Token Prediction (NTP) - Predict how a text continues` |
| 19 |  | - |  |
| 20 |  | - | `#### Features` |
| 21 |  | - |  |
| 22 |  | - | `Switch between MLM and NTP tasks with a simple radio button` |
| 23 |  | - | `Adjust masking/cutting ratio to control difficulty` |
| 24 |  | - | `Sample texts from the cc_news dataset (100 samples)` |
| 25 |  | - | `Track and display user accuracy for both tasks` |
| 26 |  | - | `Detailed feedback on answers` |
| 27 |  | - |  |
| 28 |  | - | `#### How to Use` |
| 29 |  | - | `##### For MLM Task` |
| 30 |  | - |  |
| 31 |  | - | `Select "mlm" in the Task Type radio button` |
| 32 |  | - | `Adjust mask ratio as desired (higher = more difficult)` |
| 33 |  | - | `Click "New Sample" to get a text with [MASK] tokens` |
| 34 |  | - | `Enter your guesses for the masked words, separated by spaces or commas` |
| 35 |  | - | `Click "Check Answer" to see your accuracy` |
|  | 14 | + | `# MLM and NTP Testing App` |
| 36 | 15 |  |  |
| 37 |  | - |  |
| 38 |  | - |  |
| 39 |  | - | `Select "ntp" in the Task Type radio button` |
| 40 |  | - | `Adjust cut ratio as desired (higher = more text is hidden)` |
| 41 |  | - | `Click "New Sample" to get a partial text` |
| 42 |  | - | `Type your prediction of how the text continues` |
| 43 |  | - | `Click "Check Answer" to see your accuracy and the actual continuation` |
| 44 |  | - |  |
| 45 |  | - | `#### Statistics` |
| 46 |  | - |  |
| 47 |  | - | `The app keeps track of your accuracy for both tasks` |
| 48 |  | - | `Click "Reset Stats" to start fresh` |
| 49 |  | - |  |
| 50 |  | - | `#### Technical Details` |
|  | 16 | + | `This Hugging Face Gradio space tests users on two fundamental NLP tasks:` |
| 51 | 17 |  |  |
| 52 |  | - |  |
| 53 |  | - |  |
| 54 |  | - |  |
|  | 18 | + | `1. **Masked Language Modeling (MLM)** - Guess the masked words in a text` |
|  | 19 | + | `2. **Next Token Prediction (NTP)** - Predict how a text continues` |
|  | 20 | + |  |
|  | 21 | + | `## Features` |
|  | 22 | + |  |
|  | 23 | + | `- Switch between MLM and NTP tasks with a simple radio button` |
|  | 24 | + | `- Adjust masking/cutting ratio to control difficulty` |
|  | 25 | + | `- Sample texts from the cc_news dataset (100 samples, limited to 2 sentences)` |
|  | 26 | + | `- Track and display user accuracy for both tasks` |
|  | 27 | + | `- Detailed feedback on answers` |
|  | 28 | + | `- Token-by-token prediction for NTP task with immediate feedback` |
|  | 29 | + |  |
|  | 30 | + | `## How to Use` |
|  | 31 | + |  |
|  | 32 | + | `### For MLM Task` |
|  | 33 | + | `1. Select "mlm" in the Task Type radio button` |
|  | 34 | + | `2. Adjust mask ratio as desired (higher = more difficult)` |
|  | 35 | + | `3. Click "New Sample" to get a text with [MASK] tokens` |
|  | 36 | + | `4. Enter your guesses for the masked words, separated by spaces or commas` |
|  | 37 | + | `5. Click "Check Answer" to see your accuracy` |
|  | 38 | + |  |
|  | 39 | + | `### For NTP Task` |
|  | 40 | + | `1. Select "ntp" in the Task Type radio button` |
|  | 41 | + | `2. Adjust cut ratio as desired (higher = more text is hidden)` |
|  | 42 | + | `3. Click "New Sample" to get a partial text` |
|  | 43 | + | `4. Type your prediction for the next token/word` |
|  | 44 | + | `5. Click "Check Answer" to see if you're correct` |
|  | 45 | + | `6. Continue predicting the next tokens one by one` |
|  | 46 | + |  |
|  | 47 | + | `## Statistics` |
|  | 48 | + | `- The app keeps track of your accuracy for both tasks` |
|  | 49 | + | `- Click "Reset Stats" to start fresh` |
|  | 50 | + |  |
|  | 51 | + | `## Technical Details` |
|  | 52 | + | `- Uses HuggingFace's mlfoundations/dclm-baseline-1.0-parquet dataset` |
|  | 53 | + | `- Employs streaming to efficiently sample 100 documents` |
|  | 54 | + | `- Uses BERT tokenizer for consistent tokenization` |
|  | 55 | + | `- Limits samples to two sentences for better user experience` |
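
The behaviour the updated README describes (taking a fixed number of documents from a stream, trimming each to two sentences, masking by ratio for MLM, and cutting by ratio with token-by-token checks for NTP) might look roughly like the sketch below. It is an illustration under stated assumptions, not the Space's actual code: whitespace splitting stands in for the BERT tokenizer, a plain list stands in for the streamed dataset, and every function name here (`take_samples`, `limit_sentences`, `mask_tokens`, `cut_tokens`, `parse_guesses`, `accuracy`) is hypothetical.

```python
import random
import re
from itertools import islice

MASK = "[MASK]"

def take_samples(stream, n=100):
    """Pull the first n documents from any iterable without loading the rest."""
    return list(islice(stream, n))

def limit_sentences(text, max_sentences=2):
    """Keep the first max_sentences sentences (naive split after . ! or ?)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])

def mask_tokens(tokens, mask_ratio, rng):
    """MLM: replace roughly mask_ratio of the tokens with [MASK] (at least one)."""
    if not tokens:
        return [], []
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_ratio)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in positions else tok for i, tok in enumerate(tokens)]
    answers = [tokens[i] for i in sorted(positions)]
    return masked, answers

def cut_tokens(tokens, cut_ratio):
    """NTP: hide roughly the last cut_ratio of the tokens, keeping at least one visible."""
    if len(tokens) < 2:
        return list(tokens), []
    n_hidden = min(len(tokens) - 1, max(1, round(len(tokens) * cut_ratio)))
    return tokens[:-n_hidden], tokens[-n_hidden:]

def parse_guesses(raw):
    """Split user input on spaces or commas, dropping empty pieces."""
    return [g for g in re.split(r"[,\s]+", raw.strip()) if g]

def accuracy(guesses, answers):
    """Fraction of answers matched in order, ignoring case."""
    if not answers:
        return 0.0
    correct = sum(g.lower() == a.lower() for g, a in zip(guesses, answers))
    return correct / len(answers)

# Stand-in for a streamed dataset: any iterable of text documents works.
docs = take_samples(iter(["The cat sat on the mat. It was warm. Then it left."]), n=100)
tokens = limit_sentences(docs[0]).split()  # whitespace split, NOT the BERT tokenizer

masked, answers = mask_tokens(tokens, mask_ratio=0.2, rng=random.Random(0))
print("MLM prompt:", " ".join(masked))

visible, hidden = cut_tokens(tokens, cut_ratio=0.3)
print("NTP prompt:", " ".join(visible))
# Token-by-token check: compare each guess with the next hidden token in turn.
for guess, target in zip(["It", "is", "warm."], hidden):
    print(guess, "correct" if guess.lower() == target.lower() else f"wrong (was {target})")
```

Passing a seeded `random.Random` keeps the masked positions reproducible between runs. Swapping in a real subword tokenizer would change what counts as one token, so accuracy numbers from this sketch are not comparable to the app's.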
