Commit be3f459
Parent(s): 42814ff

wav2vec2-lg-xlsr-en-speech-emotion-recognition
Files changed:
- .gitignore +1 -0
- README.md +73 -2
- config.json +107 -0
- preprocessor_config.json +9 -0
- pytorch_model.bin +3 -0
- runs/Jul14_08-52-02_ea6be2bf8cd5/1626253006.9531522/events.out.tfevents.1626253006.ea6be2bf8cd5.900.5 +3 -0
- runs/Jul14_08-52-02_ea6be2bf8cd5/events.out.tfevents.1626253006.ea6be2bf8cd5.900.4 +3 -0
- runs/Jul14_08-58-15_ea6be2bf8cd5/1626253103.0537474/events.out.tfevents.1626253103.ea6be2bf8cd5.1946.1 +3 -0
- runs/Jul14_08-58-15_ea6be2bf8cd5/events.out.tfevents.1626253103.ea6be2bf8cd5.1946.0 +3 -0
- training_args.bin +3 -0
    	
.gitignore
ADDED

@@ -0,0 +1 @@
+checkpoint-*/
    	
README.md
CHANGED

@@ -1,3 +1,74 @@
-
 
-
+---
+license: apache-2.0
+tags:
+- generated_from_trainer
+metrics:
+- accuracy
+model_index:
+  name: wav2vec2-lg-xlsr-en-speech-emotion-recognition
+---
+
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+
+# wav2vec2-lg-xlsr-en-speech-emotion-recognition
+
+This model is a fine-tuned version of [jonatasgrosman/wav2vec2-large-xlsr-53-english](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-english) on an unknown dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.5023
+- Accuracy: 0.8223
+
+## Model description
+
+More information needed
+
+## Intended uses & limitations
+
+More information needed
+
+## Training and evaluation data
+
+More information needed
+
+## Training procedure
+
+### Training hyperparameters
+
+The following hyperparameters were used during training:
+- learning_rate: 0.0001
+- train_batch_size: 4
+- eval_batch_size: 4
+- seed: 42
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 8
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- num_epochs: 3
+- mixed_precision_training: Native AMP
+
+### Training results
+
+| Training Loss | Epoch | Step | Validation Loss | Accuracy |
+|:-------------:|:-----:|:----:|:---------------:|:--------:|
+| 2.0752        | 0.21  | 30   | 2.0505          | 0.1359   |
+| 2.0119        | 0.42  | 60   | 1.9340          | 0.2474   |
+| 1.8073        | 0.63  | 90   | 1.5169          | 0.3902   |
+| 1.5418        | 0.84  | 120  | 1.2373          | 0.5610   |
+| 1.1432        | 1.05  | 150  | 1.1579          | 0.5610   |
+| 0.9645        | 1.26  | 180  | 0.9610          | 0.6167   |
+| 0.8811        | 1.47  | 210  | 0.8063          | 0.7178   |
+| 0.8756        | 1.68  | 240  | 0.7379          | 0.7352   |
+| 0.8208        | 1.89  | 270  | 0.6839          | 0.7596   |
+| 0.7118        | 2.1   | 300  | 0.6664          | 0.7735   |
+| 0.4261        | 2.31  | 330  | 0.6058          | 0.8014   |
+| 0.4394        | 2.52  | 360  | 0.5754          | 0.8223   |
+| 0.4581        | 2.72  | 390  | 0.4719          | 0.8467   |
+| 0.3967        | 2.93  | 420  | 0.5023          | 0.8223   |
+
+
+### Framework versions
+
+- Transformers 4.8.2
+- Pytorch 1.9.0+cu102
+- Datasets 1.9.0
+- Tokenizers 0.10.3
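The summary numbers at the top of the model card (Loss 0.5023, Accuracy 0.8223) correspond to the last row of the training-results table, not to the row with the lowest validation loss. A small sketch scanning the rows as transcribed from the table above makes that explicit:

```python
# (step, validation_loss, accuracy) rows transcribed from the
# training-results table in the README above.
rows = [
    (30, 2.0505, 0.1359), (60, 1.9340, 0.2474), (90, 1.5169, 0.3902),
    (120, 1.2373, 0.5610), (150, 1.1579, 0.5610), (180, 0.9610, 0.6167),
    (210, 0.8063, 0.7178), (240, 0.7379, 0.7352), (270, 0.6839, 0.7596),
    (300, 0.6664, 0.7735), (330, 0.6058, 0.8014), (360, 0.5754, 0.8223),
    (390, 0.4719, 0.8467), (420, 0.5023, 0.8223),
]

# The evaluation numbers reported at the top of the card match the
# final logged step...
final = rows[-1]
# ...while the lowest validation loss (with higher accuracy) was
# logged 30 steps earlier.
best = min(rows, key=lambda r: r[1])
print(final)  # (420, 0.5023, 0.8223)
print(best)   # (390, 0.4719, 0.8467)

# Sanity-check the hyperparameter arithmetic from the README:
# train_batch_size * gradient_accumulation_steps == total_train_batch_size.
assert 4 * 2 == 8
```

This is only a reading aid for the table; which checkpoint was actually kept is not recorded in this commit.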
    	
config.json
ADDED

@@ -0,0 +1,107 @@
+{
+  "_name_or_path": "jonatasgrosman/wav2vec2-large-xlsr-53-english",
+  "activation_dropout": 0.05,
+  "apply_spec_augment": true,
+  "architectures": [
+    "Wav2Vec2ForEmotionRecognition"
+  ],
+  "attention_dropout": 0.1,
+  "bos_token_id": 1,
+  "codevector_dim": 256,
+  "contrastive_logits_temperature": 0.1,
+  "conv_bias": true,
+  "conv_dim": [
+    512,
+    512,
+    512,
+    512,
+    512,
+    512,
+    512
+  ],
+  "conv_kernel": [
+    10,
+    3,
+    3,
+    3,
+    3,
+    2,
+    2
+  ],
+  "conv_stride": [
+    5,
+    2,
+    2,
+    2,
+    2,
+    2,
+    2
+  ],
+  "ctc_loss_reduction": "mean",
+  "ctc_zero_infinity": true,
+  "diversity_loss_weight": 0.1,
+  "do_stable_layer_norm": true,
+  "eos_token_id": 2,
+  "feat_extract_activation": "gelu",
+  "feat_extract_dropout": 0.0,
+  "feat_extract_norm": "layer",
+  "feat_proj_dropout": 0.05,
+  "feat_quantizer_dropout": 0.0,
+  "final_dropout": 0.0,
+  "finetuning_task": "wav2vec2_clf",
+  "gradient_checkpointing": true,
+  "hidden_act": "gelu",
+  "hidden_dropout": 0.05,
+  "hidden_size": 1024,
+  "id2label": {
+    "0": "angry",
+    "1": "calm",
+    "2": "disgust",
+    "3": "fearful",
+    "4": "happy",
+    "5": "neutral",
+    "6": "sad",
+    "7": "surprised"
+  },
+  "initializer_range": 0.02,
+  "intermediate_size": 4096,
+  "label2id": {
+    "angry": 0,
+    "calm": 1,
+    "disgust": 2,
+    "fearful": 3,
+    "happy": 4,
+    "neutral": 5,
+    "sad": 6,
+    "surprised": 7
+  },
+  "layer_norm_eps": 1e-05,
+  "layerdrop": 0.05,
+  "mask_channel_length": 10,
+  "mask_channel_min_space": 1,
+  "mask_channel_other": 0.0,
+  "mask_channel_prob": 0.0,
+  "mask_channel_selection": "static",
+  "mask_feature_length": 10,
+  "mask_feature_prob": 0.0,
+  "mask_time_length": 10,
+  "mask_time_min_space": 1,
+  "mask_time_other": 0.0,
+  "mask_time_prob": 0.05,
+  "mask_time_selection": "static",
+  "model_type": "wav2vec2",
+  "num_attention_heads": 16,
+  "num_codevector_groups": 2,
+  "num_codevectors_per_group": 320,
+  "num_conv_pos_embedding_groups": 16,
+  "num_conv_pos_embeddings": 128,
+  "num_feat_extract_layers": 7,
+  "num_hidden_layers": 24,
+  "num_negatives": 100,
+  "pad_token_id": 0,
+  "pooling_mode": "mean",
+  "problem_type": "single_label_classification",
+  "proj_codevector_dim": 256,
+  "transformers_version": "4.8.2",
+  "vocab_size": 33
+}
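The `conv_stride` values in config.json determine how the wav2vec2 feature encoder downsamples raw audio: the total hop is the product of the per-layer strides. A quick sketch using those values (the 16 kHz sampling rate comes from preprocessor_config.json below):

```python
from math import prod

conv_stride = [5, 2, 2, 2, 2, 2, 2]   # from config.json
sampling_rate = 16000                  # from preprocessor_config.json

# Total downsampling factor of the feature encoder is the product of
# the per-layer strides: one output frame per 320 input samples.
hop = prod(conv_stride)
frame_ms = 1000 * hop / sampling_rate  # duration covered per frame
print(hop, frame_ms)  # 320 20.0
```

So each encoder output frame advances by 20 ms of audio; with `"pooling_mode": "mean"`, those frames are then mean-pooled into a single vector for the 8-way emotion classification head.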
    	
preprocessor_config.json
ADDED

@@ -0,0 +1,9 @@
+{
+  "do_normalize": true,
+  "feature_extractor_type": "Wav2Vec2FeatureExtractor",
+  "feature_size": 1,
+  "padding_side": "right",
+  "padding_value": 0.0,
+  "return_attention_mask": true,
+  "sampling_rate": 16000
+}
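With `"do_normalize": true`, the feature extractor zero-mean/unit-variance normalizes each raw waveform before it reaches the model. A stdlib-only sketch of that normalization (the exact library implementation may differ, e.g. in how the stabilizing epsilon is applied):

```python
from statistics import fmean, pstdev

def normalize(waveform, eps=1e-7):
    """Zero-mean, unit-variance normalization, as implied by
    "do_normalize": true in preprocessor_config.json.
    eps guards against division by zero on silent input."""
    mu = fmean(waveform)
    sigma = pstdev(waveform)
    return [(x - mu) / (sigma + eps) for x in waveform]

# Toy 16 kHz samples; real inputs are full utterances at 16000 Hz.
wav = [0.1, -0.2, 0.3, 0.0]
out = normalize(wav)
print(round(fmean(out), 6))  # mean is (numerically) zero
```

Because `"feature_size"` is 1, the model consumes this normalized mono waveform directly rather than precomputed spectrogram features.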
    	
pytorch_model.bin
ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:51f2f61d0b1e09d7873ad2ef6a287f99b8e11ad1d2ee44d8278940e952270f03
+size 1266155629
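Note that pytorch_model.bin is committed as a Git LFS pointer, not as the weights themselves; the three lines above are the entire file. A small (hypothetical) parser for such pointer files, following the key/value layout shown:

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value lines.
    Each line is "<key> <value>", per the git-lfs pointer layout."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:51f2f61d0b1e09d7873ad2ef6a287f99b8e11ad1d2ee44d8278940e952270f03
size 1266155629
"""
info = parse_lfs_pointer(pointer)
print(info["size"])  # 1266155629 bytes, i.e. roughly 1.27 GB of weights
```

The same format applies to the tfevents logs and training_args.bin below; only the oid and size differ.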
    	
runs/Jul14_08-52-02_ea6be2bf8cd5/1626253006.9531522/events.out.tfevents.1626253006.ea6be2bf8cd5.900.5
ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:460b2ae294ccb08fc11579cadb1bbae34cf10153fa8c00211cb0a184ea8446a1
+size 4577

runs/Jul14_08-52-02_ea6be2bf8cd5/events.out.tfevents.1626253006.ea6be2bf8cd5.900.4
ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:af1accc9a00625a83c61c5826569983ef608725fc58682d2fe496cca060973a8
+size 4958

runs/Jul14_08-58-15_ea6be2bf8cd5/1626253103.0537474/events.out.tfevents.1626253103.ea6be2bf8cd5.1946.1
ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:369f6ff7d3170de2645053fb9e35c4769a1cda8bca0672476b6b04e51083b90e
+size 4577

runs/Jul14_08-58-15_ea6be2bf8cd5/events.out.tfevents.1626253103.ea6be2bf8cd5.1946.0
ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9350f7355b2214dcc1216cc1a321f98740482b2f8ecc628e1b4f09ae2c19a406
+size 11932

training_args.bin
ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:73d0e655c82ce458316cab3dff37a9cc7cf70d03b7f98b5fa338642876e3733f
+size 2863
