yano0 committed · verified · Commit 74eed44 · 1 Parent(s): 04ba113

Update README.md
Files changed (1): README.md (+48, -15)
README.md CHANGED

@@ -65,13 +65,8 @@ Fine-tuning consists of the following steps.
 
 ### Direct Usage (Sentence Transformers)
 
-First install the Sentence Transformers library:
+You can perform inference using SentenceTransformers with the following code:
 
-```bash
-pip install -U sentence-transformers
-```
-
-Then you can load this model and run inference.
 ```python
 from sentence_transformers import SentenceTransformer
 import torch.nn.functional as F
@@ -79,7 +74,8 @@ import torch.nn.functional as F
 # Download from the 🤗 Hub
 model = SentenceTransformer("pkshatech/GLuCoSE-base-ja-v2")
 
-# Don't forget to add the prefix "query: " for query-side or "passage: " for passage-side texts.
+# Each input text should start with "query: " or "passage: ".
+# For tasks other than retrieval, you can simply use the "query: " prefix.
 sentences = [
     'query: PKSHAはどんな会社ですか?',
     'passage: 研究開発したアルゴリズムを、多くの企業のソフトウエア・オペレーションに導入しています。',
@@ -93,19 +89,56 @@ print(embeddings.shape)
 # Get the similarity scores for the embeddings
 similarities = F.cosine_similarity(embeddings.unsqueeze(0), embeddings.unsqueeze(1), dim=2)
 print(similarities)
-# tensor([[1.0000, 0.6050, 0.4341, 0.5537],
-#         [0.6050, 1.0000, 0.5018, 0.6815],
-#         [0.4341, 0.5018, 1.0000, 0.7534],
-#         [0.5537, 0.6815, 0.7534, 1.0000]])
+# [[1.0000, 0.6050, 0.4341, 0.5537],
+#  [0.6050, 1.0000, 0.5018, 0.6815],
+#  [0.4341, 0.5018, 1.0000, 0.7534],
+#  [0.5537, 0.6815, 0.7534, 1.0000]]
 ```
 
-<!--
 ### Direct Usage (Transformers)
 
-<details><summary>Click to see the direct usage in Transformers</summary>
+You can perform inference using Transformers with the following code:
+
+```python
+import torch.nn.functional as F
+from torch import Tensor
+from transformers import AutoTokenizer, AutoModel
+
+def mean_pooling(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
+    emb = last_hidden_states * attention_mask.unsqueeze(-1)
+    emb = emb.sum(dim=1) / attention_mask.sum(dim=1).unsqueeze(-1)
+    return emb
+
+# Download from the 🤗 Hub
+tokenizer = AutoTokenizer.from_pretrained("pkshatech/GLuCoSE-base-ja-v2")
+model = AutoModel.from_pretrained("pkshatech/GLuCoSE-base-ja-v2")
+
+# Each input text should start with "query: " or "passage: ".
+# For tasks other than retrieval, you can simply use the "query: " prefix.
+sentences = [
+    'query: PKSHAはどんな会社ですか?',
+    'passage: 研究開発したアルゴリズムを、多くの企業のソフトウエア・オペレーションに導入しています。',
+    'query: 日本で一番高い山は?',
+    'passage: 富士山(ふじさん)は、標高3776.12 m、日本最高峰(剣ヶ峰)の独立峰で、その優美な風貌は日本国外でも日本の象徴として広く知られている。',
+]
+
+# Tokenize the input texts
+batch_dict = tokenizer(sentences, max_length=512, padding=True, truncation=True, return_tensors='pt')
+
+outputs = model(**batch_dict)
+embeddings = mean_pooling(outputs.last_hidden_state, batch_dict['attention_mask'])
+print(embeddings.shape)
+# [4, 768]
+
+# Get the similarity scores for the embeddings
+similarities = F.cosine_similarity(embeddings.unsqueeze(0), embeddings.unsqueeze(1), dim=2)
+print(similarities)
+# [[1.0000, 0.6050, 0.4341, 0.5537],
+#  [0.6050, 1.0000, 0.5018, 0.6815],
+#  [0.4341, 0.5018, 1.0000, 0.7534],
+#  [0.5537, 0.6815, 0.7534, 1.0000]]
+```
 
-</details>
--->
 
 <!--
 ### Downstream Usage (Sentence Transformers)
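
The updated comments replace the old one-line prefix note with a clearer two-line convention: "query: " for queries (and for non-retrieval tasks), "passage: " for documents. For retrieval, the two prefixes sit on opposite sides of the search. A minimal ranking sketch of that convention follows; it is not part of the diff above, and it reuses the model id and example texts from the README plus `util.cos_sim` from sentence-transformers.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("pkshatech/GLuCoSE-base-ja-v2")

# Prefix the query with "query: " and each candidate document with "passage: ",
# as the updated README comments instruct.
query = "query: 日本で一番高い山は?"  # "What is the highest mountain in Japan?"
passages = [
    "passage: 富士山(ふじさん)は、標高3776.12 m、日本最高峰(剣ヶ峰)の独立峰で、その優美な風貌は日本国外でも日本の象徴として広く知られている。",
    "passage: 研究開発したアルゴリズムを、多くの企業のソフトウエア・オペレーションに導入しています。",
]

q_emb = model.encode([query], convert_to_tensor=True)
p_emb = model.encode(passages, convert_to_tensor=True)

# Rank candidate passages by cosine similarity to the query.
scores = util.cos_sim(q_emb, p_emb)[0]
for score, passage in sorted(zip(scores.tolist(), passages), reverse=True):
    print(f"{score:.4f}  {passage}")
```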
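
Both snippets in the new README print the same similarity matrix, so the Sentence Transformers and raw Transformers paths are meant to be interchangeable. A quick parity check is sketched below; it is not part of the diff, assumes the model's pooling is the mean pooling shown above, and compares cosine similarities (which are unaffected by any final normalization step `SentenceTransformer` may apply):

```python
import torch
import torch.nn.functional as F
from torch import Tensor
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer, AutoModel

def mean_pooling(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    # Masked mean pooling, as in the README's Transformers example.
    emb = last_hidden_states * attention_mask.unsqueeze(-1)
    return emb.sum(dim=1) / attention_mask.sum(dim=1).unsqueeze(-1)

sentences = [
    "query: PKSHAはどんな会社ですか?",
    "passage: 研究開発したアルゴリズムを、多くの企業のソフトウエア・オペレーションに導入しています。",
]

# Path 1: SentenceTransformers
st_model = SentenceTransformer("pkshatech/GLuCoSE-base-ja-v2")
st_emb = st_model.encode(sentences, convert_to_tensor=True)

# Path 2: raw Transformers + mean pooling
tokenizer = AutoTokenizer.from_pretrained("pkshatech/GLuCoSE-base-ja-v2")
model = AutoModel.from_pretrained("pkshatech/GLuCoSE-base-ja-v2")
batch = tokenizer(sentences, max_length=512, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    hf_emb = mean_pooling(model(**batch).last_hidden_state, batch["attention_mask"])

# The two paths should yield the same cosine similarity up to numerical precision.
sim_st = F.cosine_similarity(st_emb[0], st_emb[1], dim=0)
sim_hf = F.cosine_similarity(hf_emb[0], hf_emb[1], dim=0)
print(torch.allclose(sim_st, sim_hf, atol=1e-4))  # expected: True
```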