跳转至

qwen2

mindnlp.transformers.models.qwen2.configuration_qwen2

Qwen2 model configuration

mindnlp.transformers.models.qwen2.configuration_qwen2.Qwen2Config

Bases: PretrainedConfig

This is the configuration class to store the configuration of a [Qwen2Model]. It is used to instantiate a Qwen2 model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of Qwen2-7B-beta Qwen/Qwen2-7B-beta.

Configuration objects inherit from [PretrainedConfig] and can be used to control the model outputs. Read the documentation from [PretrainedConfig] for more information.

PARAMETER DESCRIPTION
vocab_size

Vocabulary size of the Qwen2 model. Defines the number of different tokens that can be represented by the inputs_ids passed when calling [Qwen2Model]

TYPE: `int`, *optional*, defaults to 151936 DEFAULT: 151936

hidden_size

Dimension of the hidden representations.

TYPE: `int`, *optional*, defaults to 4096 DEFAULT: 4096

intermediate_size

Dimension of the MLP representations.

TYPE: `int`, *optional*, defaults to 22016 DEFAULT: 22016

num_hidden_layers

Number of hidden layers in the Transformer encoder.

TYPE: `int`, *optional*, defaults to 32 DEFAULT: 32

num_attention_heads

Number of attention heads for each attention layer in the Transformer encoder.

TYPE: `int`, *optional*, defaults to 32 DEFAULT: 32

num_key_value_heads

This is the number of key_value heads that should be used to implement Grouped Query Attention. If num_key_value_heads=num_attention_heads, the model will use Multi Head Attention (MHA), if num_key_value_heads=1 the model will use Multi Query Attention (MQA) otherwise GQA is used. When converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be forwarded by meanpooling all the original heads within that group. For more details checkout this paper. If it is not specified, will default to 32.

TYPE: `int`, *optional*, defaults to 32 DEFAULT: 32

hidden_act

The non-linear activation function (function or string) in the decoder.

TYPE: `str` or `function`, *optional*, defaults to `"silu"` DEFAULT: 'silu'

max_position_embeddings

The maximum sequence length that this model might ever be used with.

TYPE: `int`, *optional*, defaults to 32768 DEFAULT: 32768

initializer_range

The standard deviation of the truncated_normal_initializer for initializing all weight matrices.

TYPE: `float`, *optional*, defaults to 0.02 DEFAULT: 0.02

rms_norm_eps

The epsilon used by the rms normalization layers.

TYPE: `float`, *optional*, defaults to 1e-06 DEFAULT: 1e-06

use_cache

Whether or not the model should return the last key/values attentions (not used by all models). Only relevant if config.is_decoder=True.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

tie_word_embeddings

Whether the model's input and output word embeddings should be tied.

TYPE: `bool`, *optional*, defaults to `False` DEFAULT: False

rope_theta

The base period of the RoPE embeddings.

TYPE: `float`, *optional*, defaults to 10000.0 DEFAULT: 10000.0

use_sliding_window

Whether to use sliding window attention.

TYPE: `bool`, *optional*, defaults to `False` DEFAULT: False

sliding_window

Sliding window attention (SWA) window size. If not specified, will default to 4096.

TYPE: `int`, *optional*, defaults to 4096 DEFAULT: 4096

max_window_layers

The number of layers that use SWA (Sliding Window Attention). The bottom layers use SWA while the top use full attention.

TYPE: `int`, *optional*, defaults to 28 DEFAULT: 28

attention_dropout

The dropout ratio for the attention probabilities.

TYPE: `float`, *optional*, defaults to 0.0 DEFAULT: 0.0

Example
>>> from transformers import Qwen2Model, Qwen2Config
...
>>> # Initializing a Qwen2 style configuration
>>> configuration = Qwen2Config()
...
>>> # Initializing a model from the Qwen2-7B style configuration
>>> model = Qwen2Model(configuration)
...
>>> # Accessing the model configuration
>>> configuration = model.config
Source code in mindnlp\transformers\models\qwen2\configuration_qwen2.py
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
class Qwen2Config(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`Qwen2Model`]. It is used to instantiate a
    Qwen2 model according to the specified arguments, defining the model architecture. Instantiating a configuration
    with the defaults will yield a similar configuration to that of
    Qwen2-7B-beta [Qwen/Qwen2-7B-beta](https://hf-mirror.com/Qwen/Qwen2-7B-beta).

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.


    Args:
        vocab_size (`int`, *optional*, defaults to 151936):
            Vocabulary size of the Qwen2 model. Defines the number of different tokens that can be represented by the
            `inputs_ids` passed when calling [`Qwen2Model`]
        hidden_size (`int`, *optional*, defaults to 4096):
            Dimension of the hidden representations.
        intermediate_size (`int`, *optional*, defaults to 22016):
            Dimension of the MLP representations.
        num_hidden_layers (`int`, *optional*, defaults to 32):
            Number of hidden layers in the Transformer encoder.
        num_attention_heads (`int`, *optional*, defaults to 32):
            Number of attention heads for each attention layer in the Transformer encoder.
        num_key_value_heads (`int`, *optional*, defaults to 32):
            This is the number of key_value heads that should be used to implement Grouped Query Attention. If
            `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA), if
            `num_key_value_heads=1` the model will use Multi Query Attention (MQA) otherwise GQA is used. When
            converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be forwarded
            by meanpooling all the original heads within that group. For more details checkout [this
            paper](https://arxiv.org/pdf/2305.13245.pdf). If it is not specified, will default to `32`.
        hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
            The non-linear activation function (function or string) in the decoder.
        max_position_embeddings (`int`, *optional*, defaults to 32768):
            The maximum sequence length that this model might ever be used with.
        initializer_range (`float`, *optional*, defaults to 0.02):
            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
        rms_norm_eps (`float`, *optional*, defaults to 1e-06):
            The epsilon used by the rms normalization layers.
        use_cache (`bool`, *optional*, defaults to `True`):
            Whether or not the model should return the last key/values attentions (not used by all models). Only
            relevant if `config.is_decoder=True`.
        tie_word_embeddings (`bool`, *optional*, defaults to `False`):
            Whether the model's input and output word embeddings should be tied.
        rope_theta (`float`, *optional*, defaults to 10000.0):
            The base period of the RoPE embeddings.
        use_sliding_window (`bool`, *optional*, defaults to `False`):
            Whether to use sliding window attention.
        sliding_window (`int`, *optional*, defaults to 4096):
            Sliding window attention (SWA) window size. If not specified, will default to `4096`.
        max_window_layers (`int`, *optional*, defaults to 28):
            The number of layers that use SWA (Sliding Window Attention). The bottom layers use SWA while the top
            use full attention.
        attention_dropout (`float`, *optional*, defaults to 0.0):
            The dropout ratio for the attention probabilities.

    Example:
        ```python
        >>> from transformers import Qwen2Model, Qwen2Config
        ...
        >>> # Initializing a Qwen2 style configuration
        >>> configuration = Qwen2Config()
        ...
        >>> # Initializing a model from the Qwen2-7B style configuration
        >>> model = Qwen2Model(configuration)
        ...
        >>> # Accessing the model configuration
        >>> configuration = model.config
        ```
    """
    model_type = "qwen2"
    keys_to_ignore_at_inference = ["past_key_values"]

    def __init__(
        self,
        vocab_size=151936,
        hidden_size=4096,
        intermediate_size=22016,
        num_hidden_layers=32,
        num_attention_heads=32,
        num_key_value_heads=32,
        hidden_act="silu",
        max_position_embeddings=32768,
        initializer_range=0.02,
        rms_norm_eps=1e-6,
        use_cache=True,
        tie_word_embeddings=False,
        rope_theta=10000.0,
        use_sliding_window=False,
        sliding_window=4096,
        max_window_layers=28,
        attention_dropout=0.0,
        **kwargs,
    ):
        """
        __init__

        Initializes a Qwen2Config object.

        Args:
            self: The instance of the class.
            vocab_size (int): The size of the vocabulary. Default is 151936.
            hidden_size (int): The size of the hidden layers. Default is 4096.
            intermediate_size (int): The size of the intermediate layer. Default is 22016.
            num_hidden_layers (int): The number of hidden layers. Default is 32.
            num_attention_heads (int): The number of attention heads. Default is 32.
            num_key_value_heads (int): The number of key-value attention heads. Default is 32.
            hidden_act (str): The activation function for the hidden layers. Default is 'silu'.
            max_position_embeddings (int): The maximum position embeddings. Default is 32768.
            initializer_range (float): The range for random weight initialization. Default is 0.02.
            rms_norm_eps (float): The epsilon value for RMS normalization. Default is 1e-06.
            use_cache (bool): Indicates whether to use caching. Default is True.
            tie_word_embeddings (bool): Indicates whether to tie word embeddings. Default is False.
            rope_theta (float): The theta value for rope. Default is 10000.0.
            use_sliding_window (bool): Indicates whether to use sliding window. Default is False.
            sliding_window (int): The size of the sliding window. Default is 4096.
            max_window_layers (int): The maximum number of window layers. Default is 28.
            attention_dropout (float): The dropout rate for attention. Default is 0.0.

        Returns:
            None.

        Raises:
            None.
        """
        self.vocab_size = vocab_size
        self.max_position_embeddings = max_position_embeddings
        self.hidden_size = hidden_size
        self.intermediate_size = intermediate_size
        self.num_hidden_layers = num_hidden_layers
        self.num_attention_heads = num_attention_heads
        self.use_sliding_window = use_sliding_window
        self.sliding_window = sliding_window
        self.max_window_layers = max_window_layers

        # for backward compatibility
        if num_key_value_heads is None:
            num_key_value_heads = num_attention_heads

        self.num_key_value_heads = num_key_value_heads
        self.hidden_act = hidden_act
        self.initializer_range = initializer_range
        self.rms_norm_eps = rms_norm_eps
        self.use_cache = use_cache
        self.rope_theta = rope_theta
        self.attention_dropout = attention_dropout

        super().__init__(
            tie_word_embeddings=tie_word_embeddings,
            **kwargs,
        )

mindnlp.transformers.models.qwen2.configuration_qwen2.Qwen2Config.__init__(vocab_size=151936, hidden_size=4096, intermediate_size=22016, num_hidden_layers=32, num_attention_heads=32, num_key_value_heads=32, hidden_act='silu', max_position_embeddings=32768, initializer_range=0.02, rms_norm_eps=1e-06, use_cache=True, tie_word_embeddings=False, rope_theta=10000.0, use_sliding_window=False, sliding_window=4096, max_window_layers=28, attention_dropout=0.0, **kwargs)

init

Initializes a Qwen2Config object.

PARAMETER DESCRIPTION
self

The instance of the class.

vocab_size

The size of the vocabulary. Default is 151936.

TYPE: int DEFAULT: 151936

hidden_size

The size of the hidden layers. Default is 4096.

TYPE: int DEFAULT: 4096

intermediate_size

The size of the intermediate layer. Default is 22016.

TYPE: int DEFAULT: 22016

num_hidden_layers

The number of hidden layers. Default is 32.

TYPE: int DEFAULT: 32

num_attention_heads

The number of attention heads. Default is 32.

TYPE: int DEFAULT: 32

num_key_value_heads

The number of key-value attention heads. Default is 32.

TYPE: int DEFAULT: 32

hidden_act

The activation function for the hidden layers. Default is 'silu'.

TYPE: str DEFAULT: 'silu'

max_position_embeddings

The maximum position embeddings. Default is 32768.

TYPE: int DEFAULT: 32768

initializer_range

The range for random weight initialization. Default is 0.02.

TYPE: float DEFAULT: 0.02

rms_norm_eps

The epsilon value for RMS normalization. Default is 1e-06.

TYPE: float DEFAULT: 1e-06

use_cache

Indicates whether to use caching. Default is True.

TYPE: bool DEFAULT: True

tie_word_embeddings

Indicates whether to tie word embeddings. Default is False.

TYPE: bool DEFAULT: False

rope_theta

The theta value for rope. Default is 10000.0.

TYPE: float DEFAULT: 10000.0

use_sliding_window

Indicates whether to use sliding window. Default is False.

TYPE: bool DEFAULT: False

sliding_window

The size of the sliding window. Default is 4096.

TYPE: int DEFAULT: 4096

max_window_layers

The maximum number of window layers. Default is 28.

TYPE: int DEFAULT: 28

attention_dropout

The dropout rate for attention. Default is 0.0.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION

None.

Source code in mindnlp\transformers\models\qwen2\configuration_qwen2.py
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
def __init__(
    self,
    vocab_size=151936,
    hidden_size=4096,
    intermediate_size=22016,
    num_hidden_layers=32,
    num_attention_heads=32,
    num_key_value_heads=32,
    hidden_act="silu",
    max_position_embeddings=32768,
    initializer_range=0.02,
    rms_norm_eps=1e-6,
    use_cache=True,
    tie_word_embeddings=False,
    rope_theta=10000.0,
    use_sliding_window=False,
    sliding_window=4096,
    max_window_layers=28,
    attention_dropout=0.0,
    **kwargs,
):
    """
    __init__

    Initializes a Qwen2Config object.

    Args:
        self: The instance of the class.
        vocab_size (int): The size of the vocabulary. Default is 151936.
        hidden_size (int): The size of the hidden layers. Default is 4096.
        intermediate_size (int): The size of the intermediate layer. Default is 22016.
        num_hidden_layers (int): The number of hidden layers. Default is 32.
        num_attention_heads (int): The number of attention heads. Default is 32.
        num_key_value_heads (int): The number of key-value attention heads. Default is 32.
        hidden_act (str): The activation function for the hidden layers. Default is 'silu'.
        max_position_embeddings (int): The maximum position embeddings. Default is 32768.
        initializer_range (float): The range for random weight initialization. Default is 0.02.
        rms_norm_eps (float): The epsilon value for RMS normalization. Default is 1e-06.
        use_cache (bool): Indicates whether to use caching. Default is True.
        tie_word_embeddings (bool): Indicates whether to tie word embeddings. Default is False.
        rope_theta (float): The theta value for rope. Default is 10000.0.
        use_sliding_window (bool): Indicates whether to use sliding window. Default is False.
        sliding_window (int): The size of the sliding window. Default is 4096.
        max_window_layers (int): The maximum number of window layers. Default is 28.
        attention_dropout (float): The dropout rate for attention. Default is 0.0.

    Returns:
        None.

    Raises:
        None.
    """
    self.vocab_size = vocab_size
    self.max_position_embeddings = max_position_embeddings
    self.hidden_size = hidden_size
    self.intermediate_size = intermediate_size
    self.num_hidden_layers = num_hidden_layers
    self.num_attention_heads = num_attention_heads
    self.use_sliding_window = use_sliding_window
    self.sliding_window = sliding_window
    self.max_window_layers = max_window_layers

    # for backward compatibility
    if num_key_value_heads is None:
        num_key_value_heads = num_attention_heads

    self.num_key_value_heads = num_key_value_heads
    self.hidden_act = hidden_act
    self.initializer_range = initializer_range
    self.rms_norm_eps = rms_norm_eps
    self.use_cache = use_cache
    self.rope_theta = rope_theta
    self.attention_dropout = attention_dropout

    super().__init__(
        tie_word_embeddings=tie_word_embeddings,
        **kwargs,
    )

mindnlp.transformers.models.qwen2.modeling_qwen2

MindSpore Qwen2 model.

mindnlp.transformers.models.qwen2.modeling_qwen2.Qwen2Attention

Bases: Module

Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer and "Generating Long Sequences with Sparse Transformers".

Source code in mindnlp\transformers\models\qwen2\modeling_qwen2.py
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
class Qwen2Attention(nn.Module):
    """
    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
    and "Generating Long Sequences with Sparse Transformers".
    """

    def __init__(self, config: Qwen2Config, layer_idx: Optional[int] = None):
        super().__init__()
        self.config = config
        self.layer_idx = layer_idx
        if layer_idx is None:
            logger.warning_once(
                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
                "when creating this class."
            )

        self.hidden_size = config.hidden_size
        self.num_heads = config.num_attention_heads
        self.head_dim = self.hidden_size // self.num_heads
        self.num_key_value_heads = config.num_key_value_heads
        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
        self.max_position_embeddings = config.max_position_embeddings
        self.rope_theta = config.rope_theta
        self.is_causal = True
        self.attention_dropout = config.attention_dropout

        if (self.head_dim * self.num_heads) != self.hidden_size:
            raise ValueError(
                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
                f" and `num_heads`: {self.num_heads})."
            )
        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)

        self.rotary_emb = Qwen2RotaryEmbedding(
            self.head_dim,
            max_position_embeddings=self.max_position_embeddings,
            base=self.rope_theta,
        )

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        past_key_value: Optional[Cache] = None,
        output_attentions: bool = False,
        use_cache: bool = False,
        cache_position: Optional[mindspore.Tensor] = None,
    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
        bsz, q_len, _ = hidden_states.shape

        query_states = self.q_proj(hidden_states)
        key_states = self.k_proj(hidden_states)
        value_states = self.v_proj(hidden_states)

        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)

        kv_seq_len = key_states.shape[-2]
        if past_key_value is not None:
            if self.layer_idx is None:
                raise ValueError(
                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
                    "with a layer index."
                )
            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)

        if past_key_value is not None:
            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)

        # repeat k/v heads if n_kv_heads < n_heads
        key_states = repeat_kv(key_states, self.num_key_value_groups)
        value_states = repeat_kv(value_states, self.num_key_value_groups)

        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)

        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
            raise ValueError(
                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
                f" {attn_weights.shape}"
            )

        if attention_mask is not None:  # no matter the length, we just slice it
            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
            attn_weights = attn_weights + causal_mask

        # upcast attention to fp32
        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
        attn_output = ops.matmul(attn_weights, value_states)

        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
            raise ValueError(
                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
                f" {attn_output.shape}"
            )

        attn_output = ops.transpose(attn_output, 1, 2)
        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)

        attn_output = self.o_proj(attn_output)

        if not output_attentions:
            attn_weights = None

        return attn_output, attn_weights, past_key_value

mindnlp.transformers.models.qwen2.modeling_qwen2.Qwen2DecoderLayer

Bases: Module

Source code in mindnlp\transformers\models\qwen2\modeling_qwen2.py
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
class Qwen2DecoderLayer(nn.Module):
    def __init__(self, config: Qwen2Config, layer_idx: int):
        super().__init__()
        self.hidden_size = config.hidden_size

        if config.sliding_window and config._attn_implementation != "flash_attention_2":
            logger.warning_once(
                f"Sliding Window Attention is enabled but not implemented for `{config._attn_implementation}`; "
                "unexpected results may be encountered."
            )
        self.self_attn = QWEN2_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)

        self.mlp = Qwen2MLP(config)
        self.input_layernorm = Qwen2RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
        self.post_attention_layernorm = Qwen2RMSNorm(config.hidden_size, eps=config.rms_norm_eps)

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        past_key_value: Optional[Tuple[mindspore.Tensor]] = None,
        output_attentions: Optional[bool] = False,
        use_cache: Optional[bool] = False,
        cache_position: Optional[mindspore.Tensor] = None,
        **kwargs,
    ) -> Tuple[mindspore.Tensor, Optional[Tuple[mindspore.Tensor, mindspore.Tensor]]]:
        """
        Args:
            hidden_states (`mindspore.Tensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
            attention_mask (`mindspore.Tensor`, *optional*): attention mask of size
                `(batch, sequence_length)` where padding elements are indicated by 0.
            output_attentions (`bool`, *optional*):
                Whether or not to return the attentions tensors of all attention layers. See `attentions` under
                returned tensors for more detail.
            use_cache (`bool`, *optional*):
                If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
                (see `past_key_values`).
            past_key_value (`Tuple(mindspore.Tensor)`, *optional*): cached past key and value projection states
            cache_position (`mindspore.Tensor` of shape `(sequence_length)`, *optional*):
                Indices depicting the position of the input sequence tokens in the sequence.
            kwargs (`dict`, *optional*):
                Arbitrary kwargs to be ignored, used for FSDP and other methods that injects code
                into the model
        """

        residual = hidden_states

        hidden_states = self.input_layernorm(hidden_states)

        # Self Attention
        hidden_states, self_attn_weights, present_key_value = self.self_attn(
            hidden_states=hidden_states,
            attention_mask=attention_mask,
            position_ids=position_ids,
            past_key_value=past_key_value,
            output_attentions=output_attentions,
            use_cache=use_cache,
            cache_position=cache_position,
        )
        hidden_states = residual + hidden_states

        # Fully Connected
        residual = hidden_states
        hidden_states = self.post_attention_layernorm(hidden_states)
        hidden_states = self.mlp(hidden_states)
        hidden_states = residual + hidden_states

        outputs = (hidden_states,)

        if output_attentions:
            outputs += (self_attn_weights,)

        if use_cache:
            outputs += (present_key_value,)

        return outputs

mindnlp.transformers.models.qwen2.modeling_qwen2.Qwen2DecoderLayer.forward(hidden_states, attention_mask=None, position_ids=None, past_key_value=None, output_attentions=False, use_cache=False, cache_position=None, **kwargs)

PARAMETER DESCRIPTION
hidden_states

input to the layer of shape (batch, seq_len, embed_dim)

TYPE: `mindspore.Tensor`

attention_mask

attention mask of size (batch, sequence_length) where padding elements are indicated by 0.

TYPE: `mindspore.Tensor`, *optional* DEFAULT: None

output_attentions

Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.

TYPE: `bool`, *optional* DEFAULT: False

use_cache

If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).

TYPE: `bool`, *optional* DEFAULT: False

past_key_value

cached past key and value projection states

TYPE: `Tuple(mindspore.Tensor)`, *optional* DEFAULT: None

cache_position

Indices depicting the position of the input sequence tokens in the sequence.

TYPE: `mindspore.Tensor` of shape `(sequence_length)`, *optional* DEFAULT: None

kwargs

Arbitrary kwargs to be ignored, used for FSDP and other methods that injects code into the model

TYPE: `dict`, *optional* DEFAULT: {}

Source code in mindnlp\transformers\models\qwen2\modeling_qwen2.py
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
def forward(
    self,
    hidden_states: mindspore.Tensor,
    attention_mask: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    past_key_value: Optional[Tuple[mindspore.Tensor]] = None,
    output_attentions: Optional[bool] = False,
    use_cache: Optional[bool] = False,
    cache_position: Optional[mindspore.Tensor] = None,
    **kwargs,
) -> Tuple[mindspore.Tensor, Optional[Tuple[mindspore.Tensor, mindspore.Tensor]]]:
    """
    Args:
        hidden_states (`mindspore.Tensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
        attention_mask (`mindspore.Tensor`, *optional*): attention mask of size
            `(batch, sequence_length)` where padding elements are indicated by 0.
        output_attentions (`bool`, *optional*):
            Whether or not to return the attentions tensors of all attention layers. See `attentions` under
            returned tensors for more detail.
        use_cache (`bool`, *optional*):
            If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
            (see `past_key_values`).
        past_key_value (`Tuple(mindspore.Tensor)`, *optional*): cached past key and value projection states
        cache_position (`mindspore.Tensor` of shape `(sequence_length)`, *optional*):
            Indices depicting the position of the input sequence tokens in the sequence.
        kwargs (`dict`, *optional*):
            Arbitrary kwargs to be ignored, used for FSDP and other methods that injects code
            into the model
    """

    residual = hidden_states

    hidden_states = self.input_layernorm(hidden_states)

    # Self Attention
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
        hidden_states=hidden_states,
        attention_mask=attention_mask,
        position_ids=position_ids,
        past_key_value=past_key_value,
        output_attentions=output_attentions,
        use_cache=use_cache,
        cache_position=cache_position,
    )
    hidden_states = residual + hidden_states

    # Fully Connected
    residual = hidden_states
    hidden_states = self.post_attention_layernorm(hidden_states)
    hidden_states = self.mlp(hidden_states)
    hidden_states = residual + hidden_states

    outputs = (hidden_states,)

    if output_attentions:
        outputs += (self_attn_weights,)

    if use_cache:
        outputs += (present_key_value,)

    return outputs

mindnlp.transformers.models.qwen2.modeling_qwen2.Qwen2ForCausalLM

Bases: Qwen2PreTrainedModel

Source code in mindnlp\transformers\models\qwen2\modeling_qwen2.py
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
class Qwen2ForCausalLM(Qwen2PreTrainedModel):
    _tied_weights_keys = ["lm_head.weight"]

    def __init__(self, config):
        super().__init__(config)
        self.model = Qwen2Model(config)
        self.vocab_size = config.vocab_size
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)

        # Initialize weights and apply final processing
        self.post_init()

    def get_input_embeddings(self):
        return self.model.embed_tokens

    def set_input_embeddings(self, value):
        self.model.embed_tokens = value

    def get_output_embeddings(self):
        return self.lm_head

    def set_output_embeddings(self, new_embeddings):
        self.lm_head = new_embeddings

    def set_decoder(self, decoder):
        self.model = decoder

    def get_decoder(self):
        return self.model

    def forward(
        self,
        input_ids: mindspore.Tensor = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[List[mindspore.Tensor]] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
        cache_position: Optional[mindspore.Tensor] = None,
    ) -> Union[Tuple, CausalLMOutputWithPast]:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for computing the masked language modeling loss. Indices should either be in `[0, ...,
                config.vocab_size]` or -100 (see `input_ids` docstring). Tokens with indices set to `-100` are ignored
                (masked), the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`.

        Returns:

        Example:

        ```python
        >>> from transformers import AutoTokenizer, Qwen2ForCausalLM

        >>> model = Qwen2ForCausalLM.from_pretrained(PATH_TO_CONVERTED_WEIGHTS)
        >>> tokenizer = AutoTokenizer.from_pretrained(PATH_TO_CONVERTED_TOKENIZER)

        >>> prompt = "Hey, are you conscious? Can you talk to me?"
        >>> inputs = tokenizer(prompt, return_tensors="ms")

        >>> # Generate
        >>> generate_ids = model.generate(inputs.input_ids, max_length=30)
        >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
        "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
        ```"""

        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        # decoder outputs consists of (dec_features, layer_state, dec_hidden, dec_attn)
        outputs = self.model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            past_key_values=past_key_values,
            inputs_embeds=inputs_embeds,
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
            cache_position=cache_position,
        )

        hidden_states = outputs[0]
        logits = self.lm_head(hidden_states)
        logits = logits.float()

        loss = None
        if labels is not None:
            # Shift so that tokens < n predict n
            shift_logits = logits[..., :-1, :]
            shift_labels = labels[..., 1:]
            # Flatten the tokens
            loss_fct = CrossEntropyLoss()
            shift_logits = shift_logits.view(-1, self.config.vocab_size)
            shift_labels = shift_labels.view(-1)
            # Enable model parallelism
            loss = loss_fct(shift_logits, shift_labels)

        if not return_dict:
            output = (logits,) + outputs[1:]
            return (loss,) + output if loss is not None else output

        return CausalLMOutputWithPast(
            loss=loss,
            logits=logits,
            past_key_values=outputs.past_key_values,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )

    # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
    def prepare_inputs_for_generation(
        self,
        input_ids,
        past_key_values=None,
        attention_mask=None,
        inputs_embeds=None,
        cache_position=None,
        position_ids=None,
        use_cache=True,
        **kwargs,
    ):
        # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
        # Exception 1: when passing input_embeds, input_ids may be missing entries
        # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
        if past_key_values is not None:
            if inputs_embeds is not None:  # Exception 1
                if 0 not in input_ids.shape:
                    input_ids = input_ids[:, -cache_position.shape[0] :]
            elif input_ids.shape[1] != cache_position.shape[0]:  # Default case (the "else", a no op, is Exception 2)
                input_ids = ops.index_select(input_ids, -1, cache_position)

        if attention_mask is not None and position_ids is None:
            # create position_ids on the fly for batch generation
            position_ids = attention_mask.int().cumsum(-1) - 1
            position_ids = position_ids.masked_fill(attention_mask == 0, 1)
            if past_key_values:
                position_ids = position_ids[:, -input_ids.shape[1] :]

        # if `inputs_embeds` are passed, we only want to use them in the 1st generation step
        if inputs_embeds is not None and cache_position[0] == 0:
            model_inputs = {"inputs_embeds": inputs_embeds}
        else:
            model_inputs = {"input_ids": input_ids}

        if isinstance(past_key_values, StaticCache) and attention_mask.ndim == 2:
            if inputs_embeds is not None:
                batch_size, sequence_length = inputs_embeds.shape
            else:
                batch_size, sequence_length = input_ids.shape

            dtype = self.lm_head.weight.dtype
            min_dtype = float(ops.finfo(dtype).min)

            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
                attention_mask,
                sequence_length=sequence_length,
                target_length=past_key_values.get_max_length(),
                dtype=dtype,
                min_dtype=min_dtype,
                cache_position=cache_position,
                batch_size=batch_size,
            )

        model_inputs.update(
            {
                "position_ids": position_ids,
                "cache_position": cache_position,
                "past_key_values": past_key_values,
                "use_cache": use_cache,
                "attention_mask": attention_mask,
            }
        )
        return model_inputs

mindnlp.transformers.models.qwen2.modeling_qwen2.Qwen2ForCausalLM.forward(input_ids=None, attention_mask=None, position_ids=None, past_key_values=None, inputs_embeds=None, labels=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None, cache_position=None)

PARAMETER DESCRIPTION
labels

Labels for computing the masked language modeling loss. Indices should either be in [0, ..., config.vocab_size] or -100 (see input_ids docstring). Tokens with indices set to -100 are ignored (masked), the loss is only computed for the tokens with labels in [0, ..., config.vocab_size].

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

Example:

>>> from transformers import AutoTokenizer, Qwen2ForCausalLM

>>> model = Qwen2ForCausalLM.from_pretrained(PATH_TO_CONVERTED_WEIGHTS)
>>> tokenizer = AutoTokenizer.from_pretrained(PATH_TO_CONVERTED_TOKENIZER)

>>> prompt = "Hey, are you conscious? Can you talk to me?"
>>> inputs = tokenizer(prompt, return_tensors="ms")

>>> # Generate
>>> generate_ids = model.generate(inputs.input_ids, max_length=30)
>>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
"Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
Source code in mindnlp\transformers\models\qwen2\modeling_qwen2.py
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
def forward(
    self,
    input_ids: mindspore.Tensor = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    past_key_values: Optional[List[mindspore.Tensor]] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    use_cache: Optional[bool] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
    cache_position: Optional[mindspore.Tensor] = None,
) -> Union[Tuple, CausalLMOutputWithPast]:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for computing the masked language modeling loss. Indices should either be in `[0, ...,
            config.vocab_size]` or -100 (see `input_ids` docstring). Tokens with indices set to `-100` are ignored
            (masked), the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`.

    Returns:

    Example:

    ```python
    >>> from transformers import AutoTokenizer, Qwen2ForCausalLM

    >>> model = Qwen2ForCausalLM.from_pretrained(PATH_TO_CONVERTED_WEIGHTS)
    >>> tokenizer = AutoTokenizer.from_pretrained(PATH_TO_CONVERTED_TOKENIZER)

    >>> prompt = "Hey, are you conscious? Can you talk to me?"
    >>> inputs = tokenizer(prompt, return_tensors="ms")

    >>> # Generate
    >>> generate_ids = model.generate(inputs.input_ids, max_length=30)
    >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
    "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
    ```"""

    output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
    output_hidden_states = (
        output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    )
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    # decoder outputs consists of (dec_features, layer_state, dec_hidden, dec_attn)
    outputs = self.model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
        past_key_values=past_key_values,
        inputs_embeds=inputs_embeds,
        use_cache=use_cache,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
        cache_position=cache_position,
    )

    hidden_states = outputs[0]
    logits = self.lm_head(hidden_states)
    logits = logits.float()

    loss = None
    if labels is not None:
        # Shift so that tokens < n predict n
        shift_logits = logits[..., :-1, :]
        shift_labels = labels[..., 1:]
        # Flatten the tokens
        loss_fct = CrossEntropyLoss()
        shift_logits = shift_logits.view(-1, self.config.vocab_size)
        shift_labels = shift_labels.view(-1)
        # Enable model parallelism
        loss = loss_fct(shift_logits, shift_labels)

    if not return_dict:
        output = (logits,) + outputs[1:]
        return (loss,) + output if loss is not None else output

    return CausalLMOutputWithPast(
        loss=loss,
        logits=logits,
        past_key_values=outputs.past_key_values,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )

mindnlp.transformers.models.qwen2.modeling_qwen2.Qwen2ForSequenceClassification

Bases: Qwen2PreTrainedModel

Source code in mindnlp\transformers\models\qwen2\modeling_qwen2.py
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
class Qwen2ForSequenceClassification(Qwen2PreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels
        self.model = Qwen2Model(config)
        self.score = nn.Linear(config.hidden_size, self.num_labels, bias=False)

        # Initialize weights and apply final processing
        self.post_init()

    def get_input_embeddings(self):
        return self.model.embed_tokens

    def set_input_embeddings(self, value):
        self.model.embed_tokens = value

    def forward(
        self,
        input_ids: mindspore.Tensor = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[List[mindspore.Tensor]] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, SequenceClassifierOutputWithPast]:
        r"""
        labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
            config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
            `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        transformer_outputs = self.model(
            input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            past_key_values=past_key_values,
            inputs_embeds=inputs_embeds,
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
        hidden_states = transformer_outputs[0]
        logits = self.score(hidden_states)

        if input_ids is not None:
            batch_size = input_ids.shape[0]
        else:
            batch_size = inputs_embeds.shape[0]

        if self.config.pad_token_id is None and batch_size != 1:
            raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")
        if self.config.pad_token_id is None:
            sequence_lengths = -1
        else:
            if input_ids is not None:
                # if no pad token found, use modulo instead of reverse indexing for ONNX compatibility
                sequence_lengths = ops.eq(input_ids, self.config.pad_token_id).int().argmax(-1) - 1
                sequence_lengths = sequence_lengths % input_ids.shape[-1]
            else:
                sequence_lengths = -1

        if ON_ORANGE_PI:
            if isinstance(sequence_lengths, mindspore.Tensor):
                sequence_lengths = sequence_lengths.to(mindspore.int32)
            pooled_logits = ops.getitem(logits, (ops.arange(batch_size), sequence_lengths))
        else:
            pooled_logits = logits[ops.arange(batch_size), sequence_lengths]

        loss = None
        if labels is not None:
            if self.config.problem_type is None:
                if self.num_labels == 1:
                    self.config.problem_type = "regression"
                elif self.num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
                    self.config.problem_type = "single_label_classification"
                else:
                    self.config.problem_type = "multi_label_classification"

            if self.config.problem_type == "regression":
                loss_fct = MSELoss()
                if self.num_labels == 1:
                    loss = loss_fct(pooled_logits.squeeze(), labels.squeeze())
                else:
                    loss = loss_fct(pooled_logits, labels)
            elif self.config.problem_type == "single_label_classification":
                loss_fct = CrossEntropyLoss()
                loss = loss_fct(pooled_logits.view(-1, self.num_labels), labels.view(-1))
            elif self.config.problem_type == "multi_label_classification":
                loss_fct = BCEWithLogitsLoss()
                loss = loss_fct(pooled_logits, labels)
        if not return_dict:
            output = (pooled_logits,) + transformer_outputs[1:]
            return ((loss,) + output) if loss is not None else output

        return SequenceClassifierOutputWithPast(
            loss=loss,
            logits=pooled_logits,
            past_key_values=transformer_outputs.past_key_values,
            hidden_states=transformer_outputs.hidden_states,
            attentions=transformer_outputs.attentions,
        )

mindnlp.transformers.models.qwen2.modeling_qwen2.Qwen2ForSequenceClassification.forward(input_ids=None, attention_mask=None, position_ids=None, past_key_values=None, inputs_embeds=None, labels=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None)

labels (mindspore.Tensor of shape (batch_size,), optional): Labels for computing the sequence classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels == 1 a regression loss is computed (Mean-Square loss), If config.num_labels > 1 a classification loss is computed (Cross-Entropy).

Source code in mindnlp\transformers\models\qwen2\modeling_qwen2.py
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
def forward(
    self,
    input_ids: mindspore.Tensor = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    past_key_values: Optional[List[mindspore.Tensor]] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    use_cache: Optional[bool] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple, SequenceClassifierOutputWithPast]:
    r"""
    labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
        Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
        config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
        `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    transformer_outputs = self.model(
        input_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
        past_key_values=past_key_values,
        inputs_embeds=inputs_embeds,
        use_cache=use_cache,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )
    hidden_states = transformer_outputs[0]
    logits = self.score(hidden_states)

    if input_ids is not None:
        batch_size = input_ids.shape[0]
    else:
        batch_size = inputs_embeds.shape[0]

    if self.config.pad_token_id is None and batch_size != 1:
        raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")
    if self.config.pad_token_id is None:
        sequence_lengths = -1
    else:
        if input_ids is not None:
            # if no pad token found, use modulo instead of reverse indexing for ONNX compatibility
            sequence_lengths = ops.eq(input_ids, self.config.pad_token_id).int().argmax(-1) - 1
            sequence_lengths = sequence_lengths % input_ids.shape[-1]
        else:
            sequence_lengths = -1

    if ON_ORANGE_PI:
        if isinstance(sequence_lengths, mindspore.Tensor):
            sequence_lengths = sequence_lengths.to(mindspore.int32)
        pooled_logits = ops.getitem(logits, (ops.arange(batch_size), sequence_lengths))
    else:
        pooled_logits = logits[ops.arange(batch_size), sequence_lengths]

    loss = None
    if labels is not None:
        if self.config.problem_type is None:
            if self.num_labels == 1:
                self.config.problem_type = "regression"
            elif self.num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
                self.config.problem_type = "single_label_classification"
            else:
                self.config.problem_type = "multi_label_classification"

        if self.config.problem_type == "regression":
            loss_fct = MSELoss()
            if self.num_labels == 1:
                loss = loss_fct(pooled_logits.squeeze(), labels.squeeze())
            else:
                loss = loss_fct(pooled_logits, labels)
        elif self.config.problem_type == "single_label_classification":
            loss_fct = CrossEntropyLoss()
            loss = loss_fct(pooled_logits.view(-1, self.num_labels), labels.view(-1))
        elif self.config.problem_type == "multi_label_classification":
            loss_fct = BCEWithLogitsLoss()
            loss = loss_fct(pooled_logits, labels)
    if not return_dict:
        output = (pooled_logits,) + transformer_outputs[1:]
        return ((loss,) + output) if loss is not None else output

    return SequenceClassifierOutputWithPast(
        loss=loss,
        logits=pooled_logits,
        past_key_values=transformer_outputs.past_key_values,
        hidden_states=transformer_outputs.hidden_states,
        attentions=transformer_outputs.attentions,
    )

mindnlp.transformers.models.qwen2.modeling_qwen2.Qwen2ForTokenClassification

Bases: Qwen2PreTrainedModel

Source code in mindnlp\transformers\models\qwen2\modeling_qwen2.py
 956
 957
 958
 959
 960
 961
 962
 963
 964
 965
 966
 967
 968
 969
 970
 971
 972
 973
 974
 975
 976
 977
 978
 979
 980
 981
 982
 983
 984
 985
 986
 987
 988
 989
 990
 991
 992
 993
 994
 995
 996
 997
 998
 999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
class Qwen2ForTokenClassification(Qwen2PreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels
        self.model = Qwen2Model(config)
        if getattr(config, "classifier_dropout", None) is not None:
            classifier_dropout = config.classifier_dropout
        elif getattr(config, "hidden_dropout", None) is not None:
            classifier_dropout = config.hidden_dropout
        else:
            classifier_dropout = 0.1
        self.dropout = nn.Dropout(classifier_dropout)
        self.score = nn.Linear(config.hidden_size, config.num_labels)

        # Initialize weights and apply final processing
        self.post_init()

    def get_input_embeddings(self):
        return self.model.embed_tokens

    def set_input_embeddings(self, value):
        self.model.embed_tokens = value

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[List[mindspore.Tensor]] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, TokenClassifierOutput]:
        r"""
        labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
            config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
            `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.model(
            input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            past_key_values=past_key_values,
            inputs_embeds=inputs_embeds,
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
        sequence_output = outputs[0]
        sequence_output = self.dropout(sequence_output)
        logits = self.score(sequence_output)

        loss = None
        if labels is not None:
            loss_fct = CrossEntropyLoss()
            loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))

        if not return_dict:
            output = (logits,) + outputs[2:]
            return ((loss,) + output) if loss is not None else output

        return TokenClassifierOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )

mindnlp.transformers.models.qwen2.modeling_qwen2.Qwen2ForTokenClassification.forward(input_ids=None, attention_mask=None, position_ids=None, past_key_values=None, inputs_embeds=None, labels=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None)

labels (mindspore.Tensor of shape (batch_size,), optional): Labels for computing the sequence classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels == 1 a regression loss is computed (Mean-Square loss), If config.num_labels > 1 a classification loss is computed (Cross-Entropy).

Source code in mindnlp\transformers\models\qwen2\modeling_qwen2.py
 979
 980
 981
 982
 983
 984
 985
 986
 987
 988
 989
 990
 991
 992
 993
 994
 995
 996
 997
 998
 999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    past_key_values: Optional[List[mindspore.Tensor]] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    use_cache: Optional[bool] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple, TokenClassifierOutput]:
    r"""
    labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
        Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
        config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
        `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    outputs = self.model(
        input_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
        past_key_values=past_key_values,
        inputs_embeds=inputs_embeds,
        use_cache=use_cache,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )
    sequence_output = outputs[0]
    sequence_output = self.dropout(sequence_output)
    logits = self.score(sequence_output)

    loss = None
    if labels is not None:
        loss_fct = CrossEntropyLoss()
        loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))

    if not return_dict:
        output = (logits,) + outputs[2:]
        return ((loss,) + output) if loss is not None else output

    return TokenClassifierOutput(
        loss=loss,
        logits=logits,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )

mindnlp.transformers.models.qwen2.modeling_qwen2.Qwen2Model

Bases: Qwen2PreTrainedModel

Transformer decoder consisting of config.num_hidden_layers layers. Each layer is a [Qwen2DecoderLayer]

PARAMETER DESCRIPTION
config

Qwen2Config

TYPE: Qwen2Config

Source code in mindnlp\transformers\models\qwen2\modeling_qwen2.py
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
class Qwen2Model(Qwen2PreTrainedModel):
    """
    Transformer decoder consisting of *config.num_hidden_layers* layers. Each layer is a [`Qwen2DecoderLayer`]

    Args:
        config: Qwen2Config
    """

    def __init__(self, config: Qwen2Config):
        super().__init__(config)
        self.padding_idx = config.pad_token_id
        self.vocab_size = config.vocab_size

        self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
        self.layers = nn.ModuleList(
            [Qwen2DecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
        )
        self._attn_implementation = config._attn_implementation
        self.norm = Qwen2RMSNorm(config.hidden_size, eps=config.rms_norm_eps)

        self.gradient_checkpointing = False
        # Initialize weights and apply final processing
        self.post_init()

    def get_input_embeddings(self):
        return self.embed_tokens

    def set_input_embeddings(self, value):
        self.embed_tokens = value

    def forward(
        self,
        input_ids: mindspore.Tensor = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[List[mindspore.Tensor]] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
        cache_position: Optional[mindspore.Tensor] = None,
    ) -> Union[Tuple, BaseModelOutputWithPast]:
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        use_cache = use_cache if use_cache is not None else self.config.use_cache

        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        if (input_ids is None) ^ (inputs_embeds is not None):
            raise ValueError(
                "You cannot specify both input_ids and inputs_embeds at the same time, and must specify either one"
            )

        if self.gradient_checkpointing and self.training:
            if use_cache:
                logger.warning_once(
                    "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
                )
                use_cache = False

        use_legacy_cache = False
        if use_cache and not isinstance(past_key_values, Cache) and not self.training:
            use_legacy_cache = True
            past_key_values = DynamicCache.from_legacy_cache(past_key_values)
            logger.warning_once(
                "We detected that you are passing `past_key_values` as a tuple and this is deprecated.43. "
                "Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)"
            )

        if inputs_embeds is None:
            inputs_embeds = self.embed_tokens(input_ids)

        if cache_position is None:
            past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
            cache_position = ops.arange(
                past_seen_tokens, past_seen_tokens + inputs_embeds.shape[1]
            )
        if position_ids is None:
            position_ids = cache_position.unsqueeze(0)

        causal_mask = self._update_causal_mask(
            attention_mask, inputs_embeds, cache_position, past_key_values, output_attentions
        )

        hidden_states = inputs_embeds

        # decoder layers
        all_hidden_states = () if output_hidden_states else None
        all_self_attns = () if output_attentions else None
        next_decoder_cache = None

        for decoder_layer in self.layers:
            if output_hidden_states:
                all_hidden_states += (hidden_states,)

            if self.gradient_checkpointing and self.training:
                layer_outputs = self._gradient_checkpointing_func(
                    decoder_layer.__call__,
                    hidden_states,
                    causal_mask,
                    position_ids,
                    past_key_values,
                    output_attentions,
                    use_cache,
                    cache_position,
                )
            else:
                layer_outputs = decoder_layer(
                    hidden_states,
                    attention_mask=causal_mask,
                    position_ids=position_ids,
                    past_key_value=past_key_values,
                    output_attentions=output_attentions,
                    use_cache=use_cache,
                    cache_position=cache_position,
                )

            hidden_states = layer_outputs[0]

            if use_cache:
                next_decoder_cache = layer_outputs[2 if output_attentions else 1]

            if output_attentions:
                all_self_attns += (layer_outputs[1],)

        hidden_states = self.norm(hidden_states)

        # add hidden states from the last decoder layer
        if output_hidden_states:
            all_hidden_states += (hidden_states,)

        next_cache = None
        if use_cache:
            next_cache = next_decoder_cache.to_legacy_cache() if use_legacy_cache else next_decoder_cache

        if not return_dict:
            return tuple(v for v in [hidden_states, next_cache, all_hidden_states, all_self_attns] if v is not None)
        return BaseModelOutputWithPast(
            last_hidden_state=hidden_states,
            past_key_values=next_cache,
            hidden_states=all_hidden_states,
            attentions=all_self_attns,
        )

    # Copied from transformers.models.llama.modeling_llama.LlamaModel._update_causal_mask
    def _update_causal_mask(
        self,
        attention_mask: mindspore.Tensor,
        input_tensor: mindspore.Tensor,
        cache_position: mindspore.Tensor,
        past_key_values: Cache,
        output_attentions: bool,
    ):

        if self.config._attn_implementation == "flash_attention_2":
            if attention_mask is not None and 0.0 in attention_mask:
                return attention_mask
            return None

        # For SDPA, when possible, we will rely on its `is_causal` argument instead of its `attn_mask` argument, in
        # order to dispatch on Flash Attention 2. This feature is not compatible with static cache, as SDPA will fail
        # to infer the attention mask.
        past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
        using_static_cache = isinstance(past_key_values, StaticCache)

        # When output attentions is True, sdpa implementation's forward method calls the eager implementation's forward
        if self.config._attn_implementation == "sdpa" and not using_static_cache and not output_attentions:
            if AttentionMaskConverter._ignore_causal_mask_sdpa(
                attention_mask,
                inputs_embeds=input_tensor,
                past_key_values_length=past_seen_tokens,
                is_training=self.training,
            ):
                return None

        dtype = input_tensor.dtype
        min_dtype = float(ops.finfo(dtype).min)
        sequence_length = input_tensor.shape[1]
        if using_static_cache:
            target_length = past_key_values.get_max_length()
        else:
            target_length = (
                attention_mask.shape[-1]
                if isinstance(attention_mask, mindspore.Tensor)
                else past_seen_tokens + sequence_length + 1
            )

        # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
            attention_mask,
            sequence_length=sequence_length,
            target_length=target_length,
            dtype=dtype,
            min_dtype=min_dtype,
            cache_position=cache_position,
            batch_size=input_tensor.shape[0],
        )

        return causal_mask

mindnlp.transformers.models.qwen2.modeling_qwen2.Qwen2RMSNorm

Bases: Module

Source code in mindnlp\transformers\models\qwen2\modeling_qwen2.py
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
class Qwen2RMSNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-6):
        """
        Qwen2RMSNorm is equivalent to T5LayerNorm
        """
        super().__init__()
        self.weight = nn.Parameter(ops.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        if not self.training and use_pyboost() and not ON_ORANGE_PI:
            return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
        input_dtype = hidden_states.dtype
        hidden_states = hidden_states.to(mindspore.float32)
        variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
        hidden_states = hidden_states * ops.rsqrt(variance + self.variance_epsilon)
        return self.weight * hidden_states.to(input_dtype)

    def extra_repr(self):
        return f"{tuple(self.weight.shape)}, eps={self.variance_epsilon}"

mindnlp.transformers.models.qwen2.modeling_qwen2.Qwen2RMSNorm.__init__(hidden_size, eps=1e-06)

Qwen2RMSNorm is equivalent to T5LayerNorm

Source code in mindnlp\transformers\models\qwen2\modeling_qwen2.py
113
114
115
116
117
118
119
def __init__(self, hidden_size, eps=1e-6):
    """
    Qwen2RMSNorm is equivalent to T5LayerNorm
    """
    super().__init__()
    self.weight = nn.Parameter(ops.ones(hidden_size))
    self.variance_epsilon = eps

mindnlp.transformers.models.qwen2.modeling_qwen2.apply_rotary_pos_emb(q, k, cos, sin, position_ids, unsqueeze_dim=1)

Applies Rotary Position Embedding to the query and key tensors.

PARAMETER DESCRIPTION
q

The query tensor.

TYPE: `mindspore.Tensor`

k

The key tensor.

TYPE: `mindspore.Tensor`

cos

The cosine part of the rotary embedding.

TYPE: `mindspore.Tensor`

sin

The sine part of the rotary embedding.

TYPE: `mindspore.Tensor`

position_ids

The position indices of the tokens corresponding to the query and key tensors. For example, this can be used to pass offsetted position ids when working with a KV-cache.

TYPE: `mindspore.Tensor`

unsqueeze_dim

The 'unsqueeze_dim' argument specifies the dimension along which to unsqueeze cos[position_ids] and sin[position_ids] so that they can be properly broadcasted to the dimensions of q and k. For example, note that cos[position_ids] and sin[position_ids] have the shape [batch_size, seq_len, head_dim]. Then, if q and k have the shape [batch_size, heads, seq_len, head_dim], then setting unsqueeze_dim=1 makes cos[position_ids] and sin[position_ids] broadcastable to the shapes of q and k. Similarly, if q and k have the shape [batch_size, seq_len, heads, head_dim], then set unsqueeze_dim=2.

TYPE: `int`, *optional*, defaults to 1 DEFAULT: 1

Source code in mindnlp\transformers\models\qwen2\modeling_qwen2.py
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
def apply_rotary_pos_emb(q, k, cos, sin, position_ids, unsqueeze_dim=1):
    """Applies Rotary Position Embedding to the query and key tensors.

    Args:
        q (`mindspore.Tensor`): The query tensor.
        k (`mindspore.Tensor`): The key tensor.
        cos (`mindspore.Tensor`): The cosine part of the rotary embedding.
        sin (`mindspore.Tensor`): The sine part of the rotary embedding.
        position_ids (`mindspore.Tensor`):
            The position indices of the tokens corresponding to the query and key tensors. For example, this can be
            used to pass offsetted position ids when working with a KV-cache.
        unsqueeze_dim (`int`, *optional*, defaults to 1):
            The 'unsqueeze_dim' argument specifies the dimension along which to unsqueeze cos[position_ids] and
            sin[position_ids] so that they can be properly broadcasted to the dimensions of q and k. For example, note
            that cos[position_ids] and sin[position_ids] have the shape [batch_size, seq_len, head_dim]. Then, if q and
            k have the shape [batch_size, heads, seq_len, head_dim], then setting unsqueeze_dim=1 makes
            cos[position_ids] and sin[position_ids] broadcastable to the shapes of q and k. Similarly, if q and k have
            the shape [batch_size, seq_len, heads, head_dim], then set unsqueeze_dim=2.
    Returns:
        `tuple(mindspore.Tensor)` comprising of the query and key tensors rotated using the Rotary Position Embedding.
    """
    position_ids = (position_ids + cos.shape[0]) % cos.shape[0]
    cos = F.embedding(position_ids, cos).unsqueeze(unsqueeze_dim)
    sin = F.embedding(position_ids, sin).unsqueeze(unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed

mindnlp.transformers.models.qwen2.modeling_qwen2.repeat_kv(hidden_states, n_rep)

This is the equivalent of torch.repeat_interleave(x, dim=1, repeats=n_rep). The hidden states go from (batch, num_key_value_heads, seqlen, head_dim) to (batch, num_attention_heads, seqlen, head_dim)

Source code in mindnlp\transformers\models\qwen2\modeling_qwen2.py
224
225
226
227
228
229
230
231
232
233
def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
    """
    This is the equivalent of torch.repeat_interleave(x, dim=1, repeats=n_rep). The hidden states go from (batch,
    num_key_value_heads, seqlen, head_dim) to (batch, num_attention_heads, seqlen, head_dim)
    """
    batch, num_key_value_heads, slen, head_dim = hidden_states.shape
    if n_rep == 1:
        return hidden_states
    hidden_states = hidden_states[:, :, None, :, :].broadcast_to((batch, num_key_value_heads, n_rep, slen, head_dim))
    return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)

mindnlp.transformers.models.qwen2.modeling_qwen2.rotate_half(x)

Rotates half the hidden dims of the input.

Source code in mindnlp\transformers\models\qwen2\modeling_qwen2.py
171
172
173
174
175
176
def rotate_half(x):
    """Rotates half the hidden dims of the input."""
    # x1 = x[..., : x.shape[-1] // 2]
    # x2 = x[..., x.shape[-1] // 2 :]
    x1, x2 = ops.split(x, x.shape[-1] // 2, dim=-1)
    return ops.cat((-x2, x1), dim=-1)

mindnlp.transformers.models.qwen2.tokenization_qwen2

Tokenization classes for Qwen2.

mindnlp.transformers.models.qwen2.tokenization_qwen2.Qwen2Tokenizer

Bases: PreTrainedTokenizer

Construct a Qwen2 tokenizer. Based on byte-level Byte-Pair-Encoding.

Same with GPT2Tokenizer, this tokenizer has been trained to treat spaces like parts of the tokens so a word will be encoded differently whether it is at the beginning of the sentence (without space) or not:

Example
>>> from transformers import Qwen2Tokenizer
...
>>> tokenizer = Qwen2Tokenizer.from_pretrained("Qwen/Qwen-tokenizer")
>>> tokenizer("Hello world")["input_ids"]
[9707, 1879]
>>> tokenizer(" Hello world")["input_ids"]
[21927, 1879]

This is expected.

You should not use GPT2Tokenizer instead, because of the different pretokenization rules.

This tokenizer inherits from [PreTrainedTokenizer] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.

PARAMETER DESCRIPTION
vocab_file

Path to the vocabulary file.

TYPE: `str`

merges_file

Path to the merges file.

TYPE: `str`

errors

Paradigm to follow when decoding bytes to UTF-8. See bytes.decode for more information.

TYPE: `str`, *optional*, defaults to `"replace"` DEFAULT: 'replace'

unk_token

The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead.

TYPE: `str`, *optional*, defaults to `"<|endoftext|>"` DEFAULT: '<|endoftext|>'

bos_token

The beginning of sequence token. Not applicable for this tokenizer.

TYPE: `str`, *optional* DEFAULT: None

eos_token

The end of sequence token.

TYPE: `str`, *optional*, defaults to `"<|endoftext|>"` DEFAULT: '<|endoftext|>'

pad_token

The token used for padding, for example when batching sequences of different lengths.

TYPE: `str`, *optional*, defaults to `"<|endoftext|>"` DEFAULT: '<|endoftext|>'

clean_up_tokenization_spaces

Whether or not the model should cleanup the spaces that were added when splitting the input text during the tokenization process. Not applicable to this tokenizer, since tokenization does not add spaces.

TYPE: `bool`, *optional*, defaults to `False` DEFAULT: False

split_special_tokens

Whether or not the special tokens should be split during the tokenization process. The default behavior is to not split special tokens. This means that if <|endoftext|> is the eos_token, then tokenizer.tokenize("<|endoftext|>") = ['<|endoftext|>]. Otherwise, if split_special_tokens=True, then tokenizer.tokenize("<|endoftext|>") will be give ['<', '|', 'endo', 'ft', 'ext', '|', '>']. This argument is only supported for slow tokenizers for the moment.

TYPE: `bool`, *optional*, defaults to `False` DEFAULT: False

Source code in mindnlp\transformers\models\qwen2\tokenization_qwen2.py
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
class Qwen2Tokenizer(PreTrainedTokenizer):
    """
    Construct a Qwen2 tokenizer. Based on byte-level Byte-Pair-Encoding.

    Same with GPT2Tokenizer, this tokenizer has been trained to treat spaces like parts of the tokens so a word will
    be encoded differently whether it is at the beginning of the sentence (without space) or not:

    Example:
        ```python
        >>> from transformers import Qwen2Tokenizer
        ...
        >>> tokenizer = Qwen2Tokenizer.from_pretrained("Qwen/Qwen-tokenizer")
        >>> tokenizer("Hello world")["input_ids"]
        [9707, 1879]
        >>> tokenizer(" Hello world")["input_ids"]
        [21927, 1879]
        ```
    This is expected.

    You should not use GPT2Tokenizer instead, because of the different pretokenization rules.

    This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods. Users should refer to
    this superclass for more information regarding those methods.

    Args:
        vocab_file (`str`):
            Path to the vocabulary file.
        merges_file (`str`):
            Path to the merges file.
        errors (`str`, *optional*, defaults to `"replace"`):
            Paradigm to follow when decoding bytes to UTF-8. See
            [bytes.decode](https://docs.python.org/3/library/stdtypes.html#bytes.decode) for more information.
        unk_token (`str`, *optional*, defaults to `"<|endoftext|>"`):
            The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
            token instead.
        bos_token (`str`, *optional*):
            The beginning of sequence token. Not applicable for this tokenizer.
        eos_token (`str`, *optional*, defaults to `"<|endoftext|>"`):
            The end of sequence token.
        pad_token (`str`, *optional*, defaults to `"<|endoftext|>"`):
            The token used for padding, for example when batching sequences of different lengths.
        clean_up_tokenization_spaces (`bool`, *optional*, defaults to `False`):
            Whether or not the model should cleanup the spaces that were added when splitting the input text during the
            tokenization process. Not applicable to this tokenizer, since tokenization does not add spaces.
        split_special_tokens (`bool`, *optional*, defaults to `False`):
            Whether or not the special tokens should be split during the tokenization process. The default behavior is
            to not split special tokens. This means that if `<|endoftext|>` is the `eos_token`, then `tokenizer.tokenize("<|endoftext|>") =
            ['<|endoftext|>`]. Otherwise, if `split_special_tokens=True`, then `tokenizer.tokenize("<|endoftext|>")` will be give `['<',
            '|', 'endo', 'ft', 'ext', '|', '>']`. This argument is only supported for `slow` tokenizers for the moment.
    """
    vocab_files_names = VOCAB_FILES_NAMES
    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
    max_model_input_sizes = MAX_MODEL_INPUT_SIZES
    model_input_names = ["input_ids", "attention_mask"]

    def __init__(
        self,
        vocab_file,
        merges_file,
        errors="replace",
        unk_token="<|endoftext|>",
        bos_token=None,
        eos_token="<|endoftext|>",
        pad_token="<|endoftext|>",
        clean_up_tokenization_spaces=False,
        split_special_tokens=False,
        **kwargs,
    ):
        """
        Initializes an instance of the Qwen2Tokenizer class.

        Args:
            self: The instance of the class.
            vocab_file (str): The path to the vocabulary file.
            merges_file (str): The path to the merges file.
            errors (str, optional): Specifies how to handle errors during tokenization. Defaults to 'replace'.
            unk_token (str, optional): The unknown token. Defaults to 'endoftext'.
            bos_token (str or None, optional): The beginning-of-sequence token. Defaults to None.
            eos_token (str, optional): The end-of-sequence token. Defaults to 'endoftext'.
            pad_token (str, optional): The padding token. Defaults to 'endoftext'.
            clean_up_tokenization_spaces (bool, optional): Specifies whether to clean up tokenization spaces.
                Defaults to False.
            split_special_tokens (bool, optional): Specifies whether to split special tokens. Defaults to False.
            **kwargs: Additional keyword arguments.

        Returns:
            None.

        Raises:
            FileNotFoundError: If the vocab_file or merges_file does not exist.
            UnicodeDecodeError: If there is an error decoding the vocab_file or merges_file.
            ValueError: If the vocab_file or merges_file is empty.
        """
        # Qwen vocab does not contain control tokens; added tokens need to be special
        bos_token = (
            AddedToken(bos_token, lstrip=False, rstrip=False, special=True, normalized=False)
            if isinstance(bos_token, str)
            else bos_token
        )
        eos_token = (
            AddedToken(eos_token, lstrip=False, rstrip=False, special=True, normalized=False)
            if isinstance(eos_token, str)
            else eos_token
        )
        unk_token = (
            AddedToken(unk_token, lstrip=False, rstrip=False, special=True, normalized=False)
            if isinstance(unk_token, str)
            else unk_token
        )
        pad_token = (
            AddedToken(pad_token, lstrip=False, rstrip=False, special=True, normalized=False)
            if isinstance(pad_token, str)
            else pad_token
        )

        with open(vocab_file, encoding="utf-8") as vocab_handle:
            self.encoder = json.load(vocab_handle)
        self.decoder = {v: k for k, v in self.encoder.items()}
        self.errors = errors  # how to handle errors in decoding
        self.byte_encoder = bytes_to_unicode()
        self.byte_decoder = {v: k for k, v in self.byte_encoder.items()}
        bpe_merges = []
        with open(merges_file, encoding="utf-8") as merges_handle:
            for line in merges_handle:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                bpe_merges.append(tuple(line.split()))
        self.bpe_ranks = dict(zip(bpe_merges, range(len(bpe_merges))))
        # NOTE: the cache can grow without bound and will get really large for long running processes
        # (esp. for texts of language that do not use space between word, e.g. Chinese); technically
        # not a memory leak but appears as one.
        # GPT2Tokenizer has the same problem, so let's be consistent.
        self.cache = {}

        self.pat = re.compile(PRETOKENIZE_REGEX)

        if kwargs.get("add_prefix_space", False):
            logger.warning_once(
                f"{self.__class__.__name} does not support `add_prefix_space`, setting it to True has no effect."
            )

        super().__init__(
            errors=errors,
            bos_token=bos_token,
            eos_token=eos_token,
            pad_token=pad_token,
            unk_token=unk_token,
            clean_up_tokenization_spaces=clean_up_tokenization_spaces,
            split_special_tokens=split_special_tokens,
            **kwargs,
        )

    @property
    def vocab_size(self) -> int:
        """
        Get the size of the vocabulary.

        This method returns the number of unique tokens in the tokenizer's encoder.

        Args:
            self (Qwen2Tokenizer): An instance of the Qwen2Tokenizer class.

        Returns:
            int: The size of the vocabulary.

        Raises:
            None.
        """
        return len(self.encoder)

    # Copied from transformers.models.gpt2.tokenization_gpt2.GPT2Tokenizer.get_vocab
    def get_vocab(self):
        """
        Returns the vocabulary of the tokenizer.

        Args:
            self (Qwen2Tokenizer): The instance of the Qwen2Tokenizer class.

        Returns:
            dict: A dictionary representing the vocabulary of the tokenizer.
                The keys are the tokens, and the values are their corresponding indices in the vocabulary.

        Raises:
            None.

        Note:
            The vocabulary is obtained by merging the `encoder` and `added_tokens_encoder` dictionaries of the
            tokenizer instance.
        """
        return dict(self.encoder, **self.added_tokens_encoder)

    # Copied from transformers.models.gpt2.tokenization_gpt2.GPT2Tokenizer.bpe
    def bpe(self, token):
        """
        Perform Byte Pair Encoding (BPE) on a given token.

        Args:
            self (Qwen2Tokenizer): An instance of the Qwen2Tokenizer class.
            token (str): The input token to be encoded using BPE.

        Returns:
            str: The BPE-encoded version of the input token.

        Raises:
            None.

        Note:
            This method applies Byte Pair Encoding (BPE) algorithm to a given token. BPE is a subword tokenization technique
            commonly used in natural language processing tasks. It splits a token into subword units based on the most
            frequently occurring pairs of characters.

            The BPE algorithm starts by converting the token into a tuple of individual characters. It then identifies the
            most frequent character pairs using the `get_pairs` function. If no pairs are found, the original token is
            returned as it cannot be further split.

            The algorithm iteratively replaces the most frequent character pair with a new subword unit. This process is
            repeated until no more frequent character pairs are found or the token is reduced to a single character.

            Finally, the BPE-encoded token is returned as a string with subword units separated by spaces.

            To improve performance, the method utilizes a cache to store previously processed tokens. If a token is found in
            the cache, its encoded version is returned directly without recomputing.

        Example:
            ```python
            >>> tokenizer = Qwen2Tokenizer()
            >>> encoded_token = tokenizer.bpe('hello')
            >>> print(encoded_token)
            >>> # Output: 'he ll o'
            ...
            >>> encoded_token = tokenizer.bpe('world')
            >>> print(encoded_token)
            >>> # Output: 'wo r ld'
            ```
        """
        if token in self.cache:
            return self.cache[token]
        word = tuple(token)
        pairs = get_pairs(word)

        if not pairs:
            return token

        while True:
            bigram = min(pairs, key=lambda pair: self.bpe_ranks.get(pair, float("inf")))
            if bigram not in self.bpe_ranks:
                break
            first, second = bigram
            new_word = []
            i = 0
            while i < len(word):
                try:
                    j = word.index(first, i)
                except ValueError:
                    new_word.extend(word[i:])
                    break
                else:
                    new_word.extend(word[i:j])
                    i = j

                if word[i] == first and i < len(word) - 1 and word[i + 1] == second:
                    new_word.append(first + second)
                    i += 2
                else:
                    new_word.append(word[i])
                    i += 1
            new_word = tuple(new_word)
            word = new_word
            if len(word) == 1:
                break
            pairs = get_pairs(word)
        word = " ".join(word)
        self.cache[token] = word
        return word

    # Copied from transformers.models.gpt2.tokenization_gpt2.GPT2Tokenizer._tokenize
    def _tokenize(self, text):
        """Tokenize a string."""
        bpe_tokens = []
        for token in re.findall(self.pat, text):
            token = "".join(
                self.byte_encoder[b] for b in token.encode("utf-8")
            )  # Maps all our bytes to unicode strings, avoiding control tokens of the BPE (spaces in our case)
            bpe_tokens.extend(bpe_token for bpe_token in self.bpe(token).split(" "))
        return bpe_tokens

    # Copied from transformers.models.gpt2.tokenization_gpt2.GPT2Tokenizer._convert_token_to_id
    def _convert_token_to_id(self, token):
        """Converts a token (str) in an id using the vocab."""
        return self.encoder.get(token, self.encoder.get(self.unk_token))

    # Copied from transformers.models.gpt2.tokenization_gpt2.GPT2Tokenizer._convert_id_to_token
    def _convert_id_to_token(self, index):
        """Converts an index (integer) in a token (str) using the vocab."""
        return self.decoder.get(index)

    # Copied from transformers.models.gpt2.tokenization_gpt2.GPT2Tokenizer.convert_tokens_to_string
    def convert_tokens_to_string(self, tokens):
        """Converts a sequence of tokens (string) in a single string."""
        text = "".join(tokens)
        text = bytearray([self.byte_decoder[c] for c in text]).decode("utf-8", errors=self.errors)
        return text

    def decode(
        self,
        token_ids,
        skip_special_tokens: bool = False,
        clean_up_tokenization_spaces: Optional[bool] = False,
        spaces_between_special_tokens: bool = False,
        **kwargs,
    ) -> str:
        """
        Decodes a list of token IDs into a string representation.

        Args:
            self: An instance of the Qwen2Tokenizer class.
            token_ids (List[int]): A list of token IDs to be decoded.
            skip_special_tokens (bool, optional): Whether to skip special tokens during decoding. Defaults to False.
            clean_up_tokenization_spaces (bool, optional): Whether to remove leading and trailing whitespaces
                around tokens. Defaults to False.
            spaces_between_special_tokens (bool, optional): Whether to add spaces between special tokens.
                Defaults to False.
            **kwargs: Additional keyword arguments to be passed to the superclass method.

        Returns:
            str: The decoded string representation of the given token IDs.

        Raises:
            None.

        Note:
            - Special tokens are typically used to mark the beginning and end of a sequence, or to represent special
            tokens such as padding or unknown tokens.
            - If skip_special_tokens is set to True, the special tokens will be excluded from the decoded string.
            - If clean_up_tokenization_spaces is set to True, any leading or trailing whitespaces around tokens
            will be removed.
            - If spaces_between_special_tokens is set to True, spaces will be added between special tokens
            in the decoded string.
        """
        # `spaces_between_special_tokens` defaults to True for _decode in slow tokenizers
        # and cannot be configured elsewhere, but it should default to False for Qwen2Tokenizer
        return super().decode(
            token_ids,
            skip_special_tokens=skip_special_tokens,
            clean_up_tokenization_spaces=clean_up_tokenization_spaces,
            spaces_between_special_tokens=spaces_between_special_tokens,
            **kwargs,
        )

    # Copied from transformers.models.gpt2.tokenization_gpt2.GPT2Tokenizer.save_vocabulary
    def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
        """
        Save vocabulary to a specified directory with an optional filename prefix.

        Args:
            self: An instance of the Qwen2Tokenizer class.
            save_directory (str): The directory where the vocabulary files will be saved.
            filename_prefix (Optional[str]): An optional prefix to be added to the saved vocabulary filenames.

        Returns:
            Tuple[str]: A tuple containing the file paths of the saved vocabulary and merge files.

        Raises:
            FileNotFoundError: If the specified save_directory does not exist.
            IOError: If there are any issues with writing the vocabulary or merge files.
            ValueError: If the save_directory is not a valid directory path.
            Exception: Any other unexpected errors that may occur during the process.
        """
        if not os.path.isdir(save_directory):
            logger.error(f"Vocabulary path ({save_directory}) should be a directory")
            return
        vocab_file = os.path.join(
            save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"]
        )
        merge_file = os.path.join(
            save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["merges_file"]
        )

        with open(vocab_file, "w", encoding="utf-8") as f:
            f.write(json.dumps(self.encoder, indent=2, sort_keys=True, ensure_ascii=False) + "\n")

        index = 0
        with open(merge_file, "w", encoding="utf-8") as writer:
            writer.write("#version: 0.2\n")
            for bpe_tokens, token_index in sorted(self.bpe_ranks.items(), key=lambda kv: kv[1]):
                if index != token_index:
                    logger.warning(
                        f"Saving vocabulary to {merge_file}: BPE merge indices are not consecutive."
                        " Please check that the tokenizer is not corrupted!"
                    )
                    index = token_index
                writer.write(" ".join(bpe_tokens) + "\n")
                index += 1

        return vocab_file, merge_file

    def prepare_for_tokenization(self, text, **kwargs):
        """
        Prepares the given text for tokenization.

        Args:
            self (Qwen2Tokenizer): An instance of the Qwen2Tokenizer class.
            text (str): The text to be prepared for tokenization.

        Returns:
            None: The method modifies the text in-place.

        Raises:
            None.

        This method takes in an instance of the Qwen2Tokenizer class and a string of text.
        It prepares the text for tokenization by normalizing it using the 'NFC' (Normalization Form C) Unicode
        normalization.
        The normalization ensures that the text is in a standardized form, reducing any potential ambiguities or
        variations in the text. The method then returns the modified text along with any additional keyword
        arguments passed to the method.

        Note that this method modifies the text in-place, meaning that the original text variable will be
        updated with the normalized version. No values are returned explicitly by this method.
        """
        text = unicodedata.normalize("NFC", text)
        return (text, kwargs)

mindnlp.transformers.models.qwen2.tokenization_qwen2.Qwen2Tokenizer.vocab_size: int property

Get the size of the vocabulary.

This method returns the number of unique tokens in the tokenizer's encoder.

PARAMETER DESCRIPTION
self

An instance of the Qwen2Tokenizer class.

TYPE: Qwen2Tokenizer

RETURNS DESCRIPTION
int

The size of the vocabulary.

TYPE: int

mindnlp.transformers.models.qwen2.tokenization_qwen2.Qwen2Tokenizer.__init__(vocab_file, merges_file, errors='replace', unk_token='<|endoftext|>', bos_token=None, eos_token='<|endoftext|>', pad_token='<|endoftext|>', clean_up_tokenization_spaces=False, split_special_tokens=False, **kwargs)

Initializes an instance of the Qwen2Tokenizer class.

PARAMETER DESCRIPTION
self

The instance of the class.

vocab_file

The path to the vocabulary file.

TYPE: str

merges_file

The path to the merges file.

TYPE: str

errors

Specifies how to handle errors during tokenization. Defaults to 'replace'.

TYPE: str DEFAULT: 'replace'

unk_token

The unknown token. Defaults to 'endoftext'.

TYPE: str DEFAULT: '<|endoftext|>'

bos_token

The beginning-of-sequence token. Defaults to None.

TYPE: str or None DEFAULT: None

eos_token

The end-of-sequence token. Defaults to 'endoftext'.

TYPE: str DEFAULT: '<|endoftext|>'

pad_token

The padding token. Defaults to 'endoftext'.

TYPE: str DEFAULT: '<|endoftext|>'

clean_up_tokenization_spaces

Specifies whether to clean up tokenization spaces. Defaults to False.

TYPE: bool DEFAULT: False

split_special_tokens

Specifies whether to split special tokens. Defaults to False.

TYPE: bool DEFAULT: False

**kwargs

Additional keyword arguments.

DEFAULT: {}

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
FileNotFoundError

If the vocab_file or merges_file does not exist.

UnicodeDecodeError

If there is an error decoding the vocab_file or merges_file.

ValueError

If the vocab_file or merges_file is empty.

Source code in mindnlp\transformers\models\qwen2\tokenization_qwen2.py
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
def __init__(
    self,
    vocab_file,
    merges_file,
    errors="replace",
    unk_token="<|endoftext|>",
    bos_token=None,
    eos_token="<|endoftext|>",
    pad_token="<|endoftext|>",
    clean_up_tokenization_spaces=False,
    split_special_tokens=False,
    **kwargs,
):
    """
    Initializes an instance of the Qwen2Tokenizer class.

    Args:
        self: The instance of the class.
        vocab_file (str): The path to the vocabulary file.
        merges_file (str): The path to the merges file.
        errors (str, optional): Specifies how to handle errors during tokenization. Defaults to 'replace'.
        unk_token (str, optional): The unknown token. Defaults to 'endoftext'.
        bos_token (str or None, optional): The beginning-of-sequence token. Defaults to None.
        eos_token (str, optional): The end-of-sequence token. Defaults to 'endoftext'.
        pad_token (str, optional): The padding token. Defaults to 'endoftext'.
        clean_up_tokenization_spaces (bool, optional): Specifies whether to clean up tokenization spaces.
            Defaults to False.
        split_special_tokens (bool, optional): Specifies whether to split special tokens. Defaults to False.
        **kwargs: Additional keyword arguments.

    Returns:
        None.

    Raises:
        FileNotFoundError: If the vocab_file or merges_file does not exist.
        UnicodeDecodeError: If there is an error decoding the vocab_file or merges_file.
        ValueError: If the vocab_file or merges_file is empty.
    """
    # Qwen vocab does not contain control tokens; added tokens need to be special
    bos_token = (
        AddedToken(bos_token, lstrip=False, rstrip=False, special=True, normalized=False)
        if isinstance(bos_token, str)
        else bos_token
    )
    eos_token = (
        AddedToken(eos_token, lstrip=False, rstrip=False, special=True, normalized=False)
        if isinstance(eos_token, str)
        else eos_token
    )
    unk_token = (
        AddedToken(unk_token, lstrip=False, rstrip=False, special=True, normalized=False)
        if isinstance(unk_token, str)
        else unk_token
    )
    pad_token = (
        AddedToken(pad_token, lstrip=False, rstrip=False, special=True, normalized=False)
        if isinstance(pad_token, str)
        else pad_token
    )

    with open(vocab_file, encoding="utf-8") as vocab_handle:
        self.encoder = json.load(vocab_handle)
    self.decoder = {v: k for k, v in self.encoder.items()}
    self.errors = errors  # how to handle errors in decoding
    self.byte_encoder = bytes_to_unicode()
    self.byte_decoder = {v: k for k, v in self.byte_encoder.items()}
    bpe_merges = []
    with open(merges_file, encoding="utf-8") as merges_handle:
        for line in merges_handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            bpe_merges.append(tuple(line.split()))
    self.bpe_ranks = dict(zip(bpe_merges, range(len(bpe_merges))))
    # NOTE: the cache can grow without bound and will get really large for long running processes
    # (esp. for texts of language that do not use space between word, e.g. Chinese); technically
    # not a memory leak but appears as one.
    # GPT2Tokenizer has the same problem, so let's be consistent.
    self.cache = {}

    self.pat = re.compile(PRETOKENIZE_REGEX)

    if kwargs.get("add_prefix_space", False):
        logger.warning_once(
            f"{self.__class__.__name} does not support `add_prefix_space`, setting it to True has no effect."
        )

    super().__init__(
        errors=errors,
        bos_token=bos_token,
        eos_token=eos_token,
        pad_token=pad_token,
        unk_token=unk_token,
        clean_up_tokenization_spaces=clean_up_tokenization_spaces,
        split_special_tokens=split_special_tokens,
        **kwargs,
    )

mindnlp.transformers.models.qwen2.tokenization_qwen2.Qwen2Tokenizer.bpe(token)

Perform Byte Pair Encoding (BPE) on a given token.

PARAMETER DESCRIPTION
self

An instance of the Qwen2Tokenizer class.

TYPE: Qwen2Tokenizer

token

The input token to be encoded using BPE.

TYPE: str

RETURNS DESCRIPTION
str

The BPE-encoded version of the input token.

Note

This method applies Byte Pair Encoding (BPE) algorithm to a given token. BPE is a subword tokenization technique commonly used in natural language processing tasks. It splits a token into subword units based on the most frequently occurring pairs of characters.

The BPE algorithm starts by converting the token into a tuple of individual characters. It then identifies the most frequent character pairs using the get_pairs function. If no pairs are found, the original token is returned as it cannot be further split.

The algorithm iteratively replaces the most frequent character pair with a new subword unit. This process is repeated until no more frequent character pairs are found or the token is reduced to a single character.

Finally, the BPE-encoded token is returned as a string with subword units separated by spaces.

To improve performance, the method utilizes a cache to store previously processed tokens. If a token is found in the cache, its encoded version is returned directly without recomputing.

Example
>>> tokenizer = Qwen2Tokenizer()
>>> encoded_token = tokenizer.bpe('hello')
>>> print(encoded_token)
>>> # Output: 'he ll o'
...
>>> encoded_token = tokenizer.bpe('world')
>>> print(encoded_token)
>>> # Output: 'wo r ld'
Source code in mindnlp\transformers\models\qwen2\tokenization_qwen2.py
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
def bpe(self, token):
    """
    Perform Byte Pair Encoding (BPE) on a given token.

    Args:
        self (Qwen2Tokenizer): An instance of the Qwen2Tokenizer class.
        token (str): The input token to be encoded using BPE.

    Returns:
        str: The BPE-encoded version of the input token.

    Raises:
        None.

    Note:
        This method applies Byte Pair Encoding (BPE) algorithm to a given token. BPE is a subword tokenization technique
        commonly used in natural language processing tasks. It splits a token into subword units based on the most
        frequently occurring pairs of characters.

        The BPE algorithm starts by converting the token into a tuple of individual characters. It then identifies the
        most frequent character pairs using the `get_pairs` function. If no pairs are found, the original token is
        returned as it cannot be further split.

        The algorithm iteratively replaces the most frequent character pair with a new subword unit. This process is
        repeated until no more frequent character pairs are found or the token is reduced to a single character.

        Finally, the BPE-encoded token is returned as a string with subword units separated by spaces.

        To improve performance, the method utilizes a cache to store previously processed tokens. If a token is found in
        the cache, its encoded version is returned directly without recomputing.

    Example:
        ```python
        >>> tokenizer = Qwen2Tokenizer()
        >>> encoded_token = tokenizer.bpe('hello')
        >>> print(encoded_token)
        >>> # Output: 'he ll o'
        ...
        >>> encoded_token = tokenizer.bpe('world')
        >>> print(encoded_token)
        >>> # Output: 'wo r ld'
        ```
    """
    if token in self.cache:
        return self.cache[token]
    word = tuple(token)
    pairs = get_pairs(word)

    if not pairs:
        return token

    while True:
        bigram = min(pairs, key=lambda pair: self.bpe_ranks.get(pair, float("inf")))
        if bigram not in self.bpe_ranks:
            break
        first, second = bigram
        new_word = []
        i = 0
        while i < len(word):
            try:
                j = word.index(first, i)
            except ValueError:
                new_word.extend(word[i:])
                break
            else:
                new_word.extend(word[i:j])
                i = j

            if word[i] == first and i < len(word) - 1 and word[i + 1] == second:
                new_word.append(first + second)
                i += 2
            else:
                new_word.append(word[i])
                i += 1
        new_word = tuple(new_word)
        word = new_word
        if len(word) == 1:
            break
        pairs = get_pairs(word)
    word = " ".join(word)
    self.cache[token] = word
    return word

mindnlp.transformers.models.qwen2.tokenization_qwen2.Qwen2Tokenizer.convert_tokens_to_string(tokens)

Converts a sequence of tokens (string) in a single string.

Source code in mindnlp\transformers\models\qwen2\tokenization_qwen2.py
386
387
388
389
390
def convert_tokens_to_string(self, tokens):
    """Converts a sequence of tokens (string) in a single string."""
    text = "".join(tokens)
    text = bytearray([self.byte_decoder[c] for c in text]).decode("utf-8", errors=self.errors)
    return text

mindnlp.transformers.models.qwen2.tokenization_qwen2.Qwen2Tokenizer.decode(token_ids, skip_special_tokens=False, clean_up_tokenization_spaces=False, spaces_between_special_tokens=False, **kwargs)

Decodes a list of token IDs into a string representation.

PARAMETER DESCRIPTION
self

An instance of the Qwen2Tokenizer class.

token_ids

A list of token IDs to be decoded.

TYPE: List[int]

skip_special_tokens

Whether to skip special tokens during decoding. Defaults to False.

TYPE: bool DEFAULT: False

clean_up_tokenization_spaces

Whether to remove leading and trailing whitespaces around tokens. Defaults to False.

TYPE: bool DEFAULT: False

spaces_between_special_tokens

Whether to add spaces between special tokens. Defaults to False.

TYPE: bool DEFAULT: False

**kwargs

Additional keyword arguments to be passed to the superclass method.

DEFAULT: {}

RETURNS DESCRIPTION
str

The decoded string representation of the given token IDs.

TYPE: str

Note
  • Special tokens are typically used to mark the beginning and end of a sequence, or to represent special tokens such as padding or unknown tokens.
  • If skip_special_tokens is set to True, the special tokens will be excluded from the decoded string.
  • If clean_up_tokenization_spaces is set to True, any leading or trailing whitespaces around tokens will be removed.
  • If spaces_between_special_tokens is set to True, spaces will be added between special tokens in the decoded string.
Source code in mindnlp\transformers\models\qwen2\tokenization_qwen2.py
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
def decode(
    self,
    token_ids,
    skip_special_tokens: bool = False,
    clean_up_tokenization_spaces: Optional[bool] = False,
    spaces_between_special_tokens: bool = False,
    **kwargs,
) -> str:
    """
    Decodes a list of token IDs into a string representation.

    Args:
        self: An instance of the Qwen2Tokenizer class.
        token_ids (List[int]): A list of token IDs to be decoded.
        skip_special_tokens (bool, optional): Whether to skip special tokens during decoding. Defaults to False.
        clean_up_tokenization_spaces (bool, optional): Whether to remove leading and trailing whitespaces
            around tokens. Defaults to False.
        spaces_between_special_tokens (bool, optional): Whether to add spaces between special tokens.
            Defaults to False.
        **kwargs: Additional keyword arguments to be passed to the superclass method.

    Returns:
        str: The decoded string representation of the given token IDs.

    Raises:
        None.

    Note:
        - Special tokens are typically used to mark the beginning and end of a sequence, or to represent special
        tokens such as padding or unknown tokens.
        - If skip_special_tokens is set to True, the special tokens will be excluded from the decoded string.
        - If clean_up_tokenization_spaces is set to True, any leading or trailing whitespaces around tokens
        will be removed.
        - If spaces_between_special_tokens is set to True, spaces will be added between special tokens
        in the decoded string.
    """
    # `spaces_between_special_tokens` defaults to True for _decode in slow tokenizers
    # and cannot be configured elsewhere, but it should default to False for Qwen2Tokenizer
    return super().decode(
        token_ids,
        skip_special_tokens=skip_special_tokens,
        clean_up_tokenization_spaces=clean_up_tokenization_spaces,
        spaces_between_special_tokens=spaces_between_special_tokens,
        **kwargs,
    )

mindnlp.transformers.models.qwen2.tokenization_qwen2.Qwen2Tokenizer.get_vocab()

Returns the vocabulary of the tokenizer.

PARAMETER DESCRIPTION
self

The instance of the Qwen2Tokenizer class.

TYPE: Qwen2Tokenizer

RETURNS DESCRIPTION
dict

A dictionary representing the vocabulary of the tokenizer. The keys are the tokens, and the values are their corresponding indices in the vocabulary.

Note

The vocabulary is obtained by merging the encoder and added_tokens_encoder dictionaries of the tokenizer instance.

Source code in mindnlp\transformers\models\qwen2\tokenization_qwen2.py
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
def get_vocab(self):
    """
    Returns the vocabulary of the tokenizer.

    Args:
        self (Qwen2Tokenizer): The instance of the Qwen2Tokenizer class.

    Returns:
        dict: A dictionary representing the vocabulary of the tokenizer.
            The keys are the tokens, and the values are their corresponding indices in the vocabulary.

    Raises:
        None.

    Note:
        The vocabulary is obtained by merging the `encoder` and `added_tokens_encoder` dictionaries of the
        tokenizer instance.
    """
    return dict(self.encoder, **self.added_tokens_encoder)

mindnlp.transformers.models.qwen2.tokenization_qwen2.Qwen2Tokenizer.prepare_for_tokenization(text, **kwargs)

Prepares the given text for tokenization.

PARAMETER DESCRIPTION
self

An instance of the Qwen2Tokenizer class.

TYPE: Qwen2Tokenizer

text

The text to be prepared for tokenization.

TYPE: str

RETURNS DESCRIPTION
None

The method modifies the text in-place.

This method takes in an instance of the Qwen2Tokenizer class and a string of text. It prepares the text for tokenization by normalizing it using the 'NFC' (Normalization Form C) Unicode normalization. The normalization ensures that the text is in a standardized form, reducing any potential ambiguities or variations in the text. The method then returns the modified text along with any additional keyword arguments passed to the method.

Note that this method modifies the text in-place, meaning that the original text variable will be updated with the normalized version. No values are returned explicitly by this method.

Source code in mindnlp\transformers\models\qwen2\tokenization_qwen2.py
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
def prepare_for_tokenization(self, text, **kwargs):
    """
    Prepares the given text for tokenization.

    Args:
        self (Qwen2Tokenizer): An instance of the Qwen2Tokenizer class.
        text (str): The text to be prepared for tokenization.

    Returns:
        None: The method modifies the text in-place.

    Raises:
        None.

    This method takes in an instance of the Qwen2Tokenizer class and a string of text.
    It prepares the text for tokenization by normalizing it using the 'NFC' (Normalization Form C) Unicode
    normalization.
    The normalization ensures that the text is in a standardized form, reducing any potential ambiguities or
    variations in the text. The method then returns the modified text along with any additional keyword
    arguments passed to the method.

    Note that this method modifies the text in-place, meaning that the original text variable will be
    updated with the normalized version. No values are returned explicitly by this method.
    """
    text = unicodedata.normalize("NFC", text)
    return (text, kwargs)

mindnlp.transformers.models.qwen2.tokenization_qwen2.Qwen2Tokenizer.save_vocabulary(save_directory, filename_prefix=None)

Save vocabulary to a specified directory with an optional filename prefix.

PARAMETER DESCRIPTION
self

An instance of the Qwen2Tokenizer class.

save_directory

The directory where the vocabulary files will be saved.

TYPE: str

filename_prefix

An optional prefix to be added to the saved vocabulary filenames.

TYPE: Optional[str] DEFAULT: None

RETURNS DESCRIPTION
Tuple[str]

Tuple[str]: A tuple containing the file paths of the saved vocabulary and merge files.

RAISES DESCRIPTION
FileNotFoundError

If the specified save_directory does not exist.

IOError

If there are any issues with writing the vocabulary or merge files.

ValueError

If the save_directory is not a valid directory path.

Exception

Any other unexpected errors that may occur during the process.

Source code in mindnlp\transformers\models\qwen2\tokenization_qwen2.py
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
    """
    Save vocabulary to a specified directory with an optional filename prefix.

    Args:
        self: An instance of the Qwen2Tokenizer class.
        save_directory (str): The directory where the vocabulary files will be saved.
        filename_prefix (Optional[str]): An optional prefix to be added to the saved vocabulary filenames.

    Returns:
        Tuple[str]: A tuple containing the file paths of the saved vocabulary and merge files.

    Raises:
        FileNotFoundError: If the specified save_directory does not exist.
        IOError: If there are any issues with writing the vocabulary or merge files.
        ValueError: If the save_directory is not a valid directory path.
        Exception: Any other unexpected errors that may occur during the process.
    """
    if not os.path.isdir(save_directory):
        logger.error(f"Vocabulary path ({save_directory}) should be a directory")
        return
    vocab_file = os.path.join(
        save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"]
    )
    merge_file = os.path.join(
        save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["merges_file"]
    )

    with open(vocab_file, "w", encoding="utf-8") as f:
        f.write(json.dumps(self.encoder, indent=2, sort_keys=True, ensure_ascii=False) + "\n")

    index = 0
    with open(merge_file, "w", encoding="utf-8") as writer:
        writer.write("#version: 0.2\n")
        for bpe_tokens, token_index in sorted(self.bpe_ranks.items(), key=lambda kv: kv[1]):
            if index != token_index:
                logger.warning(
                    f"Saving vocabulary to {merge_file}: BPE merge indices are not consecutive."
                    " Please check that the tokenizer is not corrupted!"
                )
                index = token_index
            writer.write(" ".join(bpe_tokens) + "\n")
            index += 1

    return vocab_file, merge_file

mindnlp.transformers.models.qwen2.tokenization_qwen2.bytes_to_unicode() cached

Returns list of utf-8 byte and a mapping to unicode strings. We specifically avoids mapping to whitespace/control characters the bpe code barfs on.

The reversible bpe codes work on unicode strings. This means you need a large # of unicode characters in your vocab if you want to avoid UNKs. When you're at something like a 10B token dataset you end up needing around 5K for decent coverage. This is a significant percentage of your normal, say, 32K bpe vocab. To avoid that, we want lookup tables between utf-8 bytes and unicode strings.

Source code in mindnlp\transformers\models\qwen2\tokenization_qwen2.py
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
@lru_cache()
# Copied from transformers.models.gpt2.tokenization_gpt2.bytes_to_unicode
def bytes_to_unicode():
    """
    Returns list of utf-8 byte and a mapping to unicode strings. We specifically avoids mapping to whitespace/control
    characters the bpe code barfs on.

    The reversible bpe codes work on unicode strings. This means you need a large # of unicode characters in your vocab
    if you want to avoid UNKs. When you're at something like a 10B token dataset you end up needing around 5K for
    decent coverage. This is a significant percentage of your normal, say, 32K bpe vocab. To avoid that, we want lookup
    tables between utf-8 bytes and unicode strings.
    """
    bs = (
        list(range(ord("!"), ord("~") + 1)) + list(range(ord("¡"), ord("¬") + 1)) + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(2**8):
        if b not in bs:
            bs.append(b)
            cs.append(2**8 + n)
            n += 1
    cs = [chr(n) for n in cs]
    return dict(zip(bs, cs))

mindnlp.transformers.models.qwen2.tokenization_qwen2.get_pairs(word)

Return set of symbol pairs in a word.

Word is represented as tuple of symbols (symbols being variable-length strings).

Source code in mindnlp\transformers\models\qwen2\tokenization_qwen2.py
74
75
76
77
78
79
80
81
82
83
84
85
def get_pairs(word):
    """
    Return set of symbol pairs in a word.

    Word is represented as tuple of symbols (symbols being variable-length strings).
    """
    pairs = set()
    prev_char = word[0]
    for char in word[1:]:
        pairs.add((prev_char, char))
        prev_char = char
    return pairs

mindnlp.transformers.models.qwen2.tokenization_qwen2_fast

Tokenization classes for Qwen2.

mindnlp.transformers.models.qwen2.tokenization_qwen2_fast.Qwen2TokenizerFast

Bases: PreTrainedTokenizerFast

Construct a "fast" Qwen2 tokenizer (backed by HuggingFace's tokenizers library). Based on byte-level Byte-Pair-Encoding.

Same with GPT2Tokenizer, this tokenizer has been trained to treat spaces like parts of the tokens so a word will be encoded differently whether it is at the beginning of the sentence (without space) or not:

Example
>>> from transformers import Qwen2TokenizerFast
...
>>> tokenizer = Qwen2TokenizerFast.from_pretrained("Qwen/Qwen-tokenizer")
>>> tokenizer("Hello world")["input_ids"]
[9707, 1879]
>>> tokenizer(" Hello world")["input_ids"]
[21927, 1879]

This is expected.

This tokenizer inherits from [PreTrainedTokenizerFast] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.

PARAMETER DESCRIPTION
vocab_file

Path to the vocabulary file.

TYPE: `str`, *optional* DEFAULT: None

merges_file

Path to the merges file.

TYPE: `str`, *optional* DEFAULT: None

tokenizer_file

Path to tokenizers file (generally has a .json extension) that contains everything needed to load the tokenizer.

TYPE: `str`, *optional* DEFAULT: None

unk_token

The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead. Not applicable to this tokenizer.

TYPE: `str`, *optional*, defaults to `"<|endoftext|>"` DEFAULT: '<|endoftext|>'

bos_token

The beginning of sequence token. Not applicable for this tokenizer.

TYPE: `str`, *optional* DEFAULT: None

eos_token

The end of sequence token.

TYPE: `str`, *optional*, defaults to `"<|endoftext|>"` DEFAULT: '<|endoftext|>'

pad_token

The token used for padding, for example when batching sequences of different lengths.

TYPE: `str`, *optional*, defaults to `"<|endoftext|>"` DEFAULT: '<|endoftext|>'

Source code in mindnlp\transformers\models\qwen2\tokenization_qwen2_fast.py
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
class Qwen2TokenizerFast(PreTrainedTokenizerFast):
    """
    Construct a "fast" Qwen2 tokenizer (backed by HuggingFace's *tokenizers* library). Based on byte-level
    Byte-Pair-Encoding.

    Same with GPT2Tokenizer, this tokenizer has been trained to treat spaces like parts of the tokens so a word will
    be encoded differently whether it is at the beginning of the sentence (without space) or not:

    Example:
        ```python
        >>> from transformers import Qwen2TokenizerFast
        ...
        >>> tokenizer = Qwen2TokenizerFast.from_pretrained("Qwen/Qwen-tokenizer")
        >>> tokenizer("Hello world")["input_ids"]
        [9707, 1879]
        >>> tokenizer(" Hello world")["input_ids"]
        [21927, 1879]
        ```
    This is expected.

    This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main methods. Users should
    refer to this superclass for more information regarding those methods.

    Args:
        vocab_file (`str`, *optional*):
            Path to the vocabulary file.
        merges_file (`str`, *optional*):
            Path to the merges file.
        tokenizer_file (`str`, *optional*):
            Path to [tokenizers](https://github.com/huggingface/tokenizers) file (generally has a .json extension) that
            contains everything needed to load the tokenizer.
        unk_token (`str`, *optional*, defaults to `"<|endoftext|>"`):
            The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
            token instead. Not applicable to this tokenizer.
        bos_token (`str`, *optional*):
            The beginning of sequence token. Not applicable for this tokenizer.
        eos_token (`str`, *optional*, defaults to `"<|endoftext|>"`):
            The end of sequence token.
        pad_token (`str`, *optional*, defaults to `"<|endoftext|>"`):
            The token used for padding, for example when batching sequences of different lengths.
    """
    vocab_files_names = VOCAB_FILES_NAMES
    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
    max_model_input_sizes = MAX_MODEL_INPUT_SIZES
    model_input_names = ["input_ids", "attention_mask"]
    slow_tokenizer_class = Qwen2Tokenizer

    def __init__(
        self,
        vocab_file=None,
        merges_file=None,
        tokenizer_file=None,
        unk_token="<|endoftext|>",
        bos_token=None,
        eos_token="<|endoftext|>",
        pad_token="<|endoftext|>",
        **kwargs,
    ):
        """
        Initializes a new instance of the Qwen2TokenizerFast class.

        Args:
            self: The instance of the class.
            vocab_file (str, optional): The path to the vocabulary file. Default is None.
            merges_file (str, optional): The path to the merges file. Default is None.
            tokenizer_file (str, optional): The path to the tokenizer file. Default is None.
            unk_token (str, optional): The unknown token. Default is 'endoftext'.
            bos_token (str or AddedToken, optional): The beginning of sequence token. Default is None.
            eos_token (str or AddedToken, optional): The end of sequence token. Default is 'endoftext'.
            pad_token (str or AddedToken, optional): The padding token. Default is 'endoftext'.

        Returns:
            None.

        Raises:
            None.

        Note:
            - The bos_token, eos_token, unk_token, and pad_token parameters can be either a string or an instance of
            the AddedToken class.
            - If any of the bos_token, eos_token, unk_token, or pad_token parameters are provided as strings,
            they will be converted to AddedToken instances with default properties.
            - The vocab_file, merges_file, and tokenizer_file parameters are used to load the respective files
            for the tokenizer.
            - The unk_token, bos_token, eos_token, and pad_token parameters are used to set the respective tokens
            in the tokenizer.
            - Additional keyword arguments can be provided and will be passed to the base class forwardor.
        """
        # We need to at least pass vocab_file and merges_file to base class
        # in case a slow tokenizer needs to be initialized; other can be
        # configured through files.
        # following GPT2TokenizerFast, also adding unk_token, bos_token, and eos_token

        bos_token = (
            AddedToken(bos_token, lstrip=False, rstrip=False, special=True, normalized=False)
            if isinstance(bos_token, str)
            else bos_token
        )
        eos_token = (
            AddedToken(eos_token, lstrip=False, rstrip=False, special=True, normalized=False)
            if isinstance(eos_token, str)
            else eos_token
        )
        unk_token = (
            AddedToken(unk_token, lstrip=False, rstrip=False, special=True, normalized=False)
            if isinstance(unk_token, str)
            else unk_token
        )
        pad_token = (
            AddedToken(pad_token, lstrip=False, rstrip=False, special=True, normalized=False)
            if isinstance(pad_token, str)
            else pad_token
        )

        super().__init__(
            vocab_file,
            merges_file,
            tokenizer_file=tokenizer_file,
            unk_token=unk_token,
            bos_token=bos_token,
            eos_token=eos_token,
            pad_token=pad_token,
            **kwargs,
        )

    # Copied from transformers.models.gpt2.tokenization_gpt2_fast.GPT2TokenizerFast.save_vocabulary
    def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
        """
        Save the vocabulary of the Qwen2TokenizerFast model to the specified directory.

        Args:
            self: The instance of the Qwen2TokenizerFast class.
            save_directory (str): The directory where the vocabulary files will be saved.
            filename_prefix (Optional[str]): An optional prefix to be added to the vocabulary filenames. Default is None.

        Returns:
            Tuple[str]: A tuple containing the filenames of the saved vocabulary files.

        Raises:
            This method does not explicitly raise any exceptions.
        """
        files = self._tokenizer.model.save(save_directory, name=filename_prefix)
        return tuple(files)

mindnlp.transformers.models.qwen2.tokenization_qwen2_fast.Qwen2TokenizerFast.__init__(vocab_file=None, merges_file=None, tokenizer_file=None, unk_token='<|endoftext|>', bos_token=None, eos_token='<|endoftext|>', pad_token='<|endoftext|>', **kwargs)

Initializes a new instance of the Qwen2TokenizerFast class.

PARAMETER DESCRIPTION
self

The instance of the class.

vocab_file

The path to the vocabulary file. Default is None.

TYPE: str DEFAULT: None

merges_file

The path to the merges file. Default is None.

TYPE: str DEFAULT: None

tokenizer_file

The path to the tokenizer file. Default is None.

TYPE: str DEFAULT: None

unk_token

The unknown token. Default is 'endoftext'.

TYPE: str DEFAULT: '<|endoftext|>'

bos_token

The beginning of sequence token. Default is None.

TYPE: str or AddedToken DEFAULT: None

eos_token

The end of sequence token. Default is 'endoftext'.

TYPE: str or AddedToken DEFAULT: '<|endoftext|>'

pad_token

The padding token. Default is 'endoftext'.

TYPE: str or AddedToken DEFAULT: '<|endoftext|>'

RETURNS DESCRIPTION

None.

Note
  • The bos_token, eos_token, unk_token, and pad_token parameters can be either a string or an instance of the AddedToken class.
  • If any of the bos_token, eos_token, unk_token, or pad_token parameters are provided as strings, they will be converted to AddedToken instances with default properties.
  • The vocab_file, merges_file, and tokenizer_file parameters are used to load the respective files for the tokenizer.
  • The unk_token, bos_token, eos_token, and pad_token parameters are used to set the respective tokens in the tokenizer.
  • Additional keyword arguments can be provided and will be passed to the base class forwardor.
Source code in mindnlp\transformers\models\qwen2\tokenization_qwen2_fast.py
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
def __init__(
    self,
    vocab_file=None,
    merges_file=None,
    tokenizer_file=None,
    unk_token="<|endoftext|>",
    bos_token=None,
    eos_token="<|endoftext|>",
    pad_token="<|endoftext|>",
    **kwargs,
):
    """
    Initializes a new instance of the Qwen2TokenizerFast class.

    Args:
        self: The instance of the class.
        vocab_file (str, optional): The path to the vocabulary file. Default is None.
        merges_file (str, optional): The path to the merges file. Default is None.
        tokenizer_file (str, optional): The path to the tokenizer file. Default is None.
        unk_token (str, optional): The unknown token. Default is 'endoftext'.
        bos_token (str or AddedToken, optional): The beginning of sequence token. Default is None.
        eos_token (str or AddedToken, optional): The end of sequence token. Default is 'endoftext'.
        pad_token (str or AddedToken, optional): The padding token. Default is 'endoftext'.

    Returns:
        None.

    Raises:
        None.

    Note:
        - The bos_token, eos_token, unk_token, and pad_token parameters can be either a string or an instance of
        the AddedToken class.
        - If any of the bos_token, eos_token, unk_token, or pad_token parameters are provided as strings,
        they will be converted to AddedToken instances with default properties.
        - The vocab_file, merges_file, and tokenizer_file parameters are used to load the respective files
        for the tokenizer.
        - The unk_token, bos_token, eos_token, and pad_token parameters are used to set the respective tokens
        in the tokenizer.
        - Additional keyword arguments can be provided and will be passed to the base class forwardor.
    """
    # We need to at least pass vocab_file and merges_file to base class
    # in case a slow tokenizer needs to be initialized; other can be
    # configured through files.
    # following GPT2TokenizerFast, also adding unk_token, bos_token, and eos_token

    bos_token = (
        AddedToken(bos_token, lstrip=False, rstrip=False, special=True, normalized=False)
        if isinstance(bos_token, str)
        else bos_token
    )
    eos_token = (
        AddedToken(eos_token, lstrip=False, rstrip=False, special=True, normalized=False)
        if isinstance(eos_token, str)
        else eos_token
    )
    unk_token = (
        AddedToken(unk_token, lstrip=False, rstrip=False, special=True, normalized=False)
        if isinstance(unk_token, str)
        else unk_token
    )
    pad_token = (
        AddedToken(pad_token, lstrip=False, rstrip=False, special=True, normalized=False)
        if isinstance(pad_token, str)
        else pad_token
    )

    super().__init__(
        vocab_file,
        merges_file,
        tokenizer_file=tokenizer_file,
        unk_token=unk_token,
        bos_token=bos_token,
        eos_token=eos_token,
        pad_token=pad_token,
        **kwargs,
    )

mindnlp.transformers.models.qwen2.tokenization_qwen2_fast.Qwen2TokenizerFast.save_vocabulary(save_directory, filename_prefix=None)

Save the vocabulary of the Qwen2TokenizerFast model to the specified directory.

PARAMETER DESCRIPTION
self

The instance of the Qwen2TokenizerFast class.

save_directory

The directory where the vocabulary files will be saved.

TYPE: str

filename_prefix

An optional prefix to be added to the vocabulary filenames. Default is None.

TYPE: Optional[str] DEFAULT: None

RETURNS DESCRIPTION
Tuple[str]

Tuple[str]: A tuple containing the filenames of the saved vocabulary files.

Source code in mindnlp\transformers\models\qwen2\tokenization_qwen2_fast.py
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
    """
    Save the vocabulary of the Qwen2TokenizerFast model to the specified directory.

    Args:
        self: The instance of the Qwen2TokenizerFast class.
        save_directory (str): The directory where the vocabulary files will be saved.
        filename_prefix (Optional[str]): An optional prefix to be added to the vocabulary filenames. Default is None.

    Returns:
        Tuple[str]: A tuple containing the filenames of the saved vocabulary files.

    Raises:
        This method does not explicitly raise any exceptions.
    """
    files = self._tokenizer.model.save(save_directory, name=filename_prefix)
    return tuple(files)