
cpmant

mindnlp.transformers.models.cpmant.configuration_cpmant

CPMAnt model configuration

mindnlp.transformers.models.cpmant.configuration_cpmant.CpmAntConfig

Bases: PretrainedConfig

This is the configuration class to store the configuration of a [CpmAntModel]. It is used to instantiate a CPMAnt model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the CPMAnt openbmb/cpm-ant-10b architecture.

Configuration objects inherit from [PretrainedConfig] and can be used to control the model outputs. Read the documentation from [PretrainedConfig] for more information.

PARAMETER DESCRIPTION
vocab_size

Vocabulary size of the CPMAnt model. Defines the number of different tokens that can be represented by the input passed when calling [CpmAntModel].

TYPE: `int`, *optional*, defaults to 30720 DEFAULT: 30720

hidden_size

Dimension of the encoder layers.

TYPE: `int`, *optional*, defaults to 4096 DEFAULT: 4096

num_attention_heads

Number of attention heads in the Transformer encoder.

TYPE: `int`, *optional*, defaults to 32 DEFAULT: 32

dim_head

Dimension of attention heads for each attention layer in the Transformer encoder.

TYPE: `int`, *optional*, defaults to 128 DEFAULT: 128

dim_ff

Dimension of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.

TYPE: `int`, *optional*, defaults to 10240 DEFAULT: 10240

num_hidden_layers

Number of layers of the Transformer encoder.

TYPE: `int`, *optional*, defaults to 48 DEFAULT: 48

dropout_p

The dropout probability for all fully connected layers in the embeddings and the encoder.

TYPE: `float`, *optional*, defaults to 0.0 DEFAULT: 0.0

position_bias_num_buckets

The number of position_bias buckets.

TYPE: `int`, *optional*, defaults to 512 DEFAULT: 512

position_bias_max_distance

The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).

TYPE: `int`, *optional*, defaults to 2048 DEFAULT: 2048

eps

The epsilon used by the layer normalization layers.

TYPE: `float`, *optional*, defaults to 1e-06 DEFAULT: 1e-06

init_std

Initialize parameters with std = init_std.

TYPE: `float`, *optional*, defaults to 1.0 DEFAULT: 1.0

prompt_types

The number of prompt types.

TYPE: `int`, *optional*, defaults to 32 DEFAULT: 32

prompt_length

The length of the prompt.

TYPE: `int`, *optional*, defaults to 32 DEFAULT: 32

segment_types

The number of segment types.

TYPE: `int`, *optional*, defaults to 32 DEFAULT: 32

use_cache

Whether to use cache.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

Example
>>> from mindnlp.transformers import CpmAntModel, CpmAntConfig
...
>>> # Initializing a CPMAnt cpm-ant-10b style configuration
>>> configuration = CpmAntConfig()
...
>>> # Initializing a model from the cpm-ant-10b style configuration
>>> model = CpmAntModel(configuration)
...
>>> # Accessing the model configuration
>>> configuration = model.config
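
The documented defaults satisfy `num_attention_heads * dim_head == hidden_size` (32 × 128 = 4096). The sketch below illustrates that relationship and how keyword overrides produce a smaller configuration. It uses a hypothetical stand-in dataclass, `CpmAntConfigSketch`, rather than the real `CpmAntConfig`, so it runs without MindNLP installed. The helper `attention_width` and the "tiny" values are assumptions made for illustration only.

```python
from dataclasses import dataclass, asdict


@dataclass
class CpmAntConfigSketch:
    """Hypothetical stand-in mirroring the documented CpmAntConfig defaults."""

    vocab_size: int = 30720
    hidden_size: int = 4096
    num_attention_heads: int = 32
    dim_head: int = 128
    dim_ff: int = 10240
    num_hidden_layers: int = 48
    dropout_p: float = 0.0
    position_bias_num_buckets: int = 512
    position_bias_max_distance: int = 2048
    eps: float = 1e-6
    init_std: float = 1.0
    prompt_types: int = 32
    prompt_length: int = 32
    segment_types: int = 32
    use_cache: bool = True

    def attention_width(self) -> int:
        # Assumption: the attention projection width is heads * per-head dimension.
        return self.num_attention_heads * self.dim_head


default_cfg = CpmAntConfigSketch()

# A much smaller variant, e.g. for a quick smoke test (illustrative values).
tiny_cfg = CpmAntConfigSketch(
    hidden_size=256,
    num_attention_heads=4,
    dim_head=64,
    dim_ff=512,
    num_hidden_layers=2,
)

# List only the fields that differ from the defaults.
defaults = asdict(default_cfg)
overrides = {k: v for k, v in asdict(tiny_cfg).items() if v != defaults[k]}

print(default_cfg.attention_width(), tiny_cfg.attention_width())
print(sorted(overrides))
```

The real class accepts the same keyword names, so the overrides shown here should carry over to `CpmAntConfig(...)` unchanged.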
Source code in mindnlp\transformers\models\cpmant\configuration_cpmant.py
class CpmAntConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`CpmAntModel`]. It is used to instantiate a
    CPMAnt model according to the specified arguments, defining the model architecture. Instantiating a configuration
    with the defaults will yield a similar configuration to that of the CPMAnt
    [openbmb/cpm-ant-10b](https://hf-mirror.com/openbmb/cpm-ant-10b) architecture.

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.

    Args:
        vocab_size (`int`, *optional*, defaults to 30720):
            Vocabulary size of the CPMAnt model. Defines the number of different tokens that can be represented by the
            `input` passed when calling [`CpmAntModel`].
        hidden_size (`int`, *optional*, defaults to 4096):
            Dimension of the encoder layers.
        num_attention_heads (`int`, *optional*, defaults to 32):
            Number of attention heads in the Transformer encoder.
        dim_head (`int`, *optional*, defaults to 128):
            Dimension of attention heads for each attention layer in the Transformer encoder.
        dim_ff (`int`, *optional*, defaults to 10240):
            Dimension of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
        num_hidden_layers (`int`, *optional*, defaults to 48):
            Number of layers of the Transformer encoder.
        dropout_p (`float`, *optional*, defaults to 0.0):
            The dropout probability for all fully connected layers in the embeddings and the encoder.
        position_bias_num_buckets (`int`, *optional*, defaults to 512):
            The number of position_bias buckets.
        position_bias_max_distance (`int`, *optional*, defaults to 2048):
            The maximum sequence length that this model might ever be used with. Typically set this to something large
            just in case (e.g., 512 or 1024 or 2048).
        eps (`float`, *optional*, defaults to 1e-06):
            The epsilon used by the layer normalization layers.
        init_std (`float`, *optional*, defaults to 1.0):
            Initialize parameters with std = init_std.
        prompt_types (`int`, *optional*, defaults to 32):
            The number of prompt types.
        prompt_length (`int`, *optional*, defaults to 32):
            The length of the prompt.
        segment_types (`int`, *optional*, defaults to 32):
            The number of segment types.
        use_cache (`bool`, *optional*, defaults to `True`):
            Whether to use cache.

    Example:
        ```python
        >>> from mindnlp.transformers import CpmAntModel, CpmAntConfig
        ...
        >>> # Initializing a CPMAnt cpm-ant-10b style configuration
        >>> configuration = CpmAntConfig()
        ...
        >>> # Initializing a model from the cpm-ant-10b style configuration
        >>> model = CpmAntModel(configuration)
        ...
        >>> # Accessing the model configuration
        >>> configuration = model.config
        ```
    """
    model_type = "cpmant"

    def __init__(
        self,
        vocab_size: int = 30720,
        hidden_size: int = 4096,
        num_attention_heads: int = 32,
        dim_head: int = 128,
        dim_ff: int = 10240,
        num_hidden_layers: int = 48,
        dropout_p: float = 0.0,
        position_bias_num_buckets: int = 512,
        position_bias_max_distance: int = 2048,
        eps: float = 1e-6,
        init_std: float = 1.0,
        prompt_types: int = 32,
        prompt_length: int = 32,
        segment_types: int = 32,
        use_cache: bool = True,
        **kwargs,
    ):
        """
        Initializes an instance of the CpmAntConfig class.

        Args:
            self (CpmAntConfig): The instance of the CpmAntConfig class.
            vocab_size (int): The size of the vocabulary. Defaults to 30720.
            hidden_size (int): The size of the hidden state. Defaults to 4096.
            num_attention_heads (int): The number of attention heads. Defaults to 32.
            dim_head (int): The dimension of each attention head. Defaults to 128.
            dim_ff (int): The dimension of the feed-forward layer. Defaults to 10240.
            num_hidden_layers (int): The number of hidden layers. Defaults to 48.
            dropout_p (float): The dropout rate. Defaults to 0.0.
            position_bias_num_buckets (int): The number of buckets for position bias. Defaults to 512.
            position_bias_max_distance (int): The maximum distance for position bias. Defaults to 2048.
            eps (float): The epsilon value for numerical stability. Defaults to 1e-06.
            init_std (float): The standard deviation for weight initialization. Defaults to 1.0.
            prompt_types (int): The number of prompt types. Defaults to 32.
            prompt_length (int): The length of the prompt. Defaults to 32.
            segment_types (int): The number of segment types. Defaults to 32.
            use_cache (bool): Whether to use cache. Defaults to True.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(**kwargs)
        self.prompt_types = prompt_types
        self.prompt_length = prompt_length
        self.segment_types = segment_types
        self.hidden_size = hidden_size
        self.num_attention_heads = num_attention_heads
        self.dim_head = dim_head
        self.dim_ff = dim_ff
        self.num_hidden_layers = num_hidden_layers
        self.position_bias_num_buckets = position_bias_num_buckets
        self.position_bias_max_distance = position_bias_max_distance
        self.dropout_p = dropout_p
        self.eps = eps
        self.use_cache = use_cache
        self.vocab_size = vocab_size
        self.init_std = init_std

mindnlp.transformers.models.cpmant.configuration_cpmant.CpmAntConfig.__init__(vocab_size=30720, hidden_size=4096, num_attention_heads=32, dim_head=128, dim_ff=10240, num_hidden_layers=48, dropout_p=0.0, position_bias_num_buckets=512, position_bias_max_distance=2048, eps=1e-06, init_std=1.0, prompt_types=32, prompt_length=32, segment_types=32, use_cache=True, **kwargs)

Initializes an instance of the CpmAntConfig class.

PARAMETER DESCRIPTION
self

The instance of the CpmAntConfig class.

TYPE: CpmAntConfig

vocab_size

The size of the vocabulary. Defaults to 30720.

TYPE: int DEFAULT: 30720

hidden_size

The size of the hidden state. Defaults to 4096.

TYPE: int DEFAULT: 4096

num_attention_heads

The number of attention heads. Defaults to 32.

TYPE: int DEFAULT: 32

dim_head

The dimension of each attention head. Defaults to 128.

TYPE: int DEFAULT: 128

dim_ff

The dimension of the feed-forward layer. Defaults to 10240.

TYPE: int DEFAULT: 10240

num_hidden_layers

The number of hidden layers. Defaults to 48.

TYPE: int DEFAULT: 48

dropout_p

The dropout rate. Defaults to 0.0.

TYPE: float DEFAULT: 0.0

position_bias_num_buckets

The number of buckets for position bias. Defaults to 512.

TYPE: int DEFAULT: 512

position_bias_max_distance

The maximum distance for position bias. Defaults to 2048.

TYPE: int DEFAULT: 2048

eps

The epsilon value for numerical stability. Defaults to 1e-06.

TYPE: float DEFAULT: 1e-06

init_std

The standard deviation for weight initialization. Defaults to 1.0.

TYPE: float DEFAULT: 1.0

prompt_types

The number of prompt types. Defaults to 32.

TYPE: int DEFAULT: 32

prompt_length

The length of the prompt. Defaults to 32.

TYPE: int DEFAULT: 32

segment_types

The number of segment types. Defaults to 32.

TYPE: int DEFAULT: 32

use_cache

Whether to use cache. Defaults to True.

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION

None.

Source code in mindnlp\transformers\models\cpmant\configuration_cpmant.py
def __init__(
    self,
    vocab_size: int = 30720,
    hidden_size: int = 4096,
    num_attention_heads: int = 32,
    dim_head: int = 128,
    dim_ff: int = 10240,
    num_hidden_layers: int = 48,
    dropout_p: float = 0.0,
    position_bias_num_buckets: int = 512,
    position_bias_max_distance: int = 2048,
    eps: float = 1e-6,
    init_std: float = 1.0,
    prompt_types: int = 32,
    prompt_length: int = 32,
    segment_types: int = 32,
    use_cache: bool = True,
    **kwargs,
):
    """
    Initializes an instance of the CpmAntConfig class.

    Args:
        self (CpmAntConfig): The instance of the CpmAntConfig class.
        vocab_size (int): The size of the vocabulary. Defaults to 30720.
        hidden_size (int): The size of the hidden state. Defaults to 4096.
        num_attention_heads (int): The number of attention heads. Defaults to 32.
        dim_head (int): The dimension of each attention head. Defaults to 128.
        dim_ff (int): The dimension of the feed-forward layer. Defaults to 10240.
        num_hidden_layers (int): The number of hidden layers. Defaults to 48.
        dropout_p (float): The dropout rate. Defaults to 0.0.
        position_bias_num_buckets (int): The number of buckets for position bias. Defaults to 512.
        position_bias_max_distance (int): The maximum distance for position bias. Defaults to 2048.
        eps (float): The epsilon value for numerical stability. Defaults to 1e-06.
        init_std (float): The standard deviation for weight initialization. Defaults to 1.0.
        prompt_types (int): The number of prompt types. Defaults to 32.
        prompt_length (int): The length of the prompt. Defaults to 32.
        segment_types (int): The number of segment types. Defaults to 32.
        use_cache (bool): Whether to use cache. Defaults to True.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(**kwargs)
    self.prompt_types = prompt_types
    self.prompt_length = prompt_length
    self.segment_types = segment_types
    self.hidden_size = hidden_size
    self.num_attention_heads = num_attention_heads
    self.dim_head = dim_head
    self.dim_ff = dim_ff
    self.num_hidden_layers = num_hidden_layers
    self.position_bias_num_buckets = position_bias_num_buckets
    self.position_bias_max_distance = position_bias_max_distance
    self.dropout_p = dropout_p
    self.eps = eps
    self.use_cache = use_cache
    self.vocab_size = vocab_size
    self.init_std = init_std

mindnlp.transformers.models.cpmant.tokenization_cpmant

Tokenization classes for CPMAnt.

mindnlp.transformers.models.cpmant.tokenization_cpmant.CpmAntTokenizer

Bases: PreTrainedTokenizer

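Construct a CPMAnt tokenizer. As the `_tokenize` method in the source listing shows, text is first segmented with jieba and each segment is then split by a WordPiece-style tokenizer.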

PARAMETER DESCRIPTION
vocab_file

Path to the vocabulary file.

TYPE: `str`

bod_token

The beginning of document token.

TYPE: `str`, *optional*, defaults to `"<d>"` DEFAULT: '<d>'

eod_token

The end of document token.

TYPE: `str`, *optional*, defaults to `"</d>"` DEFAULT: '</d>'

bos_token

The beginning of sequence token.

TYPE: `str`, *optional*, defaults to `"<s>"` DEFAULT: '<s>'

eos_token

The end of sequence token.

TYPE: `str`, *optional*, defaults to `"</s>"` DEFAULT: '</s>'

pad_token

The token used for padding.

TYPE: `str`, *optional*, defaults to `"<pad>"` DEFAULT: '<pad>'

unk_token

The unknown token.

TYPE: `str`, *optional*, defaults to `"<unk>"` DEFAULT: '<unk>'

line_token

The line token.

TYPE: `str`, *optional*, defaults to `"</n>"` DEFAULT: '</n>'

space_token

The space token.

TYPE: `str`, *optional*, defaults to `"</_>"` DEFAULT: '</_>'
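
The `_tokenize` method in the source listing for this class runs in two stages. `jieba.cut` segments the text into words, then `self.wordpiece_tokenizer.tokenize` splits each word into vocabulary entries. The sketch below illustrates the second stage only. It assumes a greedy longest-match-first strategy, because the implementation of `WordpieceTokenizer` is not shown in this section. The function names, the toy vocabulary, and the pre-segmented input (standing in for jieba's output) are all hypothetical.

```python
from typing import Dict, List


def greedy_longest_match(word: str, vocab: Dict[str, int], unk_token: str = "<unk>") -> List[str]:
    """Split `word` into the longest vocabulary entries, scanning left to right."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate from the right until it is found in the vocabulary.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # No prefix matched: emit the unknown token and skip one character.
            pieces.append(unk_token)
            start += 1
        else:
            pieces.append(word[start:end])
            start = end
    return pieces


def tokenize(pre_segmented: List[str], vocab: Dict[str, int]) -> List[str]:
    """Mirror the loop in `_tokenize`: sub-tokenize each segment and concatenate."""
    output: List[str] = []
    for segment in pre_segmented:
        output.extend(greedy_longest_match(segment, vocab))
    return output


# Toy vocabulary and a pre-segmented input standing in for jieba's output.
toy_vocab = {"<unk>": 0, "token": 1, "izer": 2, "s": 3, "cpm": 4}
print(tokenize(["tokenizers", "cpm", "xyz"], toy_vocab))
```

In this sketch, characters that match no vocabulary entry each become one `<unk>` token, so the output length still reflects the input length.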

Source code in mindnlp\transformers\models\cpmant\tokenization_cpmant.py
class CpmAntTokenizer(PreTrainedTokenizer):
    """
    Construct a CPMAnt tokenizer. Based on byte-level Byte-Pair-Encoding.

    Args:
        vocab_file (`str`):
            Path to the vocabulary file.
        bod_token (`str`, *optional*, defaults to `"<d>"`):
            The beginning of document token.
        eod_token (`str`, *optional*, defaults to `"</d>"`):
            The end of document token.
        bos_token (`str`, *optional*, defaults to `"<s>"`):
            The beginning of sequence token.
        eos_token (`str`, *optional*, defaults to `"</s>"`):
            The end of sequence token.
        pad_token (`str`, *optional*, defaults to `"<pad>"`):
            The token used for padding.
        unk_token (`str`, *optional*, defaults to `"<unk>"`):
            The unknown token.
        line_token (`str`, *optional*, defaults to `"</n>"`):
            The line token.
        space_token (`str`, *optional*, defaults to `"</_>"`):
            The space token.
    """
    vocab_files_names = VOCAB_FILES_NAMES
    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
    model_input_names = ["input_ids", "attention_mask"]
    add_prefix_space = False

    def __init__(
        self,
        vocab_file,
        bod_token="<d>",
        eod_token="</d>",
        bos_token="<s>",
        eos_token="</s>",
        pad_token="<pad>",
        unk_token="<unk>",
        line_token="</n>",
        space_token="</_>",
        padding_side="left",
        **kwargs,
    ):
        """
        Initialize a CpmAntTokenizer object with the provided parameters.

        Args:
            vocab_file (str): The path to the vocabulary file to load.
            bod_token (str, optional): Beginning of document token (default is '<d>').
            eod_token (str, optional): End of document token (default is '</d>').
            bos_token (str, optional): Beginning of sentence token (default is '<s>').
            eos_token (str, optional): End of sentence token (default is '</s>').
            pad_token (str, optional): Padding token (default is '<pad>').
            unk_token (str, optional): Token for unknown words (default is '<unk>').
            line_token (str, optional): Line break token (default is '</n>').
            space_token (str, optional): Space token (default is '</_>').
            padding_side (str, optional): Side for padding (default is 'left').

        Returns:
            None.

        Raises:
            MissingBackendError: If required backend 'jieba' is not available.
            FileNotFoundError: If the specified 'vocab_file' does not exist.
            KeyError: If 'space_token' or 'line_token' are missing in the loaded vocabulary.
            Exception: Any other unforeseen error that may occur during initialization.
        """
        requires_backends(self, ["jieba"])
        self.bod_token = bod_token
        self.eod_token = eod_token
        self.encoder = load_vocab(vocab_file)
        self.encoder[" "] = self.encoder[space_token]
        self.encoder["\n"] = self.encoder[line_token]

        del self.encoder[space_token]
        del self.encoder[line_token]

        self.encoder = collections.OrderedDict(sorted(self.encoder.items(), key=lambda x: x[1]))
        self.decoder = {v: k for k, v in self.encoder.items()}

        self.wordpiece_tokenizer = WordpieceTokenizer(vocab=self.encoder, unk_token=unk_token)

        super().__init__(
            bod_token=bod_token,
            eod_token=eod_token,
            bos_token=bos_token,
            eos_token=eos_token,
            pad_token=pad_token,
            unk_token=unk_token,
            line_token=line_token,
            space_token=space_token,
            padding_side=padding_side,
            **kwargs,
        )

    @property
    def bod_token_id(self):
        """
        Return the vocabulary ID of the beginning-of-document token (`bod_token`).

        Returns:
            int: The ID that `self.encoder` assigns to `bod_token`.

        Raises:
            KeyError: If `bod_token` is not in the vocabulary.
        """
        return self.encoder[self.bod_token]

    @property
    def eod_token_id(self):
        """
        Return the vocabulary ID of the end-of-document token (`eod_token`).

        Returns:
            int: The ID that `self.encoder` assigns to `eod_token`.

        Raises:
            KeyError: If `eod_token` is not in the vocabulary.
        """
        return self.encoder[self.eod_token]

    @property
    def newline_id(self):
        r"""
        Return the vocabulary ID of the newline character `'\n'`.

        Returns:
            int: The ID that `self.encoder` assigns to `'\n'`.

        Raises:
            KeyError: If the newline character `'\n'` is not found in the encoder dictionary, a KeyError is raised.
        """
        return self.encoder["\n"]

    @property
    def vocab_size(self) -> int:
        """
        Returns the size of the vocabulary used by the CpmAntTokenizer instance.

        Args:
            self: The CpmAntTokenizer instance itself.

        Returns:
            int: The number of unique tokens in the vocabulary.

        Raises:
            None.
        """
        return len(self.encoder)

    def get_vocab(self):
        """
        Retrieves the vocabulary of the CpmAntTokenizer instance.

        Args:
            self (CpmAntTokenizer): The instance of CpmAntTokenizer.

        Returns:
            dict: The vocabulary of the tokenizer, which is a dictionary mapping tokens to their corresponding IDs.

        Raises:
            None.

        Example:
            ```python
            >>> tokenizer = CpmAntTokenizer(vocab_file="vocab.txt")  # placeholder path to a vocabulary file
            >>> vocab = tokenizer.get_vocab()
            >>> vocab
            {'<pad>': 0, '<unk>': 1, '<s>': 2, '</s>': 3, ...}
            ```
        """
        return dict(self.encoder, **self.added_tokens_encoder)

    def _tokenize(self, text):
        """Tokenize a string."""
        output_tokens = []
        for x in jieba.cut(text, cut_all=False):
            output_tokens.extend(self.wordpiece_tokenizer.tokenize(x))
        return output_tokens

    def _decode(self, token_ids, **kwargs):
        """Decode ids into a string."""
        token_ids = [i for i in token_ids if i >= 0]
        token_ids = [
            x for x in token_ids if x not in (self.pad_token_id, self.eos_token_id, self.bos_token_id)
        ]
        return super()._decode(token_ids, **kwargs)

    def check(self, token):
        """
        Check whether a token is present in the tokenizer's vocabulary (`self.encoder`).

        Args:
            token (str): The token to look up.

        Returns:
            bool: `True` if `token` is a key of `self.encoder`, otherwise `False`.
        """
        return token in self.encoder

    def convert_tokens_to_string(self, tokens: List[str]) -> str:
        """
        Converts a list of tokens into a string representation.

        Args:
            self (CpmAntTokenizer): An instance of the CpmAntTokenizer class.
            tokens (List[str]): A list of tokens to be converted into a string representation.

        Returns:
            str: A string representation of the tokens.

        Raises:
            None.

        Note:
            - The tokens should be provided as a list of strings.
            - The method will join the tokens together using an empty string as a separator.

        Example:
            ```python
            >>> tokenizer = CpmAntTokenizer(vocab_file="vocab.txt")  # placeholder path to a vocabulary file
            >>> tokens = ['Hello', 'world', '!']
            >>> tokenizer.convert_tokens_to_string(tokens)
            'Helloworld!'
            ```
        """
        return "".join(tokens)

    def _convert_token_to_id(self, token):
        """Converts a token (str) in an id using the vocab."""
        return self.encoder.get(token, self.encoder.get(self.unk_token))

    def _convert_id_to_token(self, index):
        """Converts an index (integer) in a token (str) using the vocab."""
        return self.decoder.get(index, self.unk_token)

    def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
        """
        Save the vocabulary to a file with the specified directory and filename prefix.

        Args:
            self: Instance of the CpmAntTokenizer class.
            save_directory (str): The directory where the vocabulary file will be saved.
            filename_prefix (Optional[str]): A string to be prefixed to the filename. Defaults to None.

        Returns:
            Tuple[str]: A tuple containing the path to the saved vocabulary file.

        Raises:
            None.
        """
        if os.path.isdir(save_directory):
            vocab_file = os.path.join(
                save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"]
            )
        else:
            vocab_file = (filename_prefix + "-" if filename_prefix else "") + save_directory
        index = 0
        if " " in self.encoder:
            self.encoder["</_>"] = self.encoder[" "]
            del self.encoder[" "]
        if "\n" in self.encoder:
            self.encoder["</n>"] = self.encoder["\n"]
            del self.encoder["\n"]
        self.encoder = collections.OrderedDict(sorted(self.encoder.items(), key=lambda x: x[1]))
        with open(vocab_file, "w", encoding="utf-8") as writer:
            for token, token_index in self.encoder.items():
                if index != token_index:
                    logger.warning(
                        f"Saving vocabulary to {vocab_file}: vocabulary indices are not consecutive."
                        " Please check that the vocabulary is not corrupted!"
                    )
                    index = token_index
                writer.write(token + "\n")
                index += 1
        return (vocab_file,)

    def build_inputs_with_special_tokens(self, token_ids_0: List[int], token_ids_1: List[int] = None) -> List[int]:
        """
        Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating
        them and adding special tokens. A CPMAnt sequence has the following format:

        - single sequence: `[BOS] Sequence`.
        - pair of sequences: `[BOS] Sequence A [BOS] Sequence B`.

        Args:
            token_ids_0 (`List[int]`): The first tokenized sequence to which special tokens will be added.
            token_ids_1 (`List[int]`, *optional*): The optional second tokenized sequence to which special tokens will be added.

        Returns:
            `List[int]`: The model input with special tokens.
        """
        if token_ids_1 is None:
            return [self.bos_token_id] + token_ids_0
        return [self.bos_token_id] + token_ids_0 + [self.bos_token_id] + token_ids_1

    def get_special_tokens_mask(
        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None, already_has_special_tokens: bool = False
    ) -> List[int]:
        """
        Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
        special tokens using the tokenizer `prepare_for_model` method.

        Args:
            token_ids_0 (`List[int]`): List of IDs.
            token_ids_1 (`List[int]`, *optional*): Optional second list of IDs for sequence pairs.
            already_has_special_tokens (`bool`, *optional*, defaults to `False`):
                Whether or not the token list is already formatted with special tokens for the model.

        Returns:
            `List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
        """
        if already_has_special_tokens:
            return super().get_special_tokens_mask(
                token_ids_0=token_ids_0, token_ids_1=token_ids_1, already_has_special_tokens=True
            )

        if token_ids_1 is not None:
            return [1] + ([0] * len(token_ids_0)) + [1] + ([0] * len(token_ids_1))
        return [1] + ([0] * len(token_ids_0))
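
The last two methods in the listing above, `build_inputs_with_special_tokens` and `get_special_tokens_mask`, are designed to stay aligned: each position where a BOS ID is inserted gets a `1` in the mask. The sketch below is standalone and reproduces only the branch where no special tokens have been added yet. `BOS_ID` and the token IDs are hypothetical values, since the real ID comes from the loaded vocabulary.

```python
from typing import List, Optional

BOS_ID = 6  # hypothetical ID; the real value depends on the vocabulary file


def build_inputs(ids_0: List[int], ids_1: Optional[List[int]] = None) -> List[int]:
    """Prefix each sequence with BOS, as in build_inputs_with_special_tokens."""
    if ids_1 is None:
        return [BOS_ID] + ids_0
    return [BOS_ID] + ids_0 + [BOS_ID] + ids_1


def special_tokens_mask(ids_0: List[int], ids_1: Optional[List[int]] = None) -> List[int]:
    """Return 1 at each BOS position and 0 at each ordinary token position."""
    mask = [1] + [0] * len(ids_0)
    if ids_1 is not None:
        mask += [1] + [0] * len(ids_1)
    return mask


single = build_inputs([11, 12, 13])
pair = build_inputs([11, 12], [21])
pair_mask = special_tokens_mask([11, 12], [21])

print(single)
print(pair)
print(pair_mask)
```

Note that, per the listing, no end-of-sequence token is appended and the second sequence is also introduced by BOS rather than by a separator token.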

mindnlp.transformers.models.cpmant.tokenization_cpmant.CpmAntTokenizer.bod_token_id property

Returns the vocabulary ID of the beginning-of-document token (`bod_token`). The property takes no arguments.

PARAMETER DESCRIPTION
self

The instance of the CpmAntTokenizer class.

TYPE: CpmAntTokenizer

RETURNS DESCRIPTION
int

The ID that the encoder assigns to `bod_token`.

TYPE: int

mindnlp.transformers.models.cpmant.tokenization_cpmant.CpmAntTokenizer.eod_token_id property

Returns the vocabulary ID of the end-of-document token (`eod_token`).

PARAMETER DESCRIPTION
self

The instance of the CpmAntTokenizer class.

RETURNS DESCRIPTION
int

The ID that the encoder assigns to `eod_token`.

TYPE: int

mindnlp.transformers.models.cpmant.tokenization_cpmant.CpmAntTokenizer.newline_id property

Returns the vocabulary ID of the newline character '\n'.

PARAMETER DESCRIPTION
self

The instance of the CpmAntTokenizer class.

TYPE: CpmAntTokenizer

RETURNS DESCRIPTION
int

The ID that the encoder assigns to '\n'.

TYPE: int

RAISES DESCRIPTION
KeyError

If the newline character '\n' is not found in the encoder dictionary, a KeyError is raised.

mindnlp.transformers.models.cpmant.tokenization_cpmant.CpmAntTokenizer.vocab_size: int property

Returns the size of the vocabulary used by the CpmAntTokenizer instance.

PARAMETER DESCRIPTION
self

The CpmAntTokenizer instance itself.

RETURNS DESCRIPTION
int

The number of unique tokens in the vocabulary.

TYPE: int

mindnlp.transformers.models.cpmant.tokenization_cpmant.CpmAntTokenizer.__init__(vocab_file, bod_token='<d>', eod_token='</d>', bos_token='<s>', eos_token='</s>', pad_token='<pad>', unk_token='<unk>', line_token='</n>', space_token='</_>', padding_side='left', **kwargs)

Initialize a CpmAntTokenizer object with the provided parameters.

PARAMETER DESCRIPTION
vocab_file

The path to the vocabulary file to load.

TYPE: str

bod_token

Beginning of document token (default is '<d>').

TYPE: str DEFAULT: '<d>'

eod_token

End of document token (default is '</d>').

TYPE: str DEFAULT: '</d>'

bos_token

Beginning of sentence token (default is '<s>').

TYPE: str DEFAULT: '<s>'

eos_token

End of sentence token (default is '</s>').

TYPE: str DEFAULT: '</s>'

pad_token

Padding token (default is '<pad>').

TYPE: str DEFAULT: '<pad>'

unk_token

Token for unknown words (default is '<unk>').

TYPE: str DEFAULT: '<unk>'

line_token

Line break token (default is '</n>').

TYPE: str DEFAULT: '</n>'

space_token

Space token (default is '</_>').

TYPE: str DEFAULT: '</_>'

padding_side

Side for padding (default is 'left').

TYPE: str DEFAULT: 'left'

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
MissingBackendError

If required backend 'jieba' is not available.

FileNotFoundError

If the specified 'vocab_file' does not exist.

KeyError

If 'space_token' or 'line_token' are missing in the loaded vocabulary.

Exception

Any other unforeseen error that may occur during initialization.

Source code in mindnlp\transformers\models\cpmant\tokenization_cpmant.py
def __init__(
    self,
    vocab_file,
    bod_token="<d>",
    eod_token="</d>",
    bos_token="<s>",
    eos_token="</s>",
    pad_token="<pad>",
    unk_token="<unk>",
    line_token="</n>",
    space_token="</_>",
    padding_side="left",
    **kwargs,
):
    """
    Initialize a CpmAntTokenizer object with the provided parameters.

    Args:
        vocab_file (str): The path to the vocabulary file to load.
        bod_token (str, optional): Beginning of document token (default is '<d>').
        eod_token (str, optional): End of document token (default is '</d>').
        bos_token (str, optional): Beginning of sentence token (default is '<s>').
        eos_token (str, optional): End of sentence token (default is '</s>').
        pad_token (str, optional): Padding token (default is '<pad>').
        unk_token (str, optional): Token for unknown words (default is '<unk>').
        line_token (str, optional): Line break token (default is '</n>').
        space_token (str, optional): Space token (default is '</_>').
        padding_side (str, optional): Side for padding (default is 'left').

    Returns:
        None.

    Raises:
        MissingBackendError: If required backend 'jieba' is not available.
        FileNotFoundError: If the specified 'vocab_file' does not exist.
        KeyError: If 'space_token' or 'line_token' are missing in the loaded vocabulary.
        Exception: Any other unforeseen error that may occur during initialization.
    """
    requires_backends(self, ["jieba"])
    self.bod_token = bod_token
    self.eod_token = eod_token
    self.encoder = load_vocab(vocab_file)
    self.encoder[" "] = self.encoder[space_token]
    self.encoder["\n"] = self.encoder[line_token]

    del self.encoder[space_token]
    del self.encoder[line_token]

    self.encoder = collections.OrderedDict(sorted(self.encoder.items(), key=lambda x: x[1]))
    self.decoder = {v: k for k, v in self.encoder.items()}

    self.wordpiece_tokenizer = WordpieceTokenizer(vocab=self.encoder, unk_token=unk_token)

    super().__init__(
        bod_token=bod_token,
        eod_token=eod_token,
        bos_token=bos_token,
        eos_token=eos_token,
        pad_token=pad_token,
        unk_token=unk_token,
        line_token=line_token,
        space_token=space_token,
        padding_side=padding_side,
        **kwargs,
    )
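
The whitespace remapping performed in the constructor can be illustrated in isolation. The following is a minimal sketch rather than the library's code path: the toy vocabulary and the helper name `remap_whitespace` are assumptions made for illustration, and only the dictionary operations mirror the source shown above.

```python
import collections

# Hypothetical toy vocabulary (token -> id), shaped like the result of load_vocab.
encoder = collections.OrderedDict(
    [("<pad>", 0), ("<unk>", 1), ("</n>", 2), ("</_>", 3), ("你好", 4)]
)


def remap_whitespace(encoder, space_token="</_>", line_token="</n>"):
    """Mirror the constructor's remapping on a copy of `encoder`."""
    remapped = dict(encoder)
    # The literal characters take over the ids of the placeholder tokens;
    # pop() raises KeyError if a placeholder is missing from the vocabulary.
    remapped[" "] = remapped.pop(space_token)
    remapped["\n"] = remapped.pop(line_token)
    # Re-sort by id so iteration order matches the vocabulary file order.
    remapped = collections.OrderedDict(sorted(remapped.items(), key=lambda kv: kv[1]))
    # The decoder is the inverse mapping (id -> token).
    decoder = {idx: tok for tok, idx in remapped.items()}
    return remapped, decoder


remapped, decoder = remap_whitespace(encoder)
print(list(remapped))
print(decoder[2] == "\n", decoder[3] == " ")
```

After the remapping, a plain space and a newline can be looked up directly in the encoder, while the placeholder strings are no longer keys.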

mindnlp.transformers.models.cpmant.tokenization_cpmant.CpmAntTokenizer.build_inputs_with_special_tokens(token_ids_0, token_ids_1=None)

Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating them and adding special tokens. A CPMAnt sequence has the following format:

  • single sequence: [BOS] Sequence.
  • pair of sequences: [BOS] Sequence A [BOS] Sequence B.

PARAMETER DESCRIPTION
token_ids_0

The first tokenized sequence to which special tokens will be added.

TYPE: `List[int]`

token_ids_1

The optional second tokenized sequence to which special tokens will be added.

TYPE: `List[int]` DEFAULT: None

RETURNS DESCRIPTION
List[int]

List[int]: The model input with special tokens.

Source code in mindnlp\transformers\models\cpmant\tokenization_cpmant.py
def build_inputs_with_special_tokens(self, token_ids_0: List[int], token_ids_1: List[int] = None) -> List[int]:
    """
    Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and
    adding special tokens. A CPMAnt sequence has the following format:

    - single sequence: `[BOS] Sequence`.

    Args:
        token_ids_0 (`List[int]`): The first tokenized sequence that special tokens will be added.
        token_ids_1 (`List[int]`): The optional second tokenized sequence that special tokens will be added.

    Returns:
        `List[int]`: The model input with special tokens.
    """
    if token_ids_1 is None:
        return [self.bos_token_id] + token_ids_0
    return [self.bos_token_id] + token_ids_0 + [self.bos_token_id] + token_ids_1
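
The resulting layout can be checked without loading a tokenizer. This is a small sketch of the same concatenation logic; `BOS_ID` and the function name `build_inputs` are hypothetical stand-ins, since the real BOS id comes from the loaded vocabulary.

```python
from typing import List, Optional

BOS_ID = 6  # hypothetical id; the real value depends on the vocabulary file


def build_inputs(token_ids_0: List[int], token_ids_1: Optional[List[int]] = None) -> List[int]:
    """Prefix each sequence with BOS, as in the method shown above."""
    if token_ids_1 is None:
        return [BOS_ID] + token_ids_0
    return [BOS_ID] + token_ids_0 + [BOS_ID] + token_ids_1


single = build_inputs([10, 11])
pair = build_inputs([10, 11], [20])
print(single)  # [6, 10, 11]
print(pair)    # [6, 10, 11, 6, 20]
```

Note that no end-of-sequence token is appended in either case; each sequence is only preceded by BOS.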

mindnlp.transformers.models.cpmant.tokenization_cpmant.CpmAntTokenizer.check(token)

Check if a token is present in the encoder of the CpmAntTokenizer.

PARAMETER DESCRIPTION
self

An instance of the CpmAntTokenizer class.

TYPE: CpmAntTokenizer

token

The token to be checked.

TYPE: Any

RETURNS DESCRIPTION
bool

True if the token is a key of the encoder, False otherwise.

Source code in mindnlp\transformers\models\cpmant\tokenization_cpmant.py
def check(self, token):
    """
    Check if a token is present in the encoder of the CpmAntTokenizer.

    Args:
        self (CpmAntTokenizer): An instance of the CpmAntTokenizer class.
        token (Any): The token to be checked.

    Returns:
        bool: True if the token is a key of the encoder, False otherwise.

    Raises:
        None.
    """
    return token in self.encoder

mindnlp.transformers.models.cpmant.tokenization_cpmant.CpmAntTokenizer.convert_tokens_to_string(tokens)

Converts a list of tokens into a string representation.

PARAMETER DESCRIPTION
self

An instance of the CpmAntTokenizer class.

TYPE: CpmAntTokenizer

tokens

A list of tokens to be converted into a string representation.

TYPE: List[str]

RETURNS DESCRIPTION
str

A string representation of the tokens.

TYPE: str

Note
  • The tokens should be provided as a list of strings.
  • The method joins the tokens with an empty string as the separator, so no whitespace is inserted between them.
Example
>>> tokenizer = CpmAntTokenizer(vocab_file)  # vocab_file: path to a CPMAnt vocabulary file
>>> tokens = ['Hello', 'world', '!']
>>> tokenizer.convert_tokens_to_string(tokens)
'Helloworld!'
Source code in mindnlp\transformers\models\cpmant\tokenization_cpmant.py
def convert_tokens_to_string(self, tokens: List[str]) -> str:
    """
    Converts a list of tokens into a string representation.

    Args:
        self (CpmAntTokenizer): An instance of the CpmAntTokenizer class.
        tokens (List[str]): A list of tokens to be converted into a string representation.

    Returns:
        str: A string representation of the tokens.

    Raises:
        None.

    Note:
        - The tokens should be provided as a list of strings.
        - The method will join the tokens together using an empty string as a separator.

    Example:
        ```python
        >>> tokenizer = CpmAntTokenizer(vocab_file)  # vocab_file: path to a CPMAnt vocabulary file
        >>> tokens = ['Hello', 'world', '!']
        >>> tokenizer.convert_tokens_to_string(tokens)
        'Helloworld!'
        ```
    """
    return "".join(tokens)

mindnlp.transformers.models.cpmant.tokenization_cpmant.CpmAntTokenizer.get_special_tokens_mask(token_ids_0, token_ids_1=None, already_has_special_tokens=False)

Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding special tokens using the tokenizer prepare_for_model method.

PARAMETER DESCRIPTION
token_ids_0

List of IDs.

TYPE: `List[int]`

token_ids_1

Optional second list of IDs for sequence pairs.

TYPE: `List[int]`, *optional* DEFAULT: None

already_has_special_tokens

Whether or not the token list is already formatted with special tokens for the model.

TYPE: `bool`, *optional*, defaults to `False` DEFAULT: False

RETURNS DESCRIPTION
List[int]

List[int]: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.

Source code in mindnlp\transformers\models\cpmant\tokenization_cpmant.py
def get_special_tokens_mask(
    self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None, already_has_special_tokens: bool = False
) -> List[int]:
    """
    Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
    special tokens using the tokenizer `prepare_for_model` method.

    Args:
        token_ids_0 (`List[int]`): List of IDs.
        token_ids_1 (`List[int]`, *optional*): Optional second list of IDs for sequence pairs.
        already_has_special_tokens (`bool`, *optional*, defaults to `False`):
            Whether or not the token list is already formatted with special tokens for the model.

    Returns:
        `List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
    """
    if already_has_special_tokens:
        return super().get_special_tokens_mask(
            token_ids_0=token_ids_0, token_ids_1=token_ids_1, already_has_special_tokens=True
        )

    if token_ids_1 is not None:
        return [1] + ([0] * len(token_ids_0)) + [1] + ([0] * len(token_ids_1))
    return [1] + ([0] * len(token_ids_0))
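
The mask lines up position by position with the output of build_inputs_with_special_tokens. The sketch below reproduces only the branch for inputs that do not yet contain special tokens; `BOS_ID` and the function name `special_tokens_mask` are hypothetical names chosen for illustration.

```python
from typing import List, Optional


def special_tokens_mask(token_ids_0: List[int], token_ids_1: Optional[List[int]] = None) -> List[int]:
    """1 marks the BOS slot in front of each sequence, 0 marks ordinary tokens."""
    mask = [1] + [0] * len(token_ids_0)
    if token_ids_1 is not None:
        mask += [1] + [0] * len(token_ids_1)
    return mask


BOS_ID = 6  # hypothetical id; the real value depends on the vocabulary file
ids_0, ids_1 = [10, 11], [20]
full = [BOS_ID] + ids_0 + [BOS_ID] + ids_1
mask = special_tokens_mask(ids_0, ids_1)

# Use the mask to strip the special tokens again.
content_only = [tok for tok, is_special in zip(full, mask) if not is_special]
print(mask)          # [1, 0, 0, 1, 0]
print(content_only)  # [10, 11, 20]
```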

mindnlp.transformers.models.cpmant.tokenization_cpmant.CpmAntTokenizer.get_vocab()

Retrieves the vocabulary of the CpmAntTokenizer instance.

PARAMETER DESCRIPTION
self

The instance of CpmAntTokenizer.

TYPE: CpmAntTokenizer

RETURNS DESCRIPTION
dict

The vocabulary of the tokenizer, which is a dictionary mapping tokens to their corresponding IDs.

Example
>>> tokenizer = CpmAntTokenizer(vocab_file)  # vocab_file: path to a CPMAnt vocabulary file
>>> vocab = tokenizer.get_vocab()
>>> vocab
{'<pad>': 0, '<unk>': 1, '<s>': 2, '</s>': 3, ...}
Source code in mindnlp\transformers\models\cpmant\tokenization_cpmant.py
def get_vocab(self):
    """
    Retrieves the vocabulary of the CpmAntTokenizer instance.

    Args:
        self (CpmAntTokenizer): The instance of CpmAntTokenizer.

    Returns:
        dict: The vocabulary of the tokenizer, which is a dictionary mapping tokens to their corresponding IDs.

    Raises:
        None.

    Example:
        ```python
        >>> tokenizer = CpmAntTokenizer(vocab_file)  # vocab_file: path to a CPMAnt vocabulary file
        >>> vocab = tokenizer.get_vocab()
        >>> vocab
        {'<pad>': 0, '<unk>': 1, '<s>': 2, '</s>': 3, ...}
        ```
    """
    return dict(self.encoder, **self.added_tokens_encoder)

mindnlp.transformers.models.cpmant.tokenization_cpmant.CpmAntTokenizer.save_vocabulary(save_directory, filename_prefix=None)

Save the vocabulary to a file with the specified directory and filename prefix.

PARAMETER DESCRIPTION
self

Instance of the CpmAntTokenizer class.

save_directory

The directory where the vocabulary file will be saved.

TYPE: str

filename_prefix

A string to be prefixed to the filename. Defaults to None.

TYPE: Optional[str] DEFAULT: None

RETURNS DESCRIPTION
Tuple[str]

Tuple[str]: A tuple containing the path to the saved vocabulary file.

Source code in mindnlp\transformers\models\cpmant\tokenization_cpmant.py
def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
    """
    Save the vocabulary to a file with the specified directory and filename prefix.

    Args:
        self: Instance of the CpmAntTokenizer class.
        save_directory (str): The directory where the vocabulary file will be saved.
        filename_prefix (Optional[str]): A string to be prefixed to the filename. Defaults to None.

    Returns:
        Tuple[str]: A tuple containing the path to the saved vocabulary file.

    Raises:
        None.
    """
    if os.path.isdir(save_directory):
        vocab_file = os.path.join(
            save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"]
        )
    else:
        vocab_file = (filename_prefix + "-" if filename_prefix else "") + save_directory
    index = 0
    if " " in self.encoder:
        self.encoder["</_>"] = self.encoder[" "]
        del self.encoder[" "]
    if "\n" in self.encoder:
        self.encoder["</n>"] = self.encoder["\n"]
        del self.encoder["\n"]
    self.encoder = collections.OrderedDict(sorted(self.encoder.items(), key=lambda x: x[1]))
    with open(vocab_file, "w", encoding="utf-8") as writer:
        for token, token_index in self.encoder.items():
            if index != token_index:
                logger.warning(
                    f"Saving vocabulary to {vocab_file}: vocabulary indices are not consecutive."
                    " Please check that the vocabulary is not corrupted!"
                )
                index = token_index
            writer.write(token + "\n")
            index += 1
    return (vocab_file,)
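
The file format produced by saving is one token per line, ordered by id, with the whitespace characters written back as their placeholder tokens. The following sketch shows that format on a toy vocabulary; the helper name `write_vocab` and the vocabulary contents are assumptions, and unlike the method above it works on a copy and omits the consecutive-index warning.

```python
import os
import tempfile


def write_vocab(encoder, path, space_token="</_>", line_token="</n>"):
    """Write one token per line, ordered by id, restoring the placeholder tokens."""
    vocab = dict(encoder)  # copy, so the caller's mapping is left untouched
    if " " in vocab:
        vocab[space_token] = vocab.pop(" ")
    if "\n" in vocab:
        vocab[line_token] = vocab.pop("\n")
    with open(path, "w", encoding="utf-8") as writer:
        for token, _ in sorted(vocab.items(), key=lambda kv: kv[1]):
            writer.write(token + "\n")


# Hypothetical toy vocabulary in its in-memory form (after the constructor's remapping).
encoder = {"<pad>": 0, "<unk>": 1, "\n": 2, " ": 3, "你好": 4}

with tempfile.TemporaryDirectory() as tmp:
    vocab_path = os.path.join(tmp, "vocab.txt")
    write_vocab(encoder, vocab_path)
    with open(vocab_path, encoding="utf-8") as reader:
        saved_lines = reader.read().splitlines()

print(saved_lines)
```

As written in the source above, save_vocabulary renames the keys on self.encoder itself, so after a save the live encoder holds '</_>' and '</n>' instead of ' ' and '\n'. The sketch copies the mapping first to avoid that side effect.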

mindnlp.transformers.models.cpmant.tokenization_cpmant.WordpieceTokenizer

The WordpieceTokenizer class represents a tokenizer that splits input text into subword tokens using a WordPiece-style greedy longest-match-first lookup against its vocabulary.

ATTRIBUTE DESCRIPTION
vocab

A dictionary containing the vocabulary of subword tokens.

TYPE: dict

unk_token

The token to be used for out-of-vocabulary or unknown words.

TYPE: str

max_input_chars_per_word

The maximum number of input characters per word for tokenization.

TYPE: int

METHOD DESCRIPTION
tokenize

Tokenizes the input token into subword tokens using the WordPiece algorithm and the specified vocabulary.

Example
>>> vocab = {'hello': 0, 'world': 1}
>>> tokenizer = WordpieceTokenizer(vocab, '<unk>', 200)
>>> tokenizer.tokenize('helloworld')
['hello', 'world']
Source code in mindnlp\transformers\models\cpmant\tokenization_cpmant.py
class WordpieceTokenizer:

    """
    The WordpieceTokenizer class represents a tokenizer that tokenizes input text into subword tokens using the WordPiece algorithm.

    Attributes:
        vocab (dict): A dictionary containing the vocabulary of subword tokens.
        unk_token (str): The token to be used for out-of-vocabulary or unknown words.
        max_input_chars_per_word (int): The maximum number of input characters per word for tokenization.

    Methods:
        tokenize(token):
            Tokenizes the input token into subword tokens using the WordPiece algorithm and the specified vocabulary.

    Example:
        ```python
        >>> vocab = {'hello': 0, 'world': 1}
        >>> tokenizer = WordpieceTokenizer(vocab, '<unk>', 200)
        >>> tokenizer.tokenize('helloworld')
        ['hello', 'world']
        ```
    """
    def __init__(self, vocab, unk_token="<unk>", max_input_chars_per_word=200):
        """
        Initializes a new instance of the WordpieceTokenizer class.

        Args:
            self (WordpieceTokenizer): The current instance of the WordpieceTokenizer class.
            vocab (dict): The vocabulary, typically a dict mapping token strings to IDs; only membership tests are performed on it.
            unk_token (str, optional): The token to use for unknown words. Defaults to '<unk>'.
            max_input_chars_per_word (int, optional): The maximum number of characters allowed per word. Defaults to 200.

        Returns:
            None

        Raises:
            None.

        This method initializes the WordpieceTokenizer object with the provided vocabulary, unknown token, and maximum input characters per word.
        The vocabulary is a collection of token strings, typically a dict mapping tokens to IDs, on which only membership tests are performed.
        The unk_token parameter sets the token used to represent unknown words. If not provided, it defaults to '<unk>'.
        The max_input_chars_per_word parameter limits the number of characters allowed per word.
        If a word exceeds this limit, tokenize returns [unk_token] for it instead of splitting it.

        Example:
            ```python
            >>> tokenizer = WordpieceTokenizer(vocab=['hello', 'world'], unk_token='<unk>', max_input_chars_per_word=200)
            ```
        """
        self.vocab = vocab
        self.unk_token = unk_token
        self.max_input_chars_per_word = max_input_chars_per_word

    def tokenize(self, token):
        """
        This method tokenizes a given input token into sub-tokens based on the vocabulary of the WordpieceTokenizer class.

        Args:
            self (WordpieceTokenizer): The instance of the WordpieceTokenizer class.
                It is used to access the vocabulary and maximum input characters per word.
            token (str): The input token to be tokenized.
                It represents the word to be broken down into sub-tokens.
                Must be a string.

        Returns:
            list: A list of sub-tokens generated from the input token based on the vocabulary.
                If the length of the input token exceeds the maximum allowed characters per word,
                it returns a list containing the unknown token (unk_token).
                Otherwise, it returns a list of sub-tokens that are part of the vocabulary or the unknown token.

        Raises:
            None
        """
        chars = list(token)
        if len(chars) > self.max_input_chars_per_word:
            return [self.unk_token]

        start = 0
        sub_tokens = []
        while start < len(chars):
            end = len(chars)
            cur_substr = None
            while start < end:
                substr = "".join(chars[start:end])
                if substr in self.vocab:
                    cur_substr = substr
                    break
                end -= 1
            if cur_substr is None:
                sub_tokens.append(self.unk_token)
                start += 1
            else:
                sub_tokens.append(cur_substr)
                start = end

        return sub_tokens

mindnlp.transformers.models.cpmant.tokenization_cpmant.WordpieceTokenizer.__init__(vocab, unk_token='<unk>', max_input_chars_per_word=200)

Initializes a new instance of the WordpieceTokenizer class.

PARAMETER DESCRIPTION
self

The current instance of the WordpieceTokenizer class.

TYPE: WordpieceTokenizer

vocab

The vocabulary, typically a dict mapping token strings to IDs; only membership tests are performed on it.

TYPE: dict

unk_token

The token to use for unknown words. Defaults to '<unk>'.

TYPE: str DEFAULT: '<unk>'

max_input_chars_per_word

The maximum number of characters allowed per word. Defaults to 200.

TYPE: int DEFAULT: 200

RETURNS DESCRIPTION

None

This method initializes the WordpieceTokenizer object with the provided vocabulary, unknown token, and maximum input characters per word. The vocabulary is a collection of token strings, typically a dict mapping tokens to IDs, on which only membership tests are performed. The unk_token parameter sets the token used to represent unknown words; if not provided, it defaults to '<unk>'. The max_input_chars_per_word parameter limits the number of characters allowed per word. If a word exceeds this limit, tokenize returns [unk_token] for it instead of splitting it.

Example
>>> tokenizer = WordpieceTokenizer(vocab=['hello', 'world'], unk_token='<unk>', max_input_chars_per_word=200)
Source code in mindnlp\transformers\models\cpmant\tokenization_cpmant.py
def __init__(self, vocab, unk_token="<unk>", max_input_chars_per_word=200):
    """
    Initializes a new instance of the WordpieceTokenizer class.

    Args:
        self (WordpieceTokenizer): The current instance of the WordpieceTokenizer class.
        vocab (dict): The vocabulary, typically a dict mapping token strings to IDs; only membership tests are performed on it.
        unk_token (str, optional): The token to use for unknown words. Defaults to '<unk>'.
        max_input_chars_per_word (int, optional): The maximum number of characters allowed per word. Defaults to 200.

    Returns:
        None

    Raises:
        None.

    This method initializes the WordpieceTokenizer object with the provided vocabulary, unknown token, and maximum input characters per word.
    The vocabulary is a collection of token strings, typically a dict mapping tokens to IDs, on which only membership tests are performed.
    The unk_token parameter sets the token used to represent unknown words. If not provided, it defaults to '<unk>'.
    The max_input_chars_per_word parameter limits the number of characters allowed per word.
    If a word exceeds this limit, tokenize returns [unk_token] for it instead of splitting it.

    Example:
        ```python
        >>> tokenizer = WordpieceTokenizer(vocab=['hello', 'world'], unk_token='<unk>', max_input_chars_per_word=200)
        ```
    """
    self.vocab = vocab
    self.unk_token = unk_token
    self.max_input_chars_per_word = max_input_chars_per_word

mindnlp.transformers.models.cpmant.tokenization_cpmant.WordpieceTokenizer.tokenize(token)

Tokenizes a given input token into sub-tokens by greedy longest-match lookup in the vocabulary of the WordpieceTokenizer instance.

PARAMETER DESCRIPTION
self

The instance of the WordpieceTokenizer class. It is used to access the vocabulary and maximum input characters per word.

TYPE: WordpieceTokenizer

token

The input token to be tokenized. It represents the word to be broken down into sub-tokens. Must be a string.

TYPE: str

RETURNS DESCRIPTION
list

A list of sub-tokens generated from the input token based on the vocabulary. If the length of the input token exceeds the maximum allowed characters per word, it returns a list containing the unknown token (unk_token). Otherwise, it returns a list of sub-tokens that are part of the vocabulary or the unknown token.

Source code in mindnlp\transformers\models\cpmant\tokenization_cpmant.py
def tokenize(self, token):
    """
    This method tokenizes a given input token into sub-tokens based on the vocabulary of the WordpieceTokenizer class.

    Args:
        self (WordpieceTokenizer): The instance of the WordpieceTokenizer class.
            It is used to access the vocabulary and maximum input characters per word.
        token (str): The input token to be tokenized.
            It represents the word to be broken down into sub-tokens.
            Must be a string.

    Returns:
        list: A list of sub-tokens generated from the input token based on the vocabulary.
            If the length of the input token exceeds the maximum allowed characters per word,
            it returns a list containing the unknown token (unk_token).
            Otherwise, it returns a list of sub-tokens that are part of the vocabulary or the unknown token.

    Raises:
        None
    """
    chars = list(token)
    if len(chars) > self.max_input_chars_per_word:
        return [self.unk_token]

    start = 0
    sub_tokens = []
    while start < len(chars):
        end = len(chars)
        cur_substr = None
        while start < end:
            substr = "".join(chars[start:end])
            if substr in self.vocab:
                cur_substr = substr
                break
            end -= 1
        if cur_substr is None:
            sub_tokens.append(self.unk_token)
            start += 1
        else:
            sub_tokens.append(cur_substr)
            start = end

    return sub_tokens
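
The behaviour of the loop above is easiest to see on a few inputs. The sketch below is a standalone re-implementation of the same greedy longest-match-first strategy; the function name `greedy_longest_match` and the toy vocabulary are assumptions for illustration, not part of the library.

```python
def greedy_longest_match(token, vocab, unk_token="<unk>", max_chars=200):
    """Split `token` into the longest vocabulary entries, scanning left to right."""
    chars = list(token)
    if len(chars) > max_chars:
        # Over-long words are not split at all; they collapse to one unknown token.
        return [unk_token]
    pieces, start = [], 0
    while start < len(chars):
        end = len(chars)
        # Shrink the candidate from the right until it is in the vocabulary.
        while end > start and "".join(chars[start:end]) not in vocab:
            end -= 1
        if end == start:
            # No prefix matched: emit unk for this single character and move on.
            pieces.append(unk_token)
            start += 1
        else:
            pieces.append("".join(chars[start:end]))
            start = end
    return pieces


vocab = {"hello": 0, "world": 1, "he": 2}  # hypothetical toy vocabulary
print(greedy_longest_match("helloworld", vocab))   # ['hello', 'world']
print(greedy_longest_match("helloXworld", vocab))  # ['hello', '<unk>', 'world']
print(greedy_longest_match("hel", vocab))          # ['he', '<unk>']
print(greedy_longest_match("a" * 201, vocab))      # ['<unk>']
```

Two details follow from the source: an unmatched character costs one unknown token each rather than invalidating the whole word, and continuation pieces are looked up as-is, without a prefix marker such as "##".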

mindnlp.transformers.models.cpmant.tokenization_cpmant.load_vocab(vocab_file)

Loads a vocabulary file into an ordered dictionary that maps each token (one per line) to its zero-based line index.

Source code in mindnlp\transformers\models\cpmant\tokenization_cpmant.py
def load_vocab(vocab_file):
    """Loads a vocabulary file into a dictionary."""
    vocab = collections.OrderedDict()
    with open(vocab_file, "r", encoding="utf-8") as reader:
        tokens = reader.readlines()
    for index, token in enumerate(tokens):
        token = token.rstrip("\n")
        vocab[token] = index
    return vocab
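
A short usage sketch makes the file format concrete. The function body is copied from the source above; the temporary file and its four tokens are made up for illustration.

```python
import collections
import os
import tempfile


def load_vocab(vocab_file):
    """Loads a vocabulary file into a dictionary (copied from the source above)."""
    vocab = collections.OrderedDict()
    with open(vocab_file, "r", encoding="utf-8") as reader:
        tokens = reader.readlines()
    for index, token in enumerate(tokens):
        token = token.rstrip("\n")
        vocab[token] = index
    return vocab


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vocab.txt")
    # Hypothetical four-token vocabulary, one token per line.
    with open(path, "w", encoding="utf-8") as f:
        f.write("<pad>\n<unk>\n</n>\n</_>\n")
    vocab = load_vocab(path)

print(dict(vocab))  # {'<pad>': 0, '<unk>': 1, '</n>': 2, '</_>': 3}
```

Because only the trailing newline is stripped, any other whitespace in a line is kept as part of the token, and a token that appears twice keeps the index of its last occurrence.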

mindnlp.transformers.models.cpmant.modeling_cpmant

MindSpore CPMAnt

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntAttention

Bases: Module

Source code in mindnlp\transformers\models\cpmant\modeling_cpmant.py
class CpmAntAttention(nn.Module):
    def __init__(self, config: CpmAntConfig):
        super().__init__()
        self.dim_model = config.hidden_size
        self.num_heads = config.num_attention_heads
        self.dim_head = config.dim_head

        self.project_q = nn.Linear(self.dim_model, self.num_heads * self.dim_head, bias=False)
        self.project_k = nn.Linear(self.dim_model, self.num_heads * self.dim_head, bias=False)
        self.project_v = nn.Linear(self.dim_model, self.num_heads * self.dim_head, bias=False)

        self.attention_out = nn.Linear(self.num_heads * self.dim_head, self.dim_model, bias=False)

        self.softmax = nn.Softmax(dim=-1)

        if config.dropout_p is not None:
            self.dropout = nn.Dropout(p=config.dropout_p)
        else:
            self.dropout = None

    def forward(
        self,
        hidden_q: mindspore.Tensor,
        hidden_kv: mindspore.Tensor,
        attention_mask: mindspore.Tensor,
        position_bias: mindspore.Tensor,
        output_attentions: Optional[bool] = False,
        past_key_values: Optional[Tuple[mindspore.Tensor, mindspore.Tensor]] = None,
        use_cache: Optional[bool] = None,
    ):
        """
        Args:
            hidden_q (`mindspore.Tensor`):
                Input of the transformer block (self-attention block), used for the query projection. It can be the raw embedding of a batch of sequences.
            hidden_kv (`mindspore.Tensor` of shape `(batch, len_k, dim_model)`):
                Input for the key and value projections, of shape `(batch, len_k, dim_model)`.
            attention_mask (`mindspore.Tensor` of shape `(batch, len_seq, len_seq)`):
                Avoid invalid areas to participate in the calculation of self-attention.
            position_bias (`mindspore.Tensor` of shape `(batch, len_seq, len_seq)`):
                Provide positional information to self-attention block.
            output_attentions (`bool`, *optional*):
                Whether or not to return the attentions tensors of all attention layers.
            past_key_values (`Tuple[mindspore.Tensor, mindspore.Tensor]`, *optional*):
                Cached past key and value projection states.
            use_cache (`bool`, *optional*):
                If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
                (see `past_key_values`).
        """
        batch_size = hidden_q.shape[0]
        len_q = hidden_q.shape[1]
        len_k = hidden_kv.shape[1]

        query = self.project_q(hidden_q)
        key = self.project_k(hidden_kv)
        value = self.project_v(hidden_kv)

        query = query.view(batch_size, len_q, self.num_heads, self.dim_head).permute(0, 2, 1, 3)
        key = key.view(batch_size, len_k, self.num_heads, self.dim_head).permute(0, 2, 1, 3)
        value = value.view(batch_size, len_k, self.num_heads, self.dim_head).permute(0, 2, 1, 3)

        if past_key_values is not None:
            key = ops.cat([past_key_values[0], key], dim=-2)
            value = ops.cat([past_key_values[1], value], dim=-2)
            len_k = key.shape[-2]

        # (batch_size, num_heads, len_q, dim_head) @ (batch_size, num_heads, dim_head, len_k) -> (batch_size, num_heads, len_q, len_k)
        score = ops.matmul(query, ops.transpose(key, -1, -2)) / math.sqrt(self.dim_head)
        score = score + position_bias

        score = ops.masked_fill(
            score,
            attention_mask.view(batch_size, 1, len_q, len_k) == mindspore.tensor(False),
            float(ops.finfo(score.dtype).min)
        )
        score = self.softmax(score)

        score = ops.masked_fill(
            score,
            attention_mask.view(batch_size, 1, len_q, len_k) == mindspore.tensor(False),
            0.
        )
        if output_attentions:
            attn_weights = score
        else:
            attn_weights = None

        if self.dropout is not None:
            score = self.dropout(score)

        # (batch_size, num_heads, len_q, len_k) @ (batch_size, num_heads, len_k, dim_head) -> (batch_size, num_heads, len_q, dim_head)
        score = ops.matmul(score, value)

        score = score.view(batch_size, self.num_heads, len_q, self.dim_head).permute(0, 2, 1, 3)
        score = score.view(batch_size, len_q, self.num_heads * self.dim_head)

        score = self.attention_out(score)

        past_key_values = None
        if use_cache:
            past_key_values = (key, value)

        return score, attn_weights, past_key_values
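
The scoring steps in forward (scaled dot product, position bias, mask fill, softmax, second mask fill) can be followed on a single query position. This is a pure-Python sketch under simplifying assumptions: one head, one query, plain lists instead of tensors, and a large negative constant standing in for the dtype minimum. The function name `masked_attention_row` and all values are hypothetical.

```python
import math


def masked_attention_row(scores, position_bias, mask):
    """Attention weights for one query position over all key positions."""
    neg = -1e30  # stand-in for the most negative finite value of the dtype
    # Add the position bias, then overwrite masked positions with the fill value.
    biased = [s + b if keep else neg for s, b, keep in zip(scores, position_bias, mask)]
    # Numerically stable softmax.
    peak = max(biased)
    exps = [math.exp(x - peak) for x in biased]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Second fill: force masked positions to exactly zero.
    return [w if keep else 0.0 for w, keep in zip(weights, mask)]


dim_head = 4
query = [1.0, 0.0, 1.0, 0.0]
keys = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0]]

# Scaled dot-product scores: q . k / sqrt(dim_head)
scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim_head) for key in keys]
weights = masked_attention_row(scores, [0.0, 0.0, 0.0], [True, True, False])
print(weights)

# A fully masked row: softmax alone would give uniform weights.
print(masked_attention_row([1.0, 2.0], [0.0, 0.0], [False, False]))  # [0.0, 0.0]
```

The second fill matters mainly for rows in which every position is masked: the softmax over identical fill values is uniform, and only the second fill turns those weights into zeros.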

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntAttention.forward(hidden_q, hidden_kv, attention_mask, position_bias, output_attentions=False, past_key_values=None, use_cache=None)

PARAMETER DESCRIPTION
hidden_q

Input of the transformer block (self-attention block), used for the query projection. It can be the raw embedding of a batch of sequences.

TYPE: `mindspore.Tensor`

hidden_kv

Input for the key and value projections, of shape (batch, len_k, dim_model).

TYPE: `mindspore.Tensor` of shape `(batch, len_k, dim_model)`

attention_mask

Masks out invalid positions so that they do not take part in the self-attention calculation.

TYPE: `mindspore.Tensor` of shape `(batch, len_seq, len_seq)`

position_bias

Provides positional information to the self-attention block.

TYPE: `mindspore.Tensor` of shape `(batch, len_seq, len_seq)`

output_attentions

Whether or not to return the attentions tensors of all attention layers.

TYPE: `bool`, *optional* DEFAULT: False

past_key_values

Cached past key and value projection states.

TYPE: `Tuple[mindspore.Tensor, mindspore.Tensor]`, *optional* DEFAULT: None

use_cache

If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).

TYPE: `bool`, *optional* DEFAULT: None

Source code in mindnlp\transformers\models\cpmant\modeling_cpmant.py
def forward(
    self,
    hidden_q: mindspore.Tensor,
    hidden_kv: mindspore.Tensor,
    attention_mask: mindspore.Tensor,
    position_bias: mindspore.Tensor,
    output_attentions: Optional[bool] = False,
    past_key_values: Optional[Tuple[mindspore.Tensor, mindspore.Tensor]] = None,
    use_cache: Optional[bool] = None,
):
    """
    Args:
        hidden_q (`mindspore.Tensor`):
            Input of the transformer block (self-attention block), used for the query projection. It can be the raw embedding of a batch of sequences.
        hidden_kv (`mindspore.Tensor` of shape `(batch, len_k, dim_model)`):
            Input for the key and value projections, of shape `(batch, len_k, dim_model)`.
        attention_mask (`mindspore.Tensor` of shape `(batch, len_seq, len_seq)`):
            Avoid invalid areas to participate in the calculation of self-attention.
        position_bias (`mindspore.Tensor` of shape `(batch, len_seq, len_seq)`):
            Provide positional information to self-attention block.
        output_attentions (`bool`, *optional*):
            Whether or not to return the attentions tensors of all attention layers.
        past_key_values (`Tuple[mindspore.Tensor, mindspore.Tensor]`, *optional*):
            Cached past key and value projection states.
        use_cache (`bool`, *optional*):
            If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
            (see `past_key_values`).
    """
    batch_size = hidden_q.shape[0]
    len_q = hidden_q.shape[1]
    len_k = hidden_kv.shape[1]

    query = self.project_q(hidden_q)
    key = self.project_k(hidden_kv)
    value = self.project_v(hidden_kv)

    query = query.view(batch_size, len_q, self.num_heads, self.dim_head).permute(0, 2, 1, 3)
    key = key.view(batch_size, len_k, self.num_heads, self.dim_head).permute(0, 2, 1, 3)
    value = value.view(batch_size, len_k, self.num_heads, self.dim_head).permute(0, 2, 1, 3)

    if past_key_values is not None:
        key = ops.cat([past_key_values[0], key], dim=-2)
        value = ops.cat([past_key_values[1], value], dim=-2)
        len_k = key.shape[-2]

    # (batch_size, num_heads, len_q, dim_head) @ (batch_size, num_heads, dim_head, len_k) -> (batch_size, num_heads, len_q, len_k)
    score = ops.matmul(query, ops.transpose(key, -1, -2)) / math.sqrt(self.dim_head)
    score = score + position_bias

    score = ops.masked_fill(
        score,
        attention_mask.view(batch_size, 1, len_q, len_k) == mindspore.tensor(False),
        float(ops.finfo(score.dtype).min)
    )
    score = self.softmax(score)

    score = ops.masked_fill(
        score,
        attention_mask.view(batch_size, 1, len_q, len_k) == mindspore.tensor(False),
        0.
    )
    if output_attentions:
        attn_weights = score
    else:
        attn_weights = None

    if self.dropout is not None:
        score = self.dropout(score)

    # (batch_size, num_heads, len_q, len_k) @ (batch_size, num_heads, len_k, dim_head) -> (batch_size, num_heads, len_q, dim_head)
    score = ops.matmul(score, value)

    score = score.view(batch_size, self.num_heads, len_q, self.dim_head).permute(0, 2, 1, 3)
    score = score.view(batch_size, len_q, self.num_heads * self.dim_head)

    score = self.attention_out(score)

    past_key_values = None
    if use_cache:
        past_key_values = (key, value)

    return score, attn_weights, past_key_values
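
The listing above masks the scores twice: once with the dtype minimum before the softmax, and once with 0 after it. The sketch below reproduces that two-step masking on a single row of scores in plain Python. `masked_softmax` and `FLOAT32_MIN` are hypothetical names introduced for illustration, and a float32 score dtype is assumed; none of this is part of the library.

```python
import math

# Most negative finite float32; stands in for finfo(score.dtype).min (assumption).
FLOAT32_MIN = -3.4028234663852886e38


def masked_softmax(scores, mask):
    """Softmax over one row of attention scores with two-step masking.

    scores: list of floats, one per key position.
    mask:   list of bools, True where the key position may be attended to.
    """
    # Step 1: push masked positions to a very negative value before softmax.
    filled = [s if keep else FLOAT32_MIN for s, keep in zip(scores, mask)]

    # Numerically stable softmax.
    peak = max(filled)
    exps = [math.exp(s - peak) for s in filled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Step 2: force masked positions to exactly zero after softmax.
    return [p if keep else 0.0 for p, keep in zip(probs, mask)]


# Partially masked row: the weight is shared by the two visible positions.
partial = masked_softmax([1.0, 1.0, 5.0], [True, True, False])

# Fully masked row: softmax alone would give a uniform 1/3 per position;
# the second fill turns the row into all zeros.
empty = masked_softmax([1.0, 2.0, 3.0], [False, False, False])
```

The second fill matters mainly for rows in which every position is masked: without it, such a row would still distribute uniform attention over positions that should be ignored.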

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntDenseGatedACT

Bases: Module

Source code in mindnlp\transformers\models\cpmant\modeling_cpmant.py
class CpmAntDenseGatedACT(nn.Module):
    def __init__(self, config: CpmAntConfig):
        super().__init__()
        self.w_0 = nn.Linear(config.hidden_size, config.dim_ff, bias=False)
        self.w_1 = nn.Linear(config.hidden_size, config.dim_ff, bias=False)
        self.act = nn.GELU()

    def forward(self, hidden_states: mindspore.Tensor):
        """Transform an input tensor from one feature space to another via a nonlinear operation

        Args:
            hidden_states (`mindspore.Tensor` of shape `(batch, seq_len, dim_in)`)
        """
        gate_score = self.act(self.w_0(hidden_states))
        hidden_states = self.w_1(hidden_states)

        hidden_states = gate_score * hidden_states
        return hidden_states

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntDenseGatedACT.forward(hidden_states)

Transform an input tensor from one feature space to another via a nonlinear operation

Source code in mindnlp\transformers\models\cpmant\modeling_cpmant.py
def forward(self, hidden_states: mindspore.Tensor):
    """Transform an input tensor from one feature space to another via a nonlinear operation

    Args:
        hidden_states (`mindspore.Tensor` of shape `(batch, seq_len, dim_in)`)
    """
    gate_score = self.act(self.w_0(hidden_states))
    hidden_states = self.w_1(hidden_states)

    hidden_states = gate_score * hidden_states
    return hidden_states
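
As a rough illustration of the gating in the listing above, the following plain-Python sketch computes `gelu(w_0 x) * (w_1 x)` for one token vector. The helper names (`gelu`, `linear`, `dense_gated_act`) and the toy weights are made up for this example, and the exact erf-based GELU is an assumption about what `nn.GELU()` computes by default.

```python
import math


def gelu(x):
    """Exact (erf-based) GELU of a scalar."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(weight, vector):
    """Bias-free linear layer: weight is a list of rows, one per output feature."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weight]


def dense_gated_act(w_0, w_1, hidden):
    """Elementwise product of gelu(w_0 @ hidden) and (w_1 @ hidden)."""
    gate = [gelu(g) for g in linear(w_0, hidden)]
    value = linear(w_1, hidden)
    return [g * v for g, v in zip(gate, value)]


# Toy weights: 2 input features -> 3 intermediate features.
w_0 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
w_1 = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
out = dense_gated_act(w_0, w_1, [2.0, -2.0])
```

With these weights the gate pre-activations are 2, -2 and 0, so the first output is passed almost unchanged, the second is strongly damped, and the third is exactly zero.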

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntEncoder

Bases: Module

Source code in mindnlp\transformers\models\cpmant\modeling_cpmant.py
class CpmAntEncoder(nn.Module):
    def __init__(self, config: CpmAntConfig):
        super().__init__()
        self.num_layers = config.num_hidden_layers
        self.layers = nn.ModuleList([CpmAntTransformerBlock(config) for ith in range(self.num_layers)])

        self.output_layernorm = CpmAntLayerNorm(config)

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: mindspore.Tensor,
        position_bias: mindspore.Tensor,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        past_key_values: Optional[Tuple[mindspore.Tensor, mindspore.Tensor]] = None,
        use_cache: Optional[bool] = None,
    ):
        """
        Args:
            hidden_states (`mindspore.Tensor`):
                Input to the layer of shape `(batch, seq_len, dim_model)`
            attention_mask (`mindspore.Tensor`):
                Avoid invalid areas to participate in the calculation of shape `(batch, seq_len, seq_len)`
            position_bias (`mindspore.Tensor`):
                Provides position information to attention mechanism of shape `(num_heads, seq_len, seq_len)`
            output_attentions (`bool`, *optional*):
                Whether or not to return the attentions tensors of all attention layers.
            output_hidden_states (`bool`, *optional*):
                Whether or not to return the hidden states of all layers.
            past_key_values (`Tuple[mindspore.Tensor, mindspore.Tensor])`, *optional*):
                Cached past key and value projection states
            use_cache (`bool`, *optional*):
                If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
                (see `past_key_values`).
        """
        all_hidden_states = () if output_hidden_states else None
        all_self_attns = () if output_attentions else None
        current_key_values = () if use_cache else None

        for i, layer in enumerate(self.layers):
            if output_hidden_states:
                all_hidden_states += (hidden_states,)
            layer_outputs = layer(
                hidden_states,
                attention_mask,
                position_bias,
                output_attentions=output_attentions,
                past_key_values=past_key_values[i] if past_key_values else None,
                use_cache=use_cache,
            )
            hidden_states, attn_weights, current_key_value = layer_outputs
            if output_attentions:
                all_self_attns += (attn_weights,)
            if current_key_value is not None:
                current_key_values = current_key_values + (current_key_value,)

        hidden_states = self.output_layernorm(hidden_states)

        if output_hidden_states:
            all_hidden_states += (hidden_states,)

        return hidden_states, current_key_values, all_hidden_states, all_self_attns

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntEncoder.forward(hidden_states, attention_mask, position_bias, output_attentions=None, output_hidden_states=None, past_key_values=None, use_cache=None)

PARAMETER DESCRIPTION
hidden_states

Input to the layer, of shape (batch, seq_len, dim_model).

TYPE: `mindspore.Tensor`

attention_mask

Mask of shape (batch, seq_len, seq_len) that keeps invalid positions out of the attention computation.

TYPE: `mindspore.Tensor`

position_bias

Positional information for the attention mechanism, of shape (num_heads, seq_len, seq_len).

TYPE: `mindspore.Tensor`

output_attentions

Whether or not to return the attention tensors of all attention layers.

TYPE: `bool`, *optional* DEFAULT: None

output_hidden_states

Whether or not to return the hidden states of all layers.

TYPE: `bool`, *optional* DEFAULT: None

past_key_values

Cached past key and value projection states.

TYPE: `Tuple[mindspore.Tensor, mindspore.Tensor]`, *optional* DEFAULT: None

use_cache

If set to True, the key and value states are returned as past_key_values and can be used to speed up decoding (see past_key_values).

TYPE: `bool`, *optional* DEFAULT: None

Source code in mindnlp\transformers\models\cpmant\modeling_cpmant.py
def forward(
    self,
    hidden_states: mindspore.Tensor,
    attention_mask: mindspore.Tensor,
    position_bias: mindspore.Tensor,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    past_key_values: Optional[Tuple[mindspore.Tensor, mindspore.Tensor]] = None,
    use_cache: Optional[bool] = None,
):
    """
    Args:
        hidden_states (`mindspore.Tensor`):
            Input to the layer of shape `(batch, seq_len, dim_model)`
        attention_mask (`mindspore.Tensor`):
            Avoid invalid areas to participate in the calculation of shape `(batch, seq_len, seq_len)`
        position_bias (`mindspore.Tensor`):
            Provides position information to attention mechanism of shape `(num_heads, seq_len, seq_len)`
        output_attentions (`bool`, *optional*):
            Whether or not to return the attentions tensors of all attention layers.
        output_hidden_states (`bool`, *optional*):
            Whether or not to return the hidden states of all layers.
        past_key_values (`Tuple[mindspore.Tensor, mindspore.Tensor])`, *optional*):
            Cached past key and value projection states
        use_cache (`bool`, *optional*):
            If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
            (see `past_key_values`).
    """
    all_hidden_states = () if output_hidden_states else None
    all_self_attns = () if output_attentions else None
    current_key_values = () if use_cache else None

    for i, layer in enumerate(self.layers):
        if output_hidden_states:
            all_hidden_states += (hidden_states,)
        layer_outputs = layer(
            hidden_states,
            attention_mask,
            position_bias,
            output_attentions=output_attentions,
            past_key_values=past_key_values[i] if past_key_values else None,
            use_cache=use_cache,
        )
        hidden_states, attn_weights, current_key_value = layer_outputs
        if output_attentions:
            all_self_attns += (attn_weights,)
        if current_key_value is not None:
            current_key_values = current_key_values + (current_key_value,)

    hidden_states = self.output_layernorm(hidden_states)

    if output_hidden_states:
        all_hidden_states += (hidden_states,)

    return hidden_states, current_key_values, all_hidden_states, all_self_attns
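
The loop above threads one cache entry per layer: layer `i` reads `past_key_values[i]` and contributes one `(key, value)` pair to the returned tuple. The sketch below mimics that bookkeeping with plain lists. `toy_layer` and `toy_encoder` are hypothetical stand-ins with no real attention, meant only to show how the cached length grows between calls.

```python
def toy_layer(hidden, past_key_value=None, use_cache=False):
    """Stand-in for one transformer block (hypothetical, no real attention).

    `hidden` is a list of per-token values; keys and values are copies of it.
    """
    key, value = list(hidden), list(hidden)
    if past_key_value is not None:
        # Prepend cached states along the sequence axis, as the attention layer does.
        key = past_key_value[0] + key
        value = past_key_value[1] + value
    new_hidden = [h + 1 for h in hidden]
    return new_hidden, None, ((key, value) if use_cache else None)


def toy_encoder(hidden, num_layers, past_key_values=None, use_cache=False):
    """Mirror the loop in the listing: one cache entry per layer, indexed by layer."""
    current_key_values = () if use_cache else None
    for i in range(num_layers):
        past = past_key_values[i] if past_key_values else None
        hidden, _, current = toy_layer(hidden, past, use_cache)
        if current is not None:
            current_key_values = current_key_values + (current,)
    return hidden, current_key_values


# First call processes a 3-token prompt; second call feeds only the new token.
_, cache = toy_encoder([0, 0, 0], num_layers=2, use_cache=True)
_, cache = toy_encoder([0], num_layers=2, past_key_values=cache, use_cache=True)
key_lengths = [len(key) for key, _ in cache]
```

After the second call every layer holds four cached positions, even though only one new token was processed.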

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntFFNBlock

Bases: Module

Source code in mindnlp\transformers\models\cpmant\modeling_cpmant.py
class CpmAntFFNBlock(nn.Module):
    def __init__(self, config: CpmAntConfig):
        super().__init__()
        self.layernorm_before_ffn = CpmAntLayerNorm(config)
        self.ffn = CpmAntFeedForward(config)
        if config.dropout_p:
            self.dropout = nn.Dropout(config.dropout_p)
        else:
            self.dropout = None

    def forward(
        self,
        hidden_states: mindspore.Tensor,
    ):
        """
        Args:
            hidden_states (`mindspore.Tensor` of shape `(batch, len_seq, dim_model)`):
                Hidden states before feed forward layer.
        """
        ln_outputs = self.layernorm_before_ffn(hidden_states)
        outputs = self.ffn(ln_outputs)
        if self.dropout is not None:
            outputs = self.dropout(outputs)
        hidden_states = hidden_states + outputs
        return hidden_states

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntFFNBlock.forward(hidden_states)

PARAMETER DESCRIPTION
hidden_states

Hidden states before feed forward layer.

TYPE: `mindspore.Tensor` of shape `(batch, len_seq, dim_model)`

Source code in mindnlp\transformers\models\cpmant\modeling_cpmant.py
def forward(
    self,
    hidden_states: mindspore.Tensor,
):
    """
    Args:
        hidden_states (`mindspore.Tensor` of shape `(batch, len_seq, dim_model)`):
            Hidden states before feed forward layer.
    """
    ln_outputs = self.layernorm_before_ffn(hidden_states)
    outputs = self.ffn(ln_outputs)
    if self.dropout is not None:
        outputs = self.dropout(outputs)
    hidden_states = hidden_states + outputs
    return hidden_states
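
The block above follows a pre-norm residual pattern: the input is normalized, passed through the feed-forward layer, optionally dropped out, and then added back to the original (un-normalized) input. A minimal sketch of that wiring, with hypothetical stand-in functions in place of the real sub-layers:

```python
def ffn_block(hidden, norm, ffn, dropout=None):
    """Pre-norm residual wiring: hidden + dropout(ffn(norm(hidden)))."""
    outputs = ffn(norm(hidden))
    if dropout is not None:
        outputs = dropout(outputs)
    # The residual adds back the un-normalized input.
    return [h + o for h, o in zip(hidden, outputs)]


# Hypothetical stand-ins for the sub-layers, chosen so the result is easy to check.
def halve(xs):
    """Plays the role of the layer norm."""
    return [x / 2 for x in xs]


def negate(xs):
    """Plays the role of the feed-forward layer."""
    return [-x for x in xs]


result = ffn_block([2.0, 4.0], norm=halve, ffn=negate)
```

Here `halve` turns `[2.0, 4.0]` into `[1.0, 2.0]`, `negate` flips the sign, and the residual sum gives `[1.0, 2.0]`.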

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntFeedForward

Bases: Module

Source code in mindnlp\transformers\models\cpmant\modeling_cpmant.py
class CpmAntFeedForward(nn.Module):
    def __init__(self, config: CpmAntConfig):
        super().__init__()
        self.w_in = CpmAntDenseGatedACT(config)
        if config.dropout_p is not None:
            self.dropout = nn.Dropout(config.dropout_p)
        else:
            self.dropout = None

        self.w_out = nn.Linear(config.dim_ff, config.hidden_size, bias=False)

    def forward(self, hidden_states: mindspore.Tensor):
        """
        Args:
            hidden_states (`mindspore.Tensor` of shape `(batch, seq_len, dim_in)`)
        """
        hidden_states = self.w_in(hidden_states)

        if self.dropout is not None:
            hidden_states = self.dropout(hidden_states)

        hidden_states = self.w_out(hidden_states)

        return hidden_states

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntFeedForward.forward(hidden_states)

Source code in mindnlp\transformers\models\cpmant\modeling_cpmant.py
def forward(self, hidden_states: mindspore.Tensor):
    """
    Args:
        hidden_states (`mindspore.Tensor` of shape `(batch, seq_len, dim_in)`)
    """
    hidden_states = self.w_in(hidden_states)

    if self.dropout is not None:
        hidden_states = self.dropout(hidden_states)

    hidden_states = self.w_out(hidden_states)

    return hidden_states
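
One detail worth noting in the listings: `CpmAntFeedForward` decides whether to build its dropout layer with `config.dropout_p is not None`, while `CpmAntFFNBlock` uses the truthiness test `if config.dropout_p:`. The two checks disagree for a dropout probability of `0.0`. The sketch below only compares the two conditions; the function names are invented for this illustration.

```python
def builds_dropout_truthy(dropout_p):
    """Condition used by the block wrapper: `if config.dropout_p:`."""
    return bool(dropout_p)


def builds_dropout_not_none(dropout_p):
    """Condition used by the feed-forward layer: `if config.dropout_p is not None:`."""
    return dropout_p is not None


# Map each candidate probability to (truthy check, not-None check).
checks = {
    p: (builds_dropout_truthy(p), builds_dropout_not_none(p))
    for p in (None, 0.0, 0.1)
}
```

With `dropout_p = 0.0`, the feed-forward layer would still construct a dropout module (with probability zero), whereas the block wrapper would skip it.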

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntForCausalLM

Bases: CpmAntPreTrainedModel

Source code in mindnlp\transformers\models\cpmant\modeling_cpmant.py
class CpmAntForCausalLM(CpmAntPreTrainedModel):
    _tied_weights_keys = ["lm_head.weight"]

    def __init__(self, config: CpmAntConfig):
        super().__init__(config)
        self.cpmant = CpmAntModel(config)

        # lm_head.weight is tied to cpmant.input_embedding.weight
        self.lm_head = nn.Linear(
            config.hidden_size, config.vocab_size + config.prompt_types * config.prompt_length, bias=False
        )
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[List[Tuple[mindspore.Tensor, mindspore.Tensor]]] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        labels: Optional[mindspore.Tensor] = None,
        return_dict: Optional[bool] = None,
        attention_mask: Optional[mindspore.Tensor] = None,  # dummy parameter for text-generation pipeline
        **kwargs,
    ) -> Union[Tuple, CausalLMOutputWithPast]:
        r"""
        Args:
            input_ids (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
                Indices of input sequence tokens in the vocabulary.

                Indices can be obtained using [`CPMAntTokenizer`]. See [`PreTrainedTokenizer.encode`] and
                [`PreTrainedTokenizer.__call__`] for details.

                [What are input IDs?](../glossary#input-ids)
            past_key_values (`tuple(tuple(mindspore.Tensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`):
                Contains pre-computed hidden-states (key and values in the self-attention blocks and in the
                cross-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.
            use_cache (`bool`, *optional*):
                If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
                (see `past_key_values`).
            output_attentions (`bool`, *optional*):
                Whether or not to return the attentions tensors of all attention layers.
            output_hidden_states (`bool`, *optional*):
                Whether or not to return the hidden states of all layers.
            labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for computing the masked language modeling loss.
            return_dict (`bool`, *optional*):
                Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
            attention_mask (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                CPMAnt will process attention mask automatically, this parameter is a dummy parameter for
                text-generation pipeline.

        Example:

        Text Generation with CpmAntForCausalLM.
        ```python
        >>> from mindnlp.transformers import CpmAntTokenizer, CpmAntForCausalLM

        >>> texts = "今天天气不错,"
        >>> model = CpmAntForCausalLM.from_pretrained("openbmb/cpm-ant-10b")
        >>> tokenizer = CpmAntTokenizer.from_pretrained("openbmb/cpm-ant-10b")
        >>> input_ids = tokenizer(texts, return_tensors="ms")
        >>> outputs = model.generate(**input_ids)
        >>> output_texts = tokenizer.batch_decode(outputs)
        >>> print(output_texts)
        ['今天天气不错,阳光明媚,我和妈妈一起去超市买东西。\n在超市里,我看到了一个很好玩的玩具,它的名字叫“机器人”。它有一个圆圆的脑袋,两只圆圆的眼睛,还有一个圆圆的']
        ```
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        model_output = self.cpmant(
            input_ids, output_attentions, output_hidden_states, past_key_values, use_cache, return_dict
        )
        hidden_states = model_output.last_hidden_state if return_dict else model_output[0]

        logits = self.lm_head(hidden_states)

        loss = None
        if labels is not None:
            loss_func = CrossEntropyLoss()
            loss = loss_func(logits.view(-1, logits.shape[-1]), labels.view(-1))

        if not return_dict:
            output = (logits,) + model_output[1:]
            return ((loss,) + output) if loss is not None else output

        return CausalLMOutputWithPast(
            loss=loss,
            logits=logits,
            past_key_values=model_output.past_key_values,
            hidden_states=model_output.hidden_states,
            attentions=model_output.attentions,
        )

    def get_input_embeddings(self):
        return self.cpmant.input_embedding

    def set_input_embeddings(self, embeddings):
        self.cpmant.input_embedding = embeddings

    def get_output_embeddings(self):
        return self.lm_head

    def set_output_embeddings(self, new_embeddings):
        self.lm_head = new_embeddings

    def prepare_inputs_for_generation(self, input_ids, **kwargs):
        input_ids = input_ids.int()
        # save the memory usage of dummy attention mask
        if "attention_mask" in kwargs:
            kwargs["attention_mask"] = ops.zeros(1, 1)

        return {
            "input_ids": input_ids,
            "use_cache": kwargs["use_cache"],
            "past_key_values": kwargs.get("past_key_values", None),
        }

    def _reorder_cache(self, past_key_values, beam_idx):
        past_key_values = [list(each) if each is not None else each for each in past_key_values]
        for key_value_layer in past_key_values:
            key_value_layer[0] = key_value_layer[0][beam_idx]
            key_value_layer[1] = key_value_layer[1][beam_idx]
        return past_key_values
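
During beam search, `_reorder_cache` re-indexes the batch axis of every cached key and value so that each surviving beam continues from the cache of the beam it was derived from. A small sketch of the same idea with nested lists instead of tensors (`reorder_cache` and the string tags are illustrative only):

```python
def reorder_cache(past_key_values, beam_idx):
    """Reorder the batch axis of every cached key/value by `beam_idx`.

    Here a "tensor" is a plain list whose first axis is the beam (batch) axis.
    """
    reordered = []
    for key, value in past_key_values:
        reordered.append([
            [key[i] for i in beam_idx],
            [value[i] for i in beam_idx],
        ])
    return reordered


# One layer, three beams; each beam's cache is tagged with a letter.
cache = [(["k_a", "k_b", "k_c"], ["v_a", "v_b", "v_c"])]
# New beams 0 and 1 both continue from old beam 2; new beam 2 continues from old beam 0.
new_cache = reorder_cache(cache, beam_idx=[2, 2, 0])
```

In the listing the same selection is written as `key_value_layer[0][beam_idx]`, which indexes the first tensor axis with an index tensor.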

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntForCausalLM.forward(input_ids=None, past_key_values=None, use_cache=None, output_attentions=None, output_hidden_states=None, labels=None, return_dict=None, attention_mask=None, **kwargs)

PARAMETER DESCRIPTION
input_ids

Indices of input sequence tokens in the vocabulary.

Indices can be obtained using [CpmAntTokenizer]. See [PreTrainedTokenizer.encode] and [PreTrainedTokenizer.__call__] for details.

What are input IDs?

TYPE: `mindspore.Tensor` of shape `(batch_size, seq_len)` DEFAULT: None

past_key_values

Contains pre-computed key and value states from the self-attention blocks that can be used (see past_key_values input) to speed up sequential decoding.

TYPE: `tuple(tuple(mindspore.Tensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True` DEFAULT: None

use_cache

If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).

TYPE: `bool`, *optional* DEFAULT: None

output_attentions

Whether or not to return the attention tensors of all attention layers.

TYPE: `bool`, *optional* DEFAULT: None

output_hidden_states

Whether or not to return the hidden states of all layers.

TYPE: `bool`, *optional* DEFAULT: None

labels

Labels for computing the language modeling loss.

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

return_dict

Whether or not to return a [~utils.ModelOutput] instead of a plain tuple.

TYPE: `bool`, *optional* DEFAULT: None

attention_mask

CPMAnt builds the attention mask automatically; this is a dummy parameter kept for compatibility with the text-generation pipeline.

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

Example: text generation with CpmAntForCausalLM.

>>> from mindnlp.transformers import CpmAntTokenizer, CpmAntForCausalLM

>>> texts = "今天天气不错,"
>>> model = CpmAntForCausalLM.from_pretrained("openbmb/cpm-ant-10b")
>>> tokenizer = CpmAntTokenizer.from_pretrained("openbmb/cpm-ant-10b")
>>> input_ids = tokenizer(texts, return_tensors="ms")
>>> outputs = model.generate(**input_ids)
>>> output_texts = tokenizer.batch_decode(outputs)
>>> print(output_texts)
['今天天气不错,阳光明媚,我和妈妈一起去超市买东西。\n在超市里,我看到了一个很好玩的玩具,它的名字叫“机器人”。它有一个圆圆的脑袋,两只圆圆的眼睛,还有一个圆圆的']

Source code in mindnlp\transformers\models\cpmant\modeling_cpmant.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    past_key_values: Optional[List[Tuple[mindspore.Tensor, mindspore.Tensor]]] = None,
    use_cache: Optional[bool] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    labels: Optional[mindspore.Tensor] = None,
    return_dict: Optional[bool] = None,
    attention_mask: Optional[mindspore.Tensor] = None,  # dummy parameter for text-generation pipeline
    **kwargs,
) -> Union[Tuple, CausalLMOutputWithPast]:
    r"""
    Args:
        input_ids (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
            Indices of input sequence tokens in the vocabulary.

            Indices can be obtained using [`CPMAntTokenizer`]. See [`PreTrainedTokenizer.encode`] and
            [`PreTrainedTokenizer.__call__`] for details.

            [What are input IDs?](../glossary#input-ids)
        past_key_values (`tuple(tuple(mindspore.Tensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`):
            Contains pre-computed hidden-states (key and values in the self-attention blocks and in the
            cross-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.
        use_cache (`bool`, *optional*):
            If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
            (see `past_key_values`).
        output_attentions (`bool`, *optional*):
            Whether or not to return the attentions tensors of all attention layers.
        output_hidden_states (`bool`, *optional*):
            Whether or not to return the hidden states of all layers.
        labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for computing the masked language modeling loss.
        return_dict (`bool`, *optional*):
            Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
        attention_mask (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            CPMAnt will process attention mask automatically, this parameter is a dummy parameter for
            text-generation pipeline.

    Example:

    Text Generation with CpmAntForCausalLM.
    ```python
    >>> from mindnlp.transformers import CpmAntTokenizer, CpmAntForCausalLM

    >>> texts = "今天天气不错,"
    >>> model = CpmAntForCausalLM.from_pretrained("openbmb/cpm-ant-10b")
    >>> tokenizer = CpmAntTokenizer.from_pretrained("openbmb/cpm-ant-10b")
    >>> input_ids = tokenizer(texts, return_tensors="ms")
    >>> outputs = model.generate(**input_ids)
    >>> output_texts = tokenizer.batch_decode(outputs)
    >>> print(output_texts)
    ['今天天气不错,阳光明媚,我和妈妈一起去超市买东西。\n在超市里,我看到了一个很好玩的玩具,它的名字叫“机器人”。它有一个圆圆的脑袋,两只圆圆的眼睛,还有一个圆圆的']
    ```
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    model_output = self.cpmant(
        input_ids, output_attentions, output_hidden_states, past_key_values, use_cache, return_dict
    )
    hidden_states = model_output.last_hidden_state if return_dict else model_output[0]

    logits = self.lm_head(hidden_states)

    loss = None
    if labels is not None:
        loss_func = CrossEntropyLoss()
        loss = loss_func(logits.view(-1, logits.shape[-1]), labels.view(-1))

    if not return_dict:
        output = (logits,) + model_output[1:]
        return ((loss,) + output) if loss is not None else output

    return CausalLMOutputWithPast(
        loss=loss,
        logits=logits,
        past_key_values=model_output.past_key_values,
        hidden_states=model_output.hidden_states,
        attentions=model_output.attentions,
    )
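
When `labels` is given, the listing flattens the logits to `(batch * seq_len, vocab)` and the labels to `(batch * seq_len,)` before computing a cross-entropy loss. Note that the listing does not shift the labels itself. The following sketch shows the same flattened computation in plain Python; `cross_entropy` is a simplified, hypothetical helper that averages over all positions and omits extras such as ignored label values that a real loss class may provide.

```python
import math


def cross_entropy(logits, labels):
    """Mean cross-entropy over flattened positions.

    logits: list of rows, one row of vocabulary scores per position.
    labels: list of target token ids, one per position.
    """
    losses = []
    for row, target in zip(logits, labels):
        # log-sum-exp with the row maximum subtracted for numerical stability.
        peak = max(row)
        log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in row))
        losses.append(log_sum_exp - row[target])
    return sum(losses) / len(losses)


# Logits of shape (batch=1, seq_len=2, vocab=3), already flattened to two rows,
# analogous to logits.view(-1, logits.shape[-1]) and labels.view(-1).
flat_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
flat_labels = [1, 2]
loss = cross_entropy(flat_logits, flat_labels)
```

With uniform logits over three tokens, every position contributes `log(3)`, so the mean loss is `log(3)` as well.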

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntLayerNorm

Bases: Module

Implements Root Mean Square (RMS) layer normalization; see https://arxiv.org/abs/1910.07467 for details.

Source code in mindnlp\transformers\models\cpmant\modeling_cpmant.py
class CpmAntLayerNorm(nn.Module):
    """
    We use Root Mean Square (RMS) Layer Normalization, please see https://arxiv.org/abs/1910.07467 for details."
    """

    def __init__(self, config: CpmAntConfig):
        super().__init__()

        self.eps = config.eps
        self.dim_norm = config.hidden_size
        self.weight = nn.Parameter(ops.empty(config.hidden_size))

    def forward(self, hidden_states: mindspore.Tensor):
        """
        Args:
            hidden_states (`mindspore.Tensor` of shape `(batch, seq_len, dim_in)`)
        """
        if hidden_states.shape[-1] != self.dim_norm:
            raise AssertionError("hidden_states.shape[-1] != self.dim_norm")
        old_dtype = hidden_states.dtype
        variance = ops.mean(hidden_states.to(mindspore.float32).pow(2), dim=-1, keepdim=True)
        hidden_states = (hidden_states * ops.rsqrt(variance + self.eps)).to(old_dtype) * self.weight
        return hidden_states

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntLayerNorm.forward(hidden_states)

Source code in mindnlp\transformers\models\cpmant\modeling_cpmant.py
def forward(self, hidden_states: mindspore.Tensor):
    """
    Args:
        hidden_states (`mindspore.Tensor` of shape `(batch, seq_len, dim_in)`)
    """
    if hidden_states.shape[-1] != self.dim_norm:
        raise AssertionError("hidden_states.shape[-1] != self.dim_norm")
    old_dtype = hidden_states.dtype
    variance = ops.mean(hidden_states.to(mindspore.float32).pow(2), dim=-1, keepdim=True)
    hidden_states = (hidden_states * ops.rsqrt(variance + self.eps)).to(old_dtype) * self.weight
    return hidden_states
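
The forward pass above divides each hidden vector by the root of its mean square (plus `eps`) and then scales it by a learned weight; unlike standard layer normalization, no mean is subtracted and no bias is added. A minimal scalar sketch of that formula follows. `rms_norm` is a name chosen for this example, and the float32 upcast done in the listing is left out.

```python
import math


def rms_norm(hidden, weight, eps):
    """Compute x / sqrt(mean(x^2) + eps) * weight over the last dimension."""
    mean_square = sum(x * x for x in hidden) / len(hidden)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [x * scale * w for x, w in zip(hidden, weight)]


# With unit weights and eps = 0, the output has a mean square of exactly 1.
normed = rms_norm([3.0, 4.0], weight=[1.0, 1.0], eps=0.0)
output_mean_square = sum(x * x for x in normed) / len(normed)
```

For the input `[3.0, 4.0]` the mean square is 12.5, so both entries are divided by `sqrt(12.5)`.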

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntPreTrainedModel

Bases: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.

Source code in mindnlp\transformers\models\cpmant\modeling_cpmant.py
class CpmAntPreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """

    config_class = CpmAntConfig
    base_model_prefix = "cpmant"

    def _init_weights(self, module):
        """Initialize the weights"""
        if isinstance(module, nn.Linear):
            nn.init.normal_(module.weight, mean=0.0, std=self.config.init_std)
            if module.bias is not None:
                nn.init.zeros_(module.bias)
        elif isinstance(module, nn.Embedding):
            nn.init.normal_(module.weight, mean=0.0, std=self.config.init_std)
            if module.padding_idx is not None:
                module.weight[module.padding_idx] = 0
        elif isinstance(module, nn.LayerNorm):
            nn.init.zeros_(module.bias)
            nn.init.ones_(module.weight)
        elif isinstance(module, CpmAntLayerNorm):
            nn.init.ones_(module.weight)
        elif isinstance(module, CpmAntSegmentPositionEmbedding):
            nn.init.normal_(module.relative_attention_bias, mean=0.0, std=self.config.init_std)

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntSelfAttentionBlock

Bases: Module

Source code in mindnlp\transformers\models\cpmant\modeling_cpmant.py
class CpmAntSelfAttentionBlock(nn.Module):
    def __init__(self, config: CpmAntConfig):
        super().__init__()
        self.layernorm_before_attention = CpmAntLayerNorm(config)
        self.self_attention = CpmAntAttention(config)
        if config.dropout_p:
            self.dropout = nn.Dropout(config.dropout_p)
        else:
            self.dropout = None

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: mindspore.Tensor,
        position_bias: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = False,
        past_key_values: Optional[Tuple[mindspore.Tensor, mindspore.Tensor]] = None,
        use_cache: Optional[bool] = None,
    ):
        """
        Args:
            hidden_states (`mindspore.Tensor` of shape `(batch, len_seq, dim_model)`):
                Input to the transformer block (self-attention block). It can be the raw embedding of a batch of sequences.
            attention_mask (`mindspore.Tensor` of shape `(batch, len_seq, len_seq)`):
                Mask that keeps invalid positions out of the self-attention computation.
            position_bias (`mindspore.Tensor` of shape `(batch, len_seq, len_seq)`):
                Provides positional information to the self-attention block.
            output_attentions (`bool`, *optional*):
                Whether or not to return the attention tensors of all attention layers.
            past_key_values (`Tuple(mindspore.Tensor)`, *optional*):
                Cached past key and value projection states.
            use_cache (`bool`, *optional*):
                If set to `True`, the key and value states are returned so they can be passed back as
                `past_key_values` to speed up decoding.
        """
        outputs = self.layernorm_before_attention(hidden_states)
        outputs = self.self_attention(
            outputs, outputs, attention_mask, position_bias, output_attentions, past_key_values, use_cache
        )

        outputs, attn_weights, current_key_value = outputs

        if self.dropout is not None:
            outputs = self.dropout(outputs)
        hidden_states = hidden_states + outputs

        return hidden_states, attn_weights, current_key_value

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntSelfAttentionBlock.forward(hidden_states, attention_mask, position_bias=None, output_attentions=False, past_key_values=None, use_cache=None)

PARAMETER DESCRIPTION
hidden_states

Input to the transformer block (self-attention block). It can be the raw embedding of a batch of sequences.

TYPE: `mindspore.Tensor` of shape `(batch, len_seq, dim_model)`

attention_mask

Mask that keeps invalid positions out of the self-attention computation.

TYPE: `mindspore.Tensor` of shape `(batch, len_seq, len_seq)`

position_bias

Provides positional information to the self-attention block.

TYPE: `mindspore.Tensor` of shape `(batch, len_seq, len_seq)` DEFAULT: None

output_attentions

Whether or not to return the attention tensors of all attention layers.

TYPE: `bool`, *optional* DEFAULT: False

past_key_values

Cached past key and value projection states.

TYPE: `Tuple(mindspore.Tensor)`, *optional* DEFAULT: None

use_cache

If set to True, the key and value states are returned so they can be passed back as past_key_values to speed up decoding.

TYPE: `bool`, *optional* DEFAULT: None
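The data flow of this block (normalize, attend, optionally apply dropout, then add the result back onto the un-normalized input) can be sketched in plain Python. This is only an illustration under stated assumptions: `rms_norm`, `toy_attention` and `self_attention_block` are hypothetical stand-ins written for this sketch, not MindNLP APIs, and the RMS-style normalization is assumed purely for illustration.

```python
import math
from typing import Callable, List, Optional

Vector = List[float]


def rms_norm(x: Vector, eps: float = 1e-6) -> Vector:
    # Hypothetical stand-in for the layer norm applied before attention
    # (an RMS-style norm without a learned weight is assumed here).
    scale = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * scale for v in x]


def self_attention_block(
    hidden_states: List[Vector],
    attention: Callable,
    dropout: Optional[Callable[[Vector], Vector]] = None,
):
    # 1. Normalize each token vector before attention (pre-norm layout).
    normed = [rms_norm(token) for token in hidden_states]
    # 2. Attention returns its output, optional weights, and an optional cache.
    attn_out, attn_weights, current_key_value = attention(normed)
    # 3. Dropout is applied only when one was configured.
    if dropout is not None:
        attn_out = [dropout(token) for token in attn_out]
    # 4. Residual connection: add onto the original, un-normalized input.
    out = [
        [h + a for h, a in zip(h_tok, a_tok)]
        for h_tok, a_tok in zip(hidden_states, attn_out)
    ]
    return out, attn_weights, current_key_value


def toy_attention(states: List[Vector]):
    # Hypothetical attention that outputs zeros, so only the residual path remains.
    return [[0.0] * len(token) for token in states], None, None


hidden = [[1.0, 2.0], [3.0, 4.0]]
out, weights, cache = self_attention_block(hidden, toy_attention)
```

With the zero-output stand-in, `out` equals `hidden`, which makes the residual path easy to see: the block only ever adds a correction on top of its input.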

Source code in mindnlp\transformers\models\cpmant\modeling_cpmant.py
def forward(
    self,
    hidden_states: mindspore.Tensor,
    attention_mask: mindspore.Tensor,
    position_bias: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = False,
    past_key_values: Optional[Tuple[mindspore.Tensor, mindspore.Tensor]] = None,
    use_cache: Optional[bool] = None,
):
    """
    Args:
        hidden_states (`mindspore.Tensor` of shape `(batch, len_seq, dim_model)`):
            Input to the transformer block (self-attention block). It can be the raw embedding of a batch of sequences.
        attention_mask (`mindspore.Tensor` of shape `(batch, len_seq, len_seq)`):
            Mask that keeps invalid positions out of the self-attention computation.
        position_bias (`mindspore.Tensor` of shape `(batch, len_seq, len_seq)`):
            Provides positional information to the self-attention block.
        output_attentions (`bool`, *optional*):
            Whether or not to return the attention tensors of all attention layers.
        past_key_values (`Tuple(mindspore.Tensor)`, *optional*):
            Cached past key and value projection states.
        use_cache (`bool`, *optional*):
            If set to `True`, the key and value states are returned so they can be passed back as
            `past_key_values` to speed up decoding.
    """
    outputs = self.layernorm_before_attention(hidden_states)
    outputs = self.self_attention(
        outputs, outputs, attention_mask, position_bias, output_attentions, past_key_values, use_cache
    )

    outputs, attn_weights, current_key_value = outputs

    if self.dropout is not None:
        outputs = self.dropout(outputs)
    hidden_states = hidden_states + outputs

    return hidden_states, attn_weights, current_key_value

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntTransformerBlock

Bases: Module

Source code in mindnlp\transformers\models\cpmant\modeling_cpmant.py
class CpmAntTransformerBlock(nn.Module):
    def __init__(self, config: CpmAntConfig):
        super().__init__()
        self.self_att = CpmAntSelfAttentionBlock(config)
        self.ffn = CpmAntFFNBlock(config)

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: mindspore.Tensor,
        position_bias: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = False,
        past_key_values: Optional[Tuple[mindspore.Tensor, mindspore.Tensor]] = None,
        use_cache: Optional[bool] = None,
    ):
        """
        Args:
            hidden_states (`mindspore.Tensor`):
                Input to the layer, of shape `(batch, seq_len, dim_model)`.
            attention_mask (`mindspore.Tensor`):
                Mask of shape `(batch, seq_len, seq_len)` that keeps invalid positions out of the attention computation.
            position_bias (`mindspore.Tensor`):
                Positional information for the attention mechanism, of shape `(num_heads, seq_len, seq_len)`.
            output_attentions (`bool`, *optional*):
                Whether or not to return the attention tensors of all attention layers.
            past_key_values (`Tuple[mindspore.Tensor, mindspore.Tensor]`, *optional*):
                Cached past key and value projection states.
            use_cache (`bool`, *optional*):
                If set to `True`, the key and value states are returned so they can be passed back as
                `past_key_values` to speed up decoding.
        """
        hidden_states = self.self_att(
            hidden_states,
            attention_mask=attention_mask,
            position_bias=position_bias,
            output_attentions=output_attentions,
            past_key_values=past_key_values,
            use_cache=use_cache,
        )

        hidden_states, attn_weights, current_key_value = hidden_states

        hidden_states = self.ffn(hidden_states)

        return hidden_states, attn_weights, current_key_value

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntTransformerBlock.forward(hidden_states, attention_mask, position_bias=None, output_attentions=False, past_key_values=None, use_cache=None)

PARAMETER DESCRIPTION
hidden_states

Input to the layer, of shape (batch, seq_len, dim_model).

TYPE: `mindspore.Tensor`

attention_mask

Mask of shape (batch, seq_len, seq_len) that keeps invalid positions out of the attention computation.

TYPE: `mindspore.Tensor`

position_bias

Positional information for the attention mechanism, of shape (num_heads, seq_len, seq_len).

TYPE: `mindspore.Tensor` DEFAULT: None

output_attentions

Whether or not to return the attention tensors of all attention layers.

TYPE: `bool`, *optional* DEFAULT: False

past_key_values

Cached past key and value projection states.

TYPE: `Tuple[mindspore.Tensor, mindspore.Tensor]`, *optional* DEFAULT: None

use_cache

If set to True, the key and value states are returned so they can be passed back as past_key_values to speed up decoding.

TYPE: `bool`, *optional* DEFAULT: None
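The transformer block chains the self-attention sub-block and the feed-forward sub-block, passing the attention weights and the key/value cache through unchanged. A minimal sketch of that composition follows; `toy_self_att`, `toy_ffn` and `transformer_block` are hypothetical stand-ins for illustration only, and the arithmetic inside them is arbitrary.

```python
from typing import Callable, List

Vector = List[float]


def transformer_block(
    hidden_states: List[Vector],
    self_att: Callable,
    ffn: Callable[[List[Vector]], List[Vector]],
):
    # The attention sub-block returns three values; only the hidden states
    # are transformed further by the feed-forward sub-block.
    hidden_states, attn_weights, current_key_value = self_att(hidden_states)
    hidden_states = ffn(hidden_states)
    # Attention weights and the cache are passed through untouched.
    return hidden_states, attn_weights, current_key_value


def toy_self_att(states: List[Vector]):
    # Hypothetical attention sub-block: shifts every value by 1 and
    # returns placeholder weights and a placeholder (key, value) cache.
    shifted = [[v + 1.0 for v in token] for token in states]
    return shifted, "attn", ("k", "v")


def toy_ffn(states: List[Vector]) -> List[Vector]:
    # Hypothetical feed-forward sub-block: doubles every value.
    return [[v * 2.0 for v in token] for token in states]


out, weights, cache = transformer_block([[1.0, 2.0]], toy_self_att, toy_ffn)
```

Because the cache comes straight from the attention sub-block, a caller that stacks several such blocks would collect one cache entry per block when `use_cache` is enabled.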

Source code in mindnlp\transformers\models\cpmant\modeling_cpmant.py
def forward(
    self,
    hidden_states: mindspore.Tensor,
    attention_mask: mindspore.Tensor,
    position_bias: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = False,
    past_key_values: Optional[Tuple[mindspore.Tensor, mindspore.Tensor]] = None,
    use_cache: Optional[bool] = None,
):
    """
    Args:
        hidden_states (`mindspore.Tensor`):
            Input to the layer, of shape `(batch, seq_len, dim_model)`.
        attention_mask (`mindspore.Tensor`):
            Mask of shape `(batch, seq_len, seq_len)` that keeps invalid positions out of the attention computation.
        position_bias (`mindspore.Tensor`):
            Positional information for the attention mechanism, of shape `(num_heads, seq_len, seq_len)`.
        output_attentions (`bool`, *optional*):
            Whether or not to return the attention tensors of all attention layers.
        past_key_values (`Tuple[mindspore.Tensor, mindspore.Tensor]`, *optional*):
            Cached past key and value projection states.
        use_cache (`bool`, *optional*):
            If set to `True`, the key and value states are returned so they can be passed back as
            `past_key_values` to speed up decoding.
    """
    hidden_states = self.self_att(
        hidden_states,
        attention_mask=attention_mask,
        position_bias=position_bias,
        output_attentions=output_attentions,
        past_key_values=past_key_values,
        use_cache=use_cache,
    )

    hidden_states, attn_weights, current_key_value = hidden_states

    hidden_states = self.ffn(hidden_states)

    return hidden_states, attn_weights, current_key_value