ernie_m

mindnlp.transformers.models.ernie_m.configuration_ernie_m

ErnieM model configuration

mindnlp.transformers.models.ernie_m.configuration_ernie_m.ErnieMConfig

Bases: PretrainedConfig

This is the configuration class to store the configuration of an [ErnieMModel]. It is used to instantiate an Ernie-M model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults yields a configuration similar to that of the Ernie-M susnato/ernie-m-base_pytorch architecture.

Configuration objects inherit from [PretrainedConfig] and can be used to control the model outputs. Read the documentation from [PretrainedConfig] for more information.

PARAMETER DESCRIPTION
vocab_size

Vocabulary size of input_ids in [ErnieMModel], which is also the vocabulary size of the token embedding matrix. It defines the number of different tokens that can be represented by the input_ids passed when calling [ErnieMModel].

TYPE: `int`, *optional*, defaults to 250002 DEFAULT: 250002

hidden_size

Dimensionality of the embedding layer, encoder layers and pooler layer.

TYPE: `int`, *optional*, defaults to 768 DEFAULT: 768

num_hidden_layers

Number of hidden layers in the Transformer encoder.

TYPE: `int`, *optional*, defaults to 12 DEFAULT: 12

num_attention_heads

Number of attention heads for each attention layer in the Transformer encoder.

TYPE: `int`, *optional*, defaults to 12 DEFAULT: 12

intermediate_size

Dimensionality of the feed-forward (ff) layer in the encoder. Input tensors to feed-forward layers are firstly projected from hidden_size to intermediate_size, and then projected back to hidden_size. Typically intermediate_size is larger than hidden_size.

TYPE: `int`, *optional*, defaults to 3072 DEFAULT: 3072

hidden_act

The non-linear activation function in the feed-forward layer. "gelu", "relu" and any other torch supported activation functions are supported.

TYPE: `str`, *optional*, defaults to `"gelu"` DEFAULT: 'gelu'

hidden_dropout_prob

The dropout probability for all fully connected layers in the embeddings and encoder.

TYPE: `float`, *optional*, defaults to 0.1 DEFAULT: 0.1

attention_probs_dropout_prob

The dropout probability used in MultiHeadAttention in all encoder layers to drop some attention target.

TYPE: `float`, *optional*, defaults to 0.1 DEFAULT: 0.1

act_dropout

This dropout probability is used in ErnieMEncoderLayer after activation.

TYPE: `float`, *optional*, defaults to 0.0 DEFAULT: 0.0

max_position_embeddings

The maximum value of the dimensionality of position encoding, which dictates the maximum supported length of an input sequence.

TYPE: `int`, *optional*, defaults to 514 DEFAULT: 514

layer_norm_eps

The epsilon used by the layer normalization layers.

TYPE: `float`, *optional*, defaults to 1e-05 DEFAULT: 1e-05

classifier_dropout

The dropout ratio for the classification head.

TYPE: `float`, *optional* DEFAULT: None

initializer_range

The standard deviation of the normal initializer for initializing all weight matrices.

TYPE: `float`, *optional*, defaults to 0.02 DEFAULT: 0.02

pad_token_id

The index of padding token in the token vocabulary.

TYPE: `int`, *optional*, defaults to 1 DEFAULT: 1

A normal_initializer initializes weight matrices as normal distributions. See ErnieMPretrainedModel._init_weights() for how weights are initialized in ErnieMModel.
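
As a rough illustration of how the defaults above relate to one another, the sketch below uses a plain dictionary (a stand-in for illustration, not the real `ErnieMConfig` class) to derive the per-head dimension and the feed-forward expansion ratio. The per-head formula `hidden_size // num_attention_heads` is the usual Transformer convention and is assumed here rather than taken from this page; `derived_sizes` is a hypothetical helper name.

```python
# Default values copied from the ErnieMConfig signature documented on this page.
# The dictionary is only a stand-in for illustration, not the real config class.
ERNIE_M_DEFAULTS = {
    "vocab_size": 250002,
    "hidden_size": 768,
    "num_hidden_layers": 12,
    "num_attention_heads": 12,
    "intermediate_size": 3072,
    "max_position_embeddings": 514,
}


def derived_sizes(cfg):
    """Return (attention_head_size, feed_forward_expansion_ratio) for a config dict."""
    # Assumption: each head gets an equal slice of the hidden dimension.
    if cfg["hidden_size"] % cfg["num_attention_heads"] != 0:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    head_size = cfg["hidden_size"] // cfg["num_attention_heads"]
    # The feed-forward layer projects hidden_size -> intermediate_size -> hidden_size.
    ff_ratio = cfg["intermediate_size"] / cfg["hidden_size"]
    return head_size, ff_ratio


head_size, ff_ratio = derived_sizes(ERNIE_M_DEFAULTS)
print(head_size, ff_ratio)
```

With the defaults, each of the 12 heads works on a 64-dimensional slice and the feed-forward layer expands the hidden size by a factor of 4.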

Source code in mindnlp\transformers\models\ernie_m\configuration_ernie_m.py
class ErnieMConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of an [`ErnieMModel`]. It is used to instantiate an
    Ernie-M model according to the specified arguments, defining the model architecture. Instantiating a configuration
    with the defaults will yield a similar configuration to that of the `Ernie-M`
    [susnato/ernie-m-base_pytorch](https://hf-mirror.com/susnato/ernie-m-base_pytorch) architecture.

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.

    Args:
        vocab_size (`int`, *optional*, defaults to 250002):
            Vocabulary size of `input_ids` in [`ErnieMModel`]. It is also the vocab size of the token embedding matrix.
            Defines the number of different tokens that can be represented by the `input_ids` passed when calling
            [`ErnieMModel`].
        hidden_size (`int`, *optional*, defaults to 768):
            Dimensionality of the embedding layer, encoder layers and pooler layer.
        num_hidden_layers (`int`, *optional*, defaults to 12):
            Number of hidden layers in the Transformer encoder.
        num_attention_heads (`int`, *optional*, defaults to 12):
            Number of attention heads for each attention layer in the Transformer encoder.
        intermediate_size (`int`, *optional*, defaults to 3072):
            Dimensionality of the feed-forward (ff) layer in the encoder. Input tensors to feed-forward layers are
            firstly projected from hidden_size to intermediate_size, and then projected back to hidden_size. Typically
            intermediate_size is larger than hidden_size.
        hidden_act (`str`, *optional*, defaults to `"gelu"`):
            The non-linear activation function in the feed-forward layer. `"gelu"`, `"relu"` and any other torch
            supported activation functions are supported.
        hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
            The dropout probability for all fully connected layers in the embeddings and encoder.
        attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
            The dropout probability used in `MultiHeadAttention` in all encoder layers to drop some attention target.
        act_dropout (`float`, *optional*, defaults to 0.0):
            This dropout probability is used in `ErnieMEncoderLayer` after activation.
        max_position_embeddings (`int`, *optional*, defaults to 514):
            The maximum value of the dimensionality of position encoding, which dictates the maximum supported length
            of an input sequence.
        layer_norm_eps (`float`, *optional*, defaults to 1e-05):
            The epsilon used by the layer normalization layers.
        classifier_dropout (`float`, *optional*):
            The dropout ratio for the classification head.
        initializer_range (`float`, *optional*, defaults to 0.02):
            The standard deviation of the normal initializer for initializing all weight matrices.
        pad_token_id (`int`, *optional*, defaults to 1):
            The index of padding token in the token vocabulary.

    A normal_initializer initializes weight matrices as normal distributions. See
    `ErnieMPretrainedModel._init_weights()` for how weights are initialized in `ErnieMModel`.
    """
    model_type = "ernie_m"
    attribute_map: Dict[str, str] = {"dropout": "classifier_dropout", "num_classes": "num_labels"}

    def __init__(
        self,
        vocab_size: int = 250002,
        hidden_size: int = 768,
        num_hidden_layers: int = 12,
        num_attention_heads: int = 12,
        intermediate_size: int = 3072,
        hidden_act: str = "gelu",
        hidden_dropout_prob: float = 0.1,
        attention_probs_dropout_prob: float = 0.1,
        max_position_embeddings: int = 514,
        initializer_range: float = 0.02,
        pad_token_id: int = 1,
        layer_norm_eps: float = 1e-05,
        classifier_dropout=None,
        is_decoder=False,
        act_dropout=0.0,
        **kwargs,
    ):
        """
        This method initializes an instance of the ErnieMConfig class.

        Args:
            self: The instance of the class.
            vocab_size (int): The size of the vocabulary. Default is 250002.
            hidden_size (int): The size of the hidden layers. Default is 768.
            num_hidden_layers (int): The number of hidden layers. Default is 12.
            num_attention_heads (int): The number of attention heads. Default is 12.
            intermediate_size (int): The size of the intermediate layer in the transformer. Default is 3072.
            hidden_act (str): The activation function for the hidden layers. Default is 'gelu'.
            hidden_dropout_prob (float): The dropout probability for the hidden layers. Default is 0.1.
            attention_probs_dropout_prob (float): The dropout probability for the attention probabilities. Default is 0.1.
            max_position_embeddings (int): The maximum position for the embeddings. Default is 514.
            initializer_range (float): The range for the weight initializers. Default is 0.02.
            pad_token_id (int): The ID for padding tokens. Default is 1.
            layer_norm_eps (float): The epsilon value for layer normalization. Default is 1e-05.
            classifier_dropout (float, optional): The dropout rate for the classifier layer. Default is None.
            is_decoder (bool): Whether the model is a decoder. Default is False.
            act_dropout (float): The dropout rate for the activation function. Default is 0.0.

        Returns:
            None.

        Raises:
            None
        """
        super().__init__(pad_token_id=pad_token_id, **kwargs)
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers
        self.num_attention_heads = num_attention_heads
        self.intermediate_size = intermediate_size
        self.hidden_act = hidden_act
        self.hidden_dropout_prob = hidden_dropout_prob
        self.attention_probs_dropout_prob = attention_probs_dropout_prob
        self.max_position_embeddings = max_position_embeddings
        self.initializer_range = initializer_range
        self.layer_norm_eps = layer_norm_eps
        self.classifier_dropout = classifier_dropout
        self.is_decoder = is_decoder
        self.act_dropout = act_dropout

mindnlp.transformers.models.ernie_m.configuration_ernie_m.ErnieMConfig.__init__(vocab_size=250002, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, max_position_embeddings=514, initializer_range=0.02, pad_token_id=1, layer_norm_eps=1e-05, classifier_dropout=None, is_decoder=False, act_dropout=0.0, **kwargs)

This method initializes an instance of the ErnieMConfig class.

PARAMETER DESCRIPTION
self

The instance of the class.

vocab_size

The size of the vocabulary. Default is 250002.

TYPE: int DEFAULT: 250002

hidden_size

The size of the hidden layers. Default is 768.

TYPE: int DEFAULT: 768

num_hidden_layers

The number of hidden layers. Default is 12.

TYPE: int DEFAULT: 12

num_attention_heads

The number of attention heads. Default is 12.

TYPE: int DEFAULT: 12

intermediate_size

The size of the intermediate layer in the transformer. Default is 3072.

TYPE: int DEFAULT: 3072

hidden_act

The activation function for the hidden layers. Default is 'gelu'.

TYPE: str DEFAULT: 'gelu'

hidden_dropout_prob

The dropout probability for the hidden layers. Default is 0.1.

TYPE: float DEFAULT: 0.1

attention_probs_dropout_prob

The dropout probability for the attention probabilities. Default is 0.1.

TYPE: float DEFAULT: 0.1

max_position_embeddings

The maximum position for the embeddings. Default is 514.

TYPE: int DEFAULT: 514

initializer_range

The range for the weight initializers. Default is 0.02.

TYPE: float DEFAULT: 0.02

pad_token_id

The ID for padding tokens. Default is 1.

TYPE: int DEFAULT: 1

layer_norm_eps

The epsilon value for layer normalization. Default is 1e-05.

TYPE: float DEFAULT: 1e-05

classifier_dropout

The dropout rate for the classifier layer. Default is None.

TYPE: Optional[float] DEFAULT: None

is_decoder

Whether the model is a decoder. Default is False.

TYPE: bool DEFAULT: False

act_dropout

The dropout rate for the activation function. Default is 0.0.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION

None.

Source code in mindnlp\transformers\models\ernie_m\configuration_ernie_m.py
def __init__(
    self,
    vocab_size: int = 250002,
    hidden_size: int = 768,
    num_hidden_layers: int = 12,
    num_attention_heads: int = 12,
    intermediate_size: int = 3072,
    hidden_act: str = "gelu",
    hidden_dropout_prob: float = 0.1,
    attention_probs_dropout_prob: float = 0.1,
    max_position_embeddings: int = 514,
    initializer_range: float = 0.02,
    pad_token_id: int = 1,
    layer_norm_eps: float = 1e-05,
    classifier_dropout=None,
    is_decoder=False,
    act_dropout=0.0,
    **kwargs,
):
    """
    This method initializes an instance of the ErnieMConfig class.

    Args:
        self: The instance of the class.
        vocab_size (int): The size of the vocabulary. Default is 250002.
        hidden_size (int): The size of the hidden layers. Default is 768.
        num_hidden_layers (int): The number of hidden layers. Default is 12.
        num_attention_heads (int): The number of attention heads. Default is 12.
        intermediate_size (int): The size of the intermediate layer in the transformer. Default is 3072.
        hidden_act (str): The activation function for the hidden layers. Default is 'gelu'.
        hidden_dropout_prob (float): The dropout probability for the hidden layers. Default is 0.1.
        attention_probs_dropout_prob (float): The dropout probability for the attention probabilities. Default is 0.1.
        max_position_embeddings (int): The maximum position for the embeddings. Default is 514.
        initializer_range (float): The range for the weight initializers. Default is 0.02.
        pad_token_id (int): The ID for padding tokens. Default is 1.
        layer_norm_eps (float): The epsilon value for layer normalization. Default is 1e-05.
        classifier_dropout (float, optional): The dropout rate for the classifier layer. Default is None.
        is_decoder (bool): Whether the model is a decoder. Default is False.
        act_dropout (float): The dropout rate for the activation function. Default is 0.0.

    Returns:
        None.

    Raises:
        None
    """
    super().__init__(pad_token_id=pad_token_id, **kwargs)
    self.vocab_size = vocab_size
    self.hidden_size = hidden_size
    self.num_hidden_layers = num_hidden_layers
    self.num_attention_heads = num_attention_heads
    self.intermediate_size = intermediate_size
    self.hidden_act = hidden_act
    self.hidden_dropout_prob = hidden_dropout_prob
    self.attention_probs_dropout_prob = attention_probs_dropout_prob
    self.max_position_embeddings = max_position_embeddings
    self.initializer_range = initializer_range
    self.layer_norm_eps = layer_norm_eps
    self.classifier_dropout = classifier_dropout
    self.is_decoder = is_decoder
    self.act_dropout = act_dropout

mindnlp.transformers.models.ernie_m.modeling_ernie_m

MindSpore ErnieM model.

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMAttention

Bases: Module

ErnieMAttention implements the attention block used in the ERNIE-M model. It inherits from nn.Module and wraps an ErnieMSelfAttention module, which holds the query, key, and value projection layers, together with an output projection layer. The prune_heads method removes specific attention heads based on the provided indices. The forward method passes the input hidden states through the self-attention module and the output projection layer to produce the attention outputs.

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
class ErnieMAttention(nn.Module):

    """
    ErnieMAttention is a class that represents an attention mechanism used in the ERNIE-M model.
    It contains methods for initializing the attention mechanism, pruning attention heads, and forwarding attention outputs.
    This class inherits from nn.Module and utilizes an ErnieMSelfAttention module for self-attention calculations.
    The attention mechanism includes projection layers for query, key, and value, as well as an output projection layer.
    The `prune_heads` method allows for pruning specific attention heads based on provided indices.
    The `forward` method processes input hidden states through the self-attention mechanism and output projection
    layer to generate attention outputs.
    """
    def __init__(self, config, position_embedding_type=None):
        """
        Initialize the ErnieMAttention class.

        Args:
            self: The instance of the class.
            config: An object containing configuration parameters.
            position_embedding_type: Type of position embedding to be used, default is None.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.self_attn = ErnieMSelfAttention(config, position_embedding_type=position_embedding_type)
        self.out_proj = nn.Linear(config.hidden_size, config.hidden_size)
        self.pruned_heads = set()

    def prune_heads(self, heads):
        """
        This method 'prune_heads' belongs to the class 'ErnieMAttention' and is responsible for pruning specific
        attention heads in the model based on the provided list of heads.

        Args:
            self: Instance of the 'ErnieMAttention' class. It is used to access attributes and methods within the class.
            heads: A list containing the indices of the attention heads that need to be pruned. Each element in the list
                should be an integer representing the index of the head to be pruned.

        Returns:
            None: This method does not return any value but modifies the attention heads in the model in-place.

        Raises:
            None:
                However, it is assumed that the functions called within this method, 
                such as 'find_pruneable_heads_and_indices' and 'prune_linear_layer', may raise exceptions related to 
                input validation or processing errors.
        """
        if len(heads) == 0:
            return
        heads, index = find_pruneable_heads_and_indices(
            heads, self.self_attn.num_attention_heads, self.self_attn.attention_head_size, self.pruned_heads
        )

        # Prune linear layers
        self.self_attn.q_proj = prune_linear_layer(self.self_attn.q_proj, index)
        self.self_attn.k_proj = prune_linear_layer(self.self_attn.k_proj, index)
        self.self_attn.v_proj = prune_linear_layer(self.self_attn.v_proj, index)
        self.out_proj = prune_linear_layer(self.out_proj, index, dim=1)

        # Update hyper params and store pruned heads
        self.self_attn.num_attention_heads = self.self_attn.num_attention_heads - len(heads)
        self.self_attn.all_head_size = self.self_attn.attention_head_size * self.self_attn.num_attention_heads
        self.pruned_heads = self.pruned_heads.union(heads)

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        encoder_hidden_states: Optional[mindspore.Tensor] = None,
        encoder_attention_mask: Optional[mindspore.Tensor] = None,
        past_key_value: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
        output_attentions: Optional[bool] = False,
    ) -> Tuple[mindspore.Tensor]:
        """
        This method forwards the ErnieMAttention module.

        Args:
            self: The instance of the ErnieMAttention class.
            hidden_states (mindspore.Tensor): The input hidden states tensor.
            attention_mask (Optional[mindspore.Tensor]): Optional tensor containing attention mask values.
            head_mask (Optional[mindspore.Tensor]): Optional tensor containing head mask values.
            encoder_hidden_states (Optional[mindspore.Tensor]): Optional tensor containing encoder hidden states.
            encoder_attention_mask (Optional[mindspore.Tensor]): Optional tensor containing encoder attention mask values.
            past_key_value (Optional[Tuple[Tuple[mindspore.Tensor]]]): Optional tuple containing past key and value tensors.
            output_attentions (Optional[bool]): Optional boolean indicating whether to output attentions.

        Returns:
            Tuple[mindspore.Tensor]: A tuple containing the attention output tensor.

        Raises:
            None
        """
        self_outputs = self.self_attn(
            hidden_states,
            attention_mask,
            head_mask,
            encoder_hidden_states,
            encoder_attention_mask,
            past_key_value,
            output_attentions,
        )
        attention_output = self.out_proj(self_outputs[0])
        outputs = (attention_output,) + self_outputs[1:]  # add attentions if we output them
        return outputs

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMAttention.__init__(config, position_embedding_type=None)

Initialize the ErnieMAttention class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

An object containing configuration parameters.

position_embedding_type

Type of position embedding to be used, default is None.

DEFAULT: None

RETURNS DESCRIPTION

None.

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def __init__(self, config, position_embedding_type=None):
    """
    Initialize the ErnieMAttention class.

    Args:
        self: The instance of the class.
        config: An object containing configuration parameters.
        position_embedding_type: Type of position embedding to be used, default is None.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.self_attn = ErnieMSelfAttention(config, position_embedding_type=position_embedding_type)
    self.out_proj = nn.Linear(config.hidden_size, config.hidden_size)
    self.pruned_heads = set()

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMAttention.forward(hidden_states, attention_mask=None, head_mask=None, encoder_hidden_states=None, encoder_attention_mask=None, past_key_value=None, output_attentions=False)

Runs the forward pass of the ErnieMAttention module.

PARAMETER DESCRIPTION
self

The instance of the ErnieMAttention class.

hidden_states

The input hidden states tensor.

TYPE: Tensor

attention_mask

Optional tensor containing attention mask values.

TYPE: Optional[Tensor] DEFAULT: None

head_mask

Optional tensor containing head mask values.

TYPE: Optional[Tensor] DEFAULT: None

encoder_hidden_states

Optional tensor containing encoder hidden states.

TYPE: Optional[Tensor] DEFAULT: None

encoder_attention_mask

Optional tensor containing encoder attention mask values.

TYPE: Optional[Tensor] DEFAULT: None

past_key_value

Optional tuple containing past key and value tensors.

TYPE: Optional[Tuple[Tuple[Tensor]]] DEFAULT: None

output_attentions

Optional boolean indicating whether to output attentions.

TYPE: Optional[bool] DEFAULT: False

RETURNS DESCRIPTION
Tuple[Tensor]

A tuple whose first element is the attention output tensor, followed by any further outputs of the self-attention module (the attentions, if they are output).

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def forward(
    self,
    hidden_states: mindspore.Tensor,
    attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    encoder_hidden_states: Optional[mindspore.Tensor] = None,
    encoder_attention_mask: Optional[mindspore.Tensor] = None,
    past_key_value: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
    output_attentions: Optional[bool] = False,
) -> Tuple[mindspore.Tensor]:
    """
    This method forwards the ErnieMAttention module.

    Args:
        self: The instance of the ErnieMAttention class.
        hidden_states (mindspore.Tensor): The input hidden states tensor.
        attention_mask (Optional[mindspore.Tensor]): Optional tensor containing attention mask values.
        head_mask (Optional[mindspore.Tensor]): Optional tensor containing head mask values.
        encoder_hidden_states (Optional[mindspore.Tensor]): Optional tensor containing encoder hidden states.
        encoder_attention_mask (Optional[mindspore.Tensor]): Optional tensor containing encoder attention mask values.
        past_key_value (Optional[Tuple[Tuple[mindspore.Tensor]]]): Optional tuple containing past key and value tensors.
        output_attentions (Optional[bool]): Optional boolean indicating whether to output attentions.

    Returns:
        Tuple[mindspore.Tensor]: A tuple containing the attention output tensor.

    Raises:
        None
    """
    self_outputs = self.self_attn(
        hidden_states,
        attention_mask,
        head_mask,
        encoder_hidden_states,
        encoder_attention_mask,
        past_key_value,
        output_attentions,
    )
    attention_output = self.out_proj(self_outputs[0])
    outputs = (attention_output,) + self_outputs[1:]  # add attentions if we output them
    return outputs

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMAttention.prune_heads(heads)

Prunes the specified attention heads from this ErnieMAttention layer.

PARAMETER DESCRIPTION
self

Instance of the 'ErnieMAttention' class. It is used to access attributes and methods within the class.

heads

A list containing the indices of the attention heads that need to be pruned. Each element in the list should be an integer representing the index of the head to be pruned.

RETURNS DESCRIPTION
None

This method does not return any value but modifies the attention heads in the model in-place.

RAISES DESCRIPTION
None

This method raises no exceptions itself. The helpers it calls, 'find_pruneable_heads_and_indices' and 'prune_linear_layer', may raise exceptions related to input validation or processing errors.

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def prune_heads(self, heads):
    """
    This method 'prune_heads' belongs to the class 'ErnieMAttention' and is responsible for pruning specific
    attention heads in the model based on the provided list of heads.

    Args:
        self: Instance of the 'ErnieMAttention' class. It is used to access attributes and methods within the class.
        heads: A list containing the indices of the attention heads that need to be pruned. Each element in the list
            should be an integer representing the index of the head to be pruned.

    Returns:
        None: This method does not return any value but modifies the attention heads in the model in-place.

    Raises:
        None:
            However, it is assumed that the functions called within this method, 
            such as 'find_pruneable_heads_and_indices' and 'prune_linear_layer', may raise exceptions related to 
            input validation or processing errors.
    """
    if len(heads) == 0:
        return
    heads, index = find_pruneable_heads_and_indices(
        heads, self.self_attn.num_attention_heads, self.self_attn.attention_head_size, self.pruned_heads
    )

    # Prune linear layers
    self.self_attn.q_proj = prune_linear_layer(self.self_attn.q_proj, index)
    self.self_attn.k_proj = prune_linear_layer(self.self_attn.k_proj, index)
    self.self_attn.v_proj = prune_linear_layer(self.self_attn.v_proj, index)
    self.out_proj = prune_linear_layer(self.out_proj, index, dim=1)

    # Update hyper params and store pruned heads
    self.self_attn.num_attention_heads = self.self_attn.num_attention_heads - len(heads)
    self.self_attn.all_head_size = self.self_attn.attention_head_size * self.self_attn.num_attention_heads
    self.pruned_heads = self.pruned_heads.union(heads)
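
The index passed to `prune_linear_layer` above selects which positions of the flattened `[num_heads * head_size]` axis survive pruning. The helper `find_pruneable_heads_and_indices` is not shown on this page, so the pure-Python sketch below is an assumption modelled on the helper of the same name in Hugging Face Transformers; `pruneable_heads_and_index` is a hypothetical name and uses lists instead of tensors.

```python
def pruneable_heads_and_index(heads, n_heads, head_size, already_pruned):
    """Return (heads_to_prune, kept_positions) for a flattened [n_heads * head_size] axis.

    n_heads is the number of heads currently left in the layer,
    already_pruned holds the original indices of heads removed earlier.
    """
    # Heads that were pruned before cannot be pruned again.
    heads = set(heads) - set(already_pruned)
    keep = [[True] * head_size for _ in range(n_heads)]
    for head in heads:
        # Earlier pruning shifted later heads to the left, so adjust the row index.
        shifted = head - sum(1 for h in already_pruned if h < head)
        keep[shifted] = [False] * head_size
    flat = [flag for row in keep for flag in row]
    index = [i for i, flag in enumerate(flat) if flag]
    return heads, index


# Prune heads 1 and 3 out of 4 heads with 2 dimensions each.
heads, index = pruneable_heads_and_index([1, 3], n_heads=4, head_size=2, already_pruned=set())
print(sorted(heads), index)
```

In this example the surviving positions are those of heads 0 and 2. In the method above, the same index is applied to the query, key, and value projections and, with `dim=1`, to the output projection.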

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMEmbeddings

Bases: Module

Construct the embeddings from word and position embeddings.

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
class ErnieMEmbeddings(nn.Module):
    """Construct the embeddings from word and position embeddings."""
    def __init__(self, config):
        """
        Args:
            self (object): The instance of the ErnieMEmbeddings class.
            config (object): An object containing configuration parameters for the ErnieMEmbeddings instance,
                including the hidden size, vocabulary size, maximum position embeddings, padding token ID, layer
                normalization epsilon, and hidden dropout probability.

        Returns:
            None.

        Raises:
            TypeError: If the config parameter is not of the expected type.
            ValueError: If the config parameter does not contain required attributes or if the padding token ID is not valid.
        """
        super().__init__()
        self.hidden_size = config.hidden_size
        self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=config.pad_token_id)
        self.position_embeddings = nn.Embedding(
            config.max_position_embeddings, config.hidden_size, padding_idx=config.pad_token_id
        )
        self.layer_norm = nn.LayerNorm([config.hidden_size], eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(p=config.hidden_dropout_prob)
        self.padding_idx = config.pad_token_id

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        past_key_values_length: int = 0,
    ) -> mindspore.Tensor:
        """
        This method 'forward' in the class 'ErnieMEmbeddings' forwards the embeddings for the input tokens.

        Args:
            self: The instance of the class.
            input_ids (Optional[mindspore.Tensor]):
                The input token IDs. Default is None. If None, 'inputs_embeds' is used to generate the embeddings.
            position_ids (Optional[mindspore.Tensor]): The position IDs for the input tokens.
                Default is None. If None, position IDs are calculated based on the input shape.
            inputs_embeds (Optional[mindspore.Tensor]): The input embeddings.
                Default is None. If None, input embeddings are generated using 'word_embeddings' based on 'input_ids'.
            past_key_values_length (int): The length of past key values.
                Default is 0. It is used to adjust the 'position_ids' if past key values are present.

        Returns:
            mindspore.Tensor: The embeddings for the input tokens, with shape (batch_size, sequence_length, hidden_size).

        Raises:
            ValueError: If the input shape is invalid or if 'position_ids' cannot be calculated.
            TypeError: If the input types are not as expected.
        """
        if inputs_embeds is None:
            inputs_embeds = self.word_embeddings(input_ids)
        if position_ids is None:
            input_shape = inputs_embeds.shape[:-1]
            ones = ops.ones(input_shape, dtype=mindspore.int64)
            seq_length = ops.cumsum(ones, dim=1)
            position_ids = seq_length - ones

            if past_key_values_length > 0:
                position_ids = position_ids + past_key_values_length
        # to mimic paddlenlp implementation
        position_ids += 2
        position_embeddings = self.position_embeddings(position_ids)
        embeddings = inputs_embeds + position_embeddings
        embeddings = self.layer_norm(embeddings)
        embeddings = self.dropout(embeddings)

        return embeddings

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMEmbeddings.__init__(config)

PARAMETER DESCRIPTION
self

The instance of the ErnieMEmbeddings class.

TYPE: object

config

An object containing configuration parameters for the ErnieMEmbeddings instance, including the hidden size, vocabulary size, maximum position embeddings, padding token ID, layer normalization epsilon, and hidden dropout probability.

TYPE: object

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
TypeError

If the config parameter is not of the expected type.

ValueError

If the config parameter does not contain required attributes or if the padding token ID is not valid.

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def __init__(self, config):
    """
    Args:
        self (object): The instance of the ErnieMEmbeddings class.
        config (object): An object containing configuration parameters for the ErnieMEmbeddings instance,
            including the hidden size, vocabulary size, maximum position embeddings, padding token ID, layer
            normalization epsilon, and hidden dropout probability.

    Returns:
        None.

    Raises:
        TypeError: If the config parameter is not of the expected type.
        ValueError: If the config parameter does not contain required attributes or if the padding token ID is not valid.
    """
    super().__init__()
    self.hidden_size = config.hidden_size
    self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=config.pad_token_id)
    self.position_embeddings = nn.Embedding(
        config.max_position_embeddings, config.hidden_size, padding_idx=config.pad_token_id
    )
    self.layer_norm = nn.LayerNorm([config.hidden_size], eps=config.layer_norm_eps)
    self.dropout = nn.Dropout(p=config.hidden_dropout_prob)
    self.padding_idx = config.pad_token_id

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMEmbeddings.forward(input_ids=None, position_ids=None, inputs_embeds=None, past_key_values_length=0)

Builds the embeddings for the input tokens: word and position embeddings are summed, then layer normalization and dropout are applied.

PARAMETER DESCRIPTION
self

The instance of the class.

input_ids

The input token IDs. Default is None. Ignored when 'inputs_embeds' is provided; otherwise they are looked up in 'word_embeddings'.

TYPE: Optional[Tensor] DEFAULT: None

position_ids

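The position IDs for the input tokens. Default is None. If None, position IDs 0 through sequence_length - 1 are generated from the input shape and shifted by past_key_values_length. In either case, 2 is then added to the IDs to mimic the PaddleNLP implementation.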

TYPE: Optional[Tensor] DEFAULT: None

inputs_embeds

The input embeddings. Default is None. If None, input embeddings are generated using 'word_embeddings' based on 'input_ids'.

TYPE: Optional[Tensor] DEFAULT: None

past_key_values_length

The length of past key values. Default is 0. It is used to adjust the 'position_ids' if past key values are present.

TYPE: int DEFAULT: 0

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: The embeddings for the input tokens, with shape (batch_size, sequence_length, hidden_size).

RAISES DESCRIPTION
ValueError

If the input shape is invalid or if 'position_ids' cannot be calculated.

TypeError

If the input types are not as expected.
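
The default position-ID computation described above can be sketched in plain Python. This is a minimal illustration, not the library code: `default_position_ids` and `fits_position_table` are hypothetical helper names, nested lists stand in for tensors, and the table size of 514 is an assumed value chosen only for the example.

```python
from typing import List


def default_position_ids(
    batch_size: int,
    seq_length: int,
    past_key_values_length: int = 0,
    offset: int = 2,
) -> List[List[int]]:
    """Mirror the default branch of ErnieMEmbeddings.forward.

    cumsum(ones) - ones yields 0, 1, ..., seq_length - 1 for every row;
    the cached length and the fixed offset of 2 are then added.
    """
    row = [pos + past_key_values_length + offset for pos in range(seq_length)]
    return [list(row) for _ in range(batch_size)]


def fits_position_table(position_ids: List[List[int]], max_position_embeddings: int) -> bool:
    """Check that every ID indexes a valid row of the position embedding table."""
    return all(0 <= p < max_position_embeddings for row in position_ids for p in row)


print(default_position_ids(batch_size=2, seq_length=4))
# [[2, 3, 4, 5], [2, 3, 4, 5]]

# With an assumed table of 514 rows, 512 positions fit but 513 do not.
print(fits_position_table(default_position_ids(1, 512), 514))  # True
print(fits_position_table(default_position_ids(1, 513), 514))  # False
```

In the listed source the offset of 2 is applied outside the `position_ids is None` branch, so it also shifts position IDs that the caller passes in explicitly.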

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    past_key_values_length: int = 0,
) -> mindspore.Tensor:
    """
    Builds the embeddings for the input tokens: word and position embeddings are summed,
    then layer normalization and dropout are applied.

    Args:
        self: The instance of the class.
        input_ids (Optional[mindspore.Tensor]):
            The input token IDs. Default is None. If None, 'inputs_embeds' is used to generate the embeddings.
        position_ids (Optional[mindspore.Tensor]): The position IDs for the input tokens.
            Default is None. If None, position IDs are calculated based on the input shape.
        inputs_embeds (Optional[mindspore.Tensor]): The input embeddings.
            Default is None. If None, input embeddings are generated using 'word_embeddings' based on 'input_ids'.
        past_key_values_length (int): The length of past key values.
            Default is 0. It is used to adjust the 'position_ids' if past key values are present.

    Returns:
        mindspore.Tensor: The embeddings for the input tokens, with shape (batch_size, sequence_length, hidden_size).

    Raises:
        ValueError: If the input shape is invalid or if 'position_ids' cannot be calculated.
        TypeError: If the input types are not as expected.
    """
    if inputs_embeds is None:
        inputs_embeds = self.word_embeddings(input_ids)
    if position_ids is None:
        input_shape = inputs_embeds.shape[:-1]
        ones = ops.ones(input_shape, dtype=mindspore.int64)
        seq_length = ops.cumsum(ones, dim=1)
        position_ids = seq_length - ones

        if past_key_values_length > 0:
            position_ids = position_ids + past_key_values_length
    # to mimic paddlenlp implementation
    position_ids += 2
    position_embeddings = self.position_embeddings(position_ids)
    embeddings = inputs_embeds + position_embeddings
    embeddings = self.layer_norm(embeddings)
    embeddings = self.dropout(embeddings)

    return embeddings

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMEncoder

Bases: Module

ErnieMEncoder represents a multi-layer Transformer-based encoder model for processing sequences of input data.

The ErnieMEncoder class inherits from nn.Module and implements a multi-layer Transformer-based encoder, with the ability to return hidden states and attention weights if specified. The class provides methods for initializing the model and processing input data through its layers.

ATTRIBUTE DESCRIPTION
config

A configuration object containing the model's hyperparameters.

layers

A list of ErnieMEncoderLayer instances representing the individual layers of the encoder model.

METHOD DESCRIPTION
forward

Processes input embeddings through the encoder layers, optionally returning hidden states and attention weights based on the specified parameters.

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
class ErnieMEncoder(nn.Module):

    """
    ErnieMEncoder represents a multi-layer Transformer-based encoder model for processing sequences of input data.

    The ErnieMEncoder class inherits from nn.Module and implements a multi-layer Transformer-based encoder,
    with the ability to return hidden states and attention weights if specified.
    The class provides methods for initializing the model and processing input data through its layers.

    Attributes:
        config: A configuration object containing the model's hyperparameters.
        layers: A list of ErnieMEncoderLayer instances representing the individual layers of the encoder model.

    Methods:
        forward: Processes input embeddings through the encoder layers, optionally returning hidden states and
        attention weights based on the specified parameters.

    Please note that the actual code implementation is not included in this docstring.
    """
    def __init__(self, config):
        """
        Initializes an instance of the ErnieMEncoder class.

        Args:
            self (ErnieMEncoder): The instance of the ErnieMEncoder class.
            config (object): The configuration object containing settings for the ErnieMEncoder.
                This parameter is required for configuring the ErnieMEncoder instance.
                It should be an object that provides necessary configuration details.
                It is expected to have attributes such as num_hidden_layers to specify the number of hidden layers.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.config = config
        self.layers = nn.ModuleList([ErnieMEncoderLayer(config) for _ in range(config.num_hidden_layers)])

    def forward(
        self,
        input_embeds: mindspore.Tensor,
        attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
        output_attentions: Optional[bool] = False,
        output_hidden_states: Optional[bool] = False,
        return_dict: Optional[bool] = True,
    ) -> Union[Tuple[mindspore.Tensor], BaseModelOutputWithPastAndCrossAttentions]:
        """
        Runs the input embeddings through every encoder layer in order.

        Args:
            self: The instance of the class.
            input_embeds (mindspore.Tensor): The input embeddings. Shape (batch_size, sequence_length, hidden_size).
            attention_mask (Optional[mindspore.Tensor]): The attention mask. Shape (batch_size, sequence_length).
            head_mask (Optional[mindspore.Tensor]): The head mask. Shape (num_layers, num_heads).
            past_key_values (Optional[Tuple[Tuple[mindspore.Tensor]]]): The past key values.
                Shape (num_layers, 2, batch_size, num_heads, sequence_length // num_heads, hidden_size // num_heads).
            output_attentions (Optional[bool]): Whether to output attention weights. Default is False.
            output_hidden_states (Optional[bool]): Whether to output hidden states. Default is False.
            return_dict (Optional[bool]): Whether to return a BaseModelOutputWithPastAndCrossAttentions. Default is True.

        Returns:
            Union[Tuple[mindspore.Tensor], BaseModelOutputWithPastAndCrossAttentions]:
                The encoded last hidden state, optional hidden states, and optional attention weights.

        Raises:
            None.
        """
        hidden_states = () if output_hidden_states else None
        attentions = () if output_attentions else None

        output = input_embeds
        if output_hidden_states:
            hidden_states = hidden_states + (output,)
        for i, layer in enumerate(self.layers):
            layer_head_mask = head_mask[i] if head_mask is not None else None
            past_key_value = past_key_values[i] if past_key_values is not None else None

            output, opt_attn_weights = layer(
                hidden_states=output,
                attention_mask=attention_mask,
                head_mask=layer_head_mask,
                past_key_value=past_key_value,
            )

            if output_hidden_states:
                hidden_states = hidden_states + (output,)
            if output_attentions:
                attentions = attentions + (opt_attn_weights,)

        last_hidden_state = output
        if not return_dict:
            return tuple(v for v in [last_hidden_state, hidden_states, attentions] if v is not None)

        return BaseModelOutputWithPastAndCrossAttentions(
            last_hidden_state=last_hidden_state, hidden_states=hidden_states, attentions=attentions
        )

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMEncoder.__init__(config)

Initializes an instance of the ErnieMEncoder class.

PARAMETER DESCRIPTION
self

The instance of the ErnieMEncoder class.

TYPE: ErnieMEncoder

config

The configuration object for the encoder. Its num_hidden_layers attribute sets how many ErnieMEncoderLayer instances are created, and the same object is passed to each layer.

TYPE: object

RETURNS DESCRIPTION

None.

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def __init__(self, config):
    """
    Initializes an instance of the ErnieMEncoder class.

    Args:
        self (ErnieMEncoder): The instance of the ErnieMEncoder class.
        config (object): The configuration object containing settings for the ErnieMEncoder.
            This parameter is required for configuring the ErnieMEncoder instance.
            It should be an object that provides necessary configuration details.
            It is expected to have attributes such as num_hidden_layers to specify the number of hidden layers.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.config = config
    self.layers = nn.ModuleList([ErnieMEncoderLayer(config) for _ in range(config.num_hidden_layers)])

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMEncoder.forward(input_embeds, attention_mask=None, head_mask=None, past_key_values=None, output_attentions=False, output_hidden_states=False, return_dict=True)

Runs the input embeddings through every encoder layer in order.

PARAMETER DESCRIPTION
self

The instance of the class.

input_embeds

The input embeddings. Shape (batch_size, sequence_length, hidden_size).

TYPE: Tensor

attention_mask

The attention mask. Shape (batch_size, sequence_length).

TYPE: Optional[Tensor] DEFAULT: None

head_mask

The head mask. Shape (num_layers, num_heads).

TYPE: Optional[Tensor] DEFAULT: None

past_key_values

The past key values. Shape (num_layers, 2, batch_size, num_heads, sequence_length // num_heads, hidden_size // num_heads).

TYPE: Optional[Tuple[Tuple[Tensor]]] DEFAULT: None

output_attentions

Whether to output attention weights. Default is False.

TYPE: Optional[bool] DEFAULT: False

output_hidden_states

Whether to output hidden states. Default is False.

TYPE: Optional[bool] DEFAULT: False

return_dict

Whether to return a BaseModelOutputWithPastAndCrossAttentions. Default is True.

TYPE: Optional[bool] DEFAULT: True

RETURNS DESCRIPTION
Union[Tuple[Tensor], BaseModelOutputWithPastAndCrossAttentions]

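The last hidden state, plus the hidden states (the input embeddings followed by each layer's output) when output_hidden_states is True, and the per-layer attention weights when output_attentions is True. When return_dict is False, these are returned as a plain tuple with unset entries omitted.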
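
How the tuple output is assembled can be sketched with toy layers. This is a simplified illustration under stated assumptions: `run_encoder` and `toy_layers` are hypothetical names, floats and strings stand in for tensors, and masks and cached key/values are left out.

```python
from typing import Callable, List, Tuple


def run_encoder(
    input_embeds: float,
    layers: List[Callable[[float], Tuple[float, str]]],
    output_attentions: bool = False,
    output_hidden_states: bool = False,
) -> tuple:
    """Follow the control flow of ErnieMEncoder.forward with return_dict=False."""
    hidden_states = () if output_hidden_states else None
    attentions = () if output_attentions else None

    output = input_embeds
    if output_hidden_states:
        # The input embeddings are recorded before any layer runs.
        hidden_states = hidden_states + (output,)
    for layer in layers:
        output, attn = layer(output)
        if output_hidden_states:
            hidden_states = hidden_states + (output,)
        if output_attentions:
            attentions = attentions + (attn,)

    # Entries that are None are dropped, so positions depend on the flags.
    return tuple(v for v in [output, hidden_states, attentions] if v is not None)


# Three toy layers: each adds 1.0 and reports a placeholder attention value.
toy_layers = [lambda x, i=i: (x + 1.0, f"attn{i}") for i in range(3)]

print(run_encoder(0.0, toy_layers))
# (3.0,)
print(run_encoder(0.0, toy_layers, output_hidden_states=True))
# (3.0, (0.0, 1.0, 2.0, 3.0))
print(run_encoder(0.0, toy_layers, output_attentions=True))
# (3.0, ('attn0', 'attn1', 'attn2'))
```

Two points follow from this flow: the hidden-states tuple has one more entry than there are layers, and with only output_attentions set the attention weights sit at index 1 of the tuple, not index 2.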

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def forward(
    self,
    input_embeds: mindspore.Tensor,
    attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    past_key_values: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
    output_attentions: Optional[bool] = False,
    output_hidden_states: Optional[bool] = False,
    return_dict: Optional[bool] = True,
) -> Union[Tuple[mindspore.Tensor], BaseModelOutputWithPastAndCrossAttentions]:
    """
    Runs the input embeddings through every encoder layer in order.

    Args:
        self: The instance of the class.
        input_embeds (mindspore.Tensor): The input embeddings. Shape (batch_size, sequence_length, hidden_size).
        attention_mask (Optional[mindspore.Tensor]): The attention mask. Shape (batch_size, sequence_length).
        head_mask (Optional[mindspore.Tensor]): The head mask. Shape (num_layers, num_heads).
        past_key_values (Optional[Tuple[Tuple[mindspore.Tensor]]]): The past key values.
            Shape (num_layers, 2, batch_size, num_heads, sequence_length // num_heads, hidden_size // num_heads).
        output_attentions (Optional[bool]): Whether to output attention weights. Default is False.
        output_hidden_states (Optional[bool]): Whether to output hidden states. Default is False.
        return_dict (Optional[bool]): Whether to return a BaseModelOutputWithPastAndCrossAttentions. Default is True.

    Returns:
        Union[Tuple[mindspore.Tensor], BaseModelOutputWithPastAndCrossAttentions]:
            The encoded last hidden state, optional hidden states, and optional attention weights.

    Raises:
        None.
    """
    hidden_states = () if output_hidden_states else None
    attentions = () if output_attentions else None

    output = input_embeds
    if output_hidden_states:
        hidden_states = hidden_states + (output,)
    for i, layer in enumerate(self.layers):
        layer_head_mask = head_mask[i] if head_mask is not None else None
        past_key_value = past_key_values[i] if past_key_values is not None else None

        output, opt_attn_weights = layer(
            hidden_states=output,
            attention_mask=attention_mask,
            head_mask=layer_head_mask,
            past_key_value=past_key_value,
        )

        if output_hidden_states:
            hidden_states = hidden_states + (output,)
        if output_attentions:
            attentions = attentions + (opt_attn_weights,)

    last_hidden_state = output
    if not return_dict:
        return tuple(v for v in [last_hidden_state, hidden_states, attentions] if v is not None)

    return BaseModelOutputWithPastAndCrossAttentions(
        last_hidden_state=last_hidden_state, hidden_states=hidden_states, attentions=attentions
    )

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMEncoderLayer

Bases: Module

The ErnieMEncoderLayer class represents a single layer of the ErnieM (Enhanced Representation through kNowledge Integration) encoder, which is designed for natural language processing tasks. It inherits from nn.Module and processes the input hidden states with a multi-head self-attention mechanism followed by a feedforward network; each sub-layer is followed by dropout, a residual connection, and layer normalization.

ATTRIBUTE DESCRIPTION
self_attn

Instance of ErnieMAttention for multi-head self-attention mechanism.

linear1

Instance of nn.Linear for the first feedforward neural network layer.

dropout

Instance of nn.Dropout for applying dropout within the feedforward network.

linear2

Instance of nn.Linear for the second feedforward neural network layer.

norm1

Instance of nn.LayerNorm for the first layer normalization.

norm2

Instance of nn.LayerNorm for the second layer normalization.

dropout1

Instance of nn.Dropout applied to the self-attention output before the first residual connection.

dropout2

Instance of nn.Dropout applied to the feedforward output before the second residual connection.

activation

Activation function for the feedforward network.

METHOD DESCRIPTION
forward

Applies the multi-head self-attention mechanism and feedforward network layers to the input hidden states, optionally producing attention weights.

Args:

  • hidden_states (mindspore.Tensor): The input hidden states.
  • attention_mask (Optional[mindspore.Tensor]): Optional tensor for masking the attention scores.
  • head_mask (Optional[mindspore.Tensor]): Optional tensor for masking specific attention heads.
  • past_key_value (Optional[Tuple[Tuple[mindspore.Tensor]]]): Optional tuple containing past key and value tensors for fast decoding.
  • output_attentions (Optional[bool]): Optional boolean indicating whether to return attention weights.

Returns:

  • mindspore.Tensor or Tuple[mindspore.Tensor]: The processed hidden states and optionally the attention weights.
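
The order of operations inside the layer (sub-layer, residual add, then layer normalization) can be sketched in plain Python. This is a minimal illustration under stated assumptions: `layer_norm` and `encoder_layer` are hypothetical helpers, a list of floats stands in for one token's hidden vector, dropout is left out as in inference, and the normalization has no learned scale or bias.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize to zero mean and unit variance (no learned scale or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def encoder_layer(
    hidden: Vector,
    self_attn: Callable[[Vector], Vector],
    feed_forward: Callable[[Vector], Vector],
) -> Vector:
    """Post-layer-norm residual block with dropout omitted."""
    # Sub-layer 1: self-attention, residual add, then norm1.
    attn_out = self_attn(hidden)
    hidden = layer_norm([r + a for r, a in zip(hidden, attn_out)])
    # Sub-layer 2: feedforward (linear1 -> activation -> linear2), residual add, then norm2.
    ff_out = feed_forward(hidden)
    return layer_norm([r + f for r, f in zip(hidden, ff_out)])


# Toy stand-ins for the real sub-layers.
out = encoder_layer(
    [1.0, 2.0, 3.0, 4.0],
    self_attn=lambda v: [0.5 * e for e in v],
    feed_forward=lambda v: [max(0.0, e) for e in v],
)
print(out)
```

Because normalization comes last, the layer output is always normalized, whatever scale the sub-layers produce.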
Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
class ErnieMEncoderLayer(nn.Module):

    """
    The ErnieMEncoderLayer class represents a single layer of the ErnieM (Enhanced Representation through kNowledge 
    Integration) encoder, which is designed for natural language processing tasks. This class inherits from the nn.Module 
    class and implements the functionality for processing input hidden states using multi-head self-attention mechanism 
    and feedforward neural network layers with layer normalization and dropout.

    Attributes:
        self_attn: Instance of ErnieMAttention for multi-head self-attention mechanism.
        linear1: Instance of nn.Linear for the first feedforward neural network layer.
        dropout: Instance of nn.Dropout for applying dropout within the feedforward network.
        linear2: Instance of nn.Linear for the second feedforward neural network layer.
        norm1: Instance of nn.LayerNorm for the first layer normalization.
        norm2: Instance of nn.LayerNorm for the second layer normalization.
        dropout1: Instance of nn.Dropout applied to the self-attention output before the first residual connection.
        dropout2: Instance of nn.Dropout applied to the feedforward output before the second residual connection.
        activation: Activation function for the feedforward network.

    Methods:
        forward(self, hidden_states, attention_mask=None, head_mask=None, past_key_value=None, output_attentions=True):
            Applies the multi-head self-attention mechanism and feedforward network layers to the input hidden states, 
            optionally producing attention weights.

            Args:

            - hidden_states (mindspore.Tensor): The input hidden states.
            - attention_mask (Optional[mindspore.Tensor]): Optional tensor for masking the attention scores.
            - head_mask (Optional[mindspore.Tensor]): Optional tensor for masking specific attention heads.
            - past_key_value (Optional[Tuple[Tuple[mindspore.Tensor]]]):
            Optional tuple containing past key and value tensors for fast decoding.
            - output_attentions (Optional[bool]): Optional boolean indicating whether to return attention weights.

            Returns:

            - mindspore.Tensor or Tuple[mindspore.Tensor]: The processed hidden states and optionally the attention weights.
    """
    def __init__(self, config):
        """
        Initialize an instance of the ErnieMEncoderLayer class.

        Args:
            self (ErnieMEncoderLayer): The instance of the ErnieMEncoderLayer class.
            config (object): 
                An object containing configuration parameters for the encoder layer.

                - hidden_dropout_prob (float): The probability of dropout for hidden layers. Default is 0.1.
                - act_dropout (float): The probability of dropout for activation functions. 
                Default is the value of hidden_dropout_prob.
                - hidden_size (int): The size of the hidden layers.
                - intermediate_size (int): The size of the intermediate layers.
                - layer_norm_eps (float): The epsilon value for layer normalization.
                - hidden_act (str or function): The activation function to be used. 
                If a string, it will be converted to a function using ACT2FN dictionary.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        # to mimic paddlenlp implementation
        dropout = 0.1 if config.hidden_dropout_prob is None else config.hidden_dropout_prob
        act_dropout = config.hidden_dropout_prob if config.act_dropout is None else config.act_dropout

        self.self_attn = ErnieMAttention(config)
        self.linear1 = nn.Linear(config.hidden_size, config.intermediate_size)
        self.dropout = nn.Dropout(p=act_dropout)
        self.linear2 = nn.Linear(config.intermediate_size, config.hidden_size)
        self.norm1 = nn.LayerNorm([config.hidden_size], eps=config.layer_norm_eps)
        self.norm2 = nn.LayerNorm([config.hidden_size], eps=config.layer_norm_eps)
        self.dropout1 = nn.Dropout(p=dropout)
        self.dropout2 = nn.Dropout(p=dropout)
        if isinstance(config.hidden_act, str):
            self.activation = ACT2FN[config.hidden_act]
        else:
            self.activation = config.hidden_act

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        past_key_value: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
        output_attentions: Optional[bool] = True,
    ):
        """
        Applies the encoder layer to the input hidden states.

        Self-attention and the feedforward network are each followed by dropout, a residual connection,
        and layer normalization.

        Args:
            self: An instance of the ErnieMEncoderLayer class.
            hidden_states (mindspore.Tensor): The input hidden states. This should be a tensor.
            attention_mask (Optional[mindspore.Tensor]): The attention mask tensor. Defaults to None.
            head_mask (Optional[mindspore.Tensor]): The head mask tensor. Defaults to None.
            past_key_value (Optional[Tuple[Tuple[mindspore.Tensor]]]): The past key value tensor. Defaults to None.
            output_attentions (Optional[bool]): Whether to output attention weights. Defaults to True.

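        Returns:
            mindspore.Tensor or Tuple[mindspore.Tensor]: The processed hidden states, together with the
                attention weights when output_attentions is True.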

        Raises:
            None.
        """
        residual = hidden_states
        if output_attentions:
            hidden_states, attention_opt_weights = self.self_attn(
                hidden_states=hidden_states,
                attention_mask=attention_mask,
                head_mask=head_mask,
                past_key_value=past_key_value,
                output_attentions=output_attentions,
            )

        else:
            hidden_states = self.self_attn(
                hidden_states=hidden_states,
                attention_mask=attention_mask,
                head_mask=head_mask,
                past_key_value=past_key_value,
                output_attentions=output_attentions,
            )
        hidden_states = residual + self.dropout1(hidden_states)
        hidden_states = self.norm1(hidden_states)
        residual = hidden_states

        hidden_states = self.linear1(hidden_states)
        hidden_states = self.activation(hidden_states)
        hidden_states = self.dropout(hidden_states)
        hidden_states = self.linear2(hidden_states)
        hidden_states = residual + self.dropout2(hidden_states)
        hidden_states = self.norm2(hidden_states)

        if output_attentions:
            return hidden_states, attention_opt_weights
        return hidden_states

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMEncoderLayer.__init__(config)

Initialize an instance of the ErnieMEncoderLayer class.

PARAMETER DESCRIPTION
self

The instance of the ErnieMEncoderLayer class.

TYPE: ErnieMEncoderLayer

config

An object containing configuration parameters for the encoder layer.

  • hidden_dropout_prob (float): The dropout probability applied to each sub-layer output before its residual connection. Falls back to 0.1 when None.
  • act_dropout (float): The dropout probability applied after the activation inside the feedforward network. Falls back to the value of hidden_dropout_prob when None.
  • hidden_size (int): The size of the hidden layers.
  • intermediate_size (int): The size of the intermediate layers.
  • layer_norm_eps (float): The epsilon value for layer normalization.
  • hidden_act (str or function): The activation function to be used. If a string, it will be converted to a function using ACT2FN dictionary.

TYPE: object

RETURNS DESCRIPTION

None.
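
The two dropout fallbacks can be reproduced in isolation. In this sketch, `resolve_dropouts` is a hypothetical helper that copies the two assignments from the source listing, and `SimpleNamespace` stands in for the real configuration object.

```python
from types import SimpleNamespace
from typing import Optional, Tuple


def resolve_dropouts(config) -> Tuple[float, Optional[float]]:
    """Return (dropout, act_dropout) using the same fallbacks as __init__."""
    dropout = 0.1 if config.hidden_dropout_prob is None else config.hidden_dropout_prob
    act_dropout = config.hidden_dropout_prob if config.act_dropout is None else config.act_dropout
    return dropout, act_dropout


# act_dropout unset: it inherits hidden_dropout_prob.
print(resolve_dropouts(SimpleNamespace(hidden_dropout_prob=0.2, act_dropout=None)))
# (0.2, 0.2)

# hidden_dropout_prob unset: dropout falls back to 0.1, act_dropout is kept.
print(resolve_dropouts(SimpleNamespace(hidden_dropout_prob=None, act_dropout=0.0)))
# (0.1, 0.0)

# Both unset: act_dropout falls back to the raw config value, which is None.
print(resolve_dropouts(SimpleNamespace(hidden_dropout_prob=None, act_dropout=None)))
# (0.1, None)
```

The last case shows that act_dropout falls back to the raw hidden_dropout_prob from the configuration, not to the resolved 0.1, so setting at least one of the two values avoids passing None on as a dropout probability.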

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def __init__(self, config):
    """
    Initialize an instance of the ErnieMEncoderLayer class.

    Args:
        self (ErnieMEncoderLayer): The instance of the ErnieMEncoderLayer class.
        config (object): 
            An object containing configuration parameters for the encoder layer.

            - hidden_dropout_prob (float): The probability of dropout for hidden layers. Default is 0.1.
            - act_dropout (float): The probability of dropout for activation functions. 
            Default is the value of hidden_dropout_prob.
            - hidden_size (int): The size of the hidden layers.
            - intermediate_size (int): The size of the intermediate layers.
            - layer_norm_eps (float): The epsilon value for layer normalization.
            - hidden_act (str or function): The activation function to be used. 
            If a string, it will be converted to a function using ACT2FN dictionary.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    # to mimic paddlenlp implementation
    dropout = 0.1 if config.hidden_dropout_prob is None else config.hidden_dropout_prob
    act_dropout = config.hidden_dropout_prob if config.act_dropout is None else config.act_dropout

    self.self_attn = ErnieMAttention(config)
    self.linear1 = nn.Linear(config.hidden_size, config.intermediate_size)
    self.dropout = nn.Dropout(p=act_dropout)
    self.linear2 = nn.Linear(config.intermediate_size, config.hidden_size)
    self.norm1 = nn.LayerNorm([config.hidden_size], eps=config.layer_norm_eps)
    self.norm2 = nn.LayerNorm([config.hidden_size], eps=config.layer_norm_eps)
    self.dropout1 = nn.Dropout(p=dropout)
    self.dropout2 = nn.Dropout(p=dropout)
    if isinstance(config.hidden_act, str):
        self.activation = ACT2FN[config.hidden_act]
    else:
        self.activation = config.hidden_act

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMEncoderLayer.forward(hidden_states, attention_mask=None, head_mask=None, past_key_value=None, output_attentions=True)

Applies the encoder layer to the input hidden states.

Self-attention and the feedforward network are each followed by dropout, a residual connection, and layer normalization.

PARAMETER DESCRIPTION
self

An instance of the ErnieMEncoderLayer class.

hidden_states

The input hidden states. This should be a tensor.

TYPE: Tensor

attention_mask

The attention mask tensor. Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

head_mask

The head mask tensor. Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

past_key_value

The past key value tensor. Defaults to None.

TYPE: Optional[Tuple[Tuple[Tensor]]] DEFAULT: None

output_attentions

Whether to output attention weights. Defaults to True.

TYPE: Optional[bool] DEFAULT: True

RETURNS DESCRIPTION

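The output hidden states as a Tensor. If output_attentions is True, a tuple of (hidden states, attention weights) is returned instead.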

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def forward(
    self,
    hidden_states: mindspore.Tensor,
    attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    past_key_value: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
    output_attentions: Optional[bool] = True,
):
    """
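    Runs the forward pass of an ErnieMEncoderLayer.

    This method applies self-attention and then a feed-forward block to the input hidden states.
    Each sub-block is followed by dropout, a residual connection and layer normalization.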

    Args:
        self: An instance of the ErnieMEncoderLayer class.
        hidden_states (mindspore.Tensor): The input hidden states. This should be a tensor.
        attention_mask (Optional[mindspore.Tensor]): The attention mask tensor. Defaults to None.
        head_mask (Optional[mindspore.Tensor]): The head mask tensor. Defaults to None.
        past_key_value (Optional[Tuple[Tuple[mindspore.Tensor]]]): The past key value tensor. Defaults to None.
        output_attentions (Optional[bool]): Whether to output attention weights. Defaults to True.

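    Returns:
        The output hidden states as a mindspore.Tensor. If output_attentions is True, a tuple of
        (hidden states, attention weights) is returned instead.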

    Raises:
        None.
    """
    residual = hidden_states
    if output_attentions:
        hidden_states, attention_opt_weights = self.self_attn(
            hidden_states=hidden_states,
            attention_mask=attention_mask,
            head_mask=head_mask,
            past_key_value=past_key_value,
            output_attentions=output_attentions,
        )

    else:
        hidden_states = self.self_attn(
            hidden_states=hidden_states,
            attention_mask=attention_mask,
            head_mask=head_mask,
            past_key_value=past_key_value,
            output_attentions=output_attentions,
        )
    hidden_states = residual + self.dropout1(hidden_states)
    hidden_states = self.norm1(hidden_states)
    residual = hidden_states

    hidden_states = self.linear1(hidden_states)
    hidden_states = self.activation(hidden_states)
    hidden_states = self.dropout(hidden_states)
    hidden_states = self.linear2(hidden_states)
    hidden_states = residual + self.dropout2(hidden_states)
    hidden_states = self.norm2(hidden_states)

    if output_attentions:
        return hidden_states, attention_opt_weights
    return hidden_states
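
The ordering of the residual connection and layer normalization in the listing above (add first, normalize afterwards) can be sketched without any framework. This is an illustrative pure-Python sketch, not the library code: the helper names and the toy sublayers are hypothetical, the layer norm has no learned scale or shift, and dropout is omitted as it would be in evaluation mode.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned parameters)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def post_norm_block(x, sublayer):
    """Residual connection followed by layer normalization (post-norm order)."""
    out = sublayer(x)
    summed = [a + b for a, b in zip(x, out)]
    return layer_norm(summed)


def encoder_layer(x, attention, feed_forward):
    """Two post-norm blocks in sequence, in the same order as forward()."""
    x = post_norm_block(x, attention)        # residual + norm1
    return post_norm_block(x, feed_forward)  # residual + norm2


def toy_attention(v):
    """Hypothetical stand-in for the self-attention sublayer."""
    return [0.5 * t for t in v]


def toy_feed_forward(v):
    """Hypothetical stand-in for linear1 -> activation -> linear2."""
    return [max(0.0, t) for t in v]


y = encoder_layer([1.0, 2.0, 3.0, 4.0], toy_attention, toy_feed_forward)
print([round(t, 3) for t in y])
```

Because normalization is the last step of each block, the output of the layer always has approximately zero mean and unit variance across the hidden dimension, whatever the sublayers produce.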

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForInformationExtraction

Bases: ErnieMPreTrainedModel

ErnieMForInformationExtraction is an ErnieM model with two per-token linear heads for information extraction tasks. It inherits from ErnieMPreTrainedModel and provides methods for initializing the model and running the forward pass.

ATTRIBUTE DESCRIPTION
ernie_m

The ErnieM model used for information extraction.

TYPE: ErnieMModel

linear_start

Linear layer for predicting the start position in the input sequence.

TYPE: Linear

linear_end

Linear layer for predicting the end position in the input sequence.

TYPE: Linear

sigmoid

Sigmoid activation function for probability calculation.

TYPE: Sigmoid

METHOD DESCRIPTION
__init__

Initializes the ErnieMForInformationExtraction class with the provided configuration.

forward

Runs the forward pass of the model for information extraction tasks.

PARAMETER DESCRIPTION
input_ids

Input tensor containing token ids.

TYPE: Tensor

attention_mask

Tensor specifying which tokens should be attended to.

TYPE: Tensor

position_ids

Tensor specifying the position ids of tokens.

TYPE: Tensor

head_mask

Tensor for masking specific heads in the self-attention layers.

TYPE: Tensor

inputs_embeds

Tensor for providing custom embeddings instead of token ids.

TYPE: Tensor

start_positions

Labels for start positions in the input sequence.

TYPE: Tensor

end_positions

Labels for end positions in the input sequence.

TYPE: Tensor

output_attentions

Flag to output attention weights.

TYPE: bool

output_hidden_states

Flag to output hidden states.

TYPE: bool

return_dict

Flag to return outputs as a dictionary.

TYPE: bool

RETURNS DESCRIPTION

Union[Tuple[mindspore.Tensor], QuestionAnsweringModelOutput]: Tuple of output tensors or a QuestionAnsweringModelOutput object.

RAISES DESCRIPTION
ValueError

If start_positions or end_positions are not of the expected shape.

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
class ErnieMForInformationExtraction(ErnieMPreTrainedModel):

    """
    ErnieMForInformationExtraction is an ErnieM model with two per-token linear heads for information extraction tasks.
    It inherits from ErnieMPreTrainedModel and provides methods for initializing the model and running the forward pass.

    Attributes:
        ernie_m (ErnieMModel): The ErnieM model used for information extraction.
        linear_start (nn.Linear): Linear layer for predicting the start position in the input sequence.
        linear_end (nn.Linear): Linear layer for predicting the end position in the input sequence.
        sigmoid (nn.Sigmoid): Sigmoid activation function for probability calculation.

    Methods:
        __init__: Initializes the ErnieMForInformationExtraction class with the provided configuration.
        forward: Constructs the forward pass of the model for information extraction tasks.

    Args:
        input_ids (mindspore.Tensor): Input tensor containing token ids.
        attention_mask (mindspore.Tensor): Tensor specifying which tokens should be attended to.
        position_ids (mindspore.Tensor): Tensor specifying the position ids of tokens.
        head_mask (mindspore.Tensor): Tensor for masking specific heads in the self-attention layers.
        inputs_embeds (mindspore.Tensor): Tensor for providing custom embeddings instead of token ids.
        start_positions (mindspore.Tensor): Labels for start positions in the input sequence.
        end_positions (mindspore.Tensor): Labels for end positions in the input sequence.
        output_attentions (bool): Flag to output attention weights.
        output_hidden_states (bool): Flag to output hidden states.
        return_dict (bool): Flag to return outputs as a dictionary.

    Returns:
        Union[Tuple[mindspore.Tensor], QuestionAnsweringModelOutput]: Tuple of output tensors or a QuestionAnsweringModelOutput object.

    Raises:
        ValueError: If start_positions or end_positions are not of the expected shape.

    """
    def __init__(self, config):
        """
        Initializes a new instance of the ErnieMForInformationExtraction class.

        Args:
            self: The instance of the class.
            config: An instance of the ErnieMConfig class containing the configuration parameters for the model.

        Returns:
            None

        Raises:
            None
        """
        super().__init__(config)
        self.ernie_m = ErnieMModel(config)
        self.linear_start = nn.Linear(config.hidden_size, 1)
        self.linear_end = nn.Linear(config.hidden_size, 1)
        self.sigmoid = nn.Sigmoid()
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        start_positions: Optional[mindspore.Tensor] = None,
        end_positions: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = True,
    ) -> Union[Tuple[mindspore.Tensor], QuestionAnsweringModelOutput]:
        r"""
        Args:
            start_positions (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for position (index) for computing the start_positions loss. Positions outside the sequence
                are not taken into account when computing the loss.
            end_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for position (index) for computing the end_positions loss. Positions outside the sequence
                are not taken into account when computing the loss.
        """
        result = self.ernie_m(
            input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
        if return_dict:
            sequence_output = result.last_hidden_state
        elif not return_dict:
            sequence_output = result[0]

        start_logits = self.linear_start(sequence_output)
        start_logits = start_logits.squeeze(-1)
        start_prob = self.sigmoid(start_logits)
        end_logits = self.linear_end(sequence_output)
        end_logits = end_logits.squeeze(-1)
        end_prob = self.sigmoid(end_logits)

        total_loss = None
        if start_positions is not None and end_positions is not None:
            # On multi-GPU, splitting the batch can add a trailing dimension; squeeze it
            if len(start_positions.shape) > 1 and start_positions.shape[-1] == 1:
                start_positions = start_positions.squeeze(-1)
            if len(end_positions.shape) > 1 and end_positions.shape[-1] == 1:
                end_positions = end_positions.squeeze(-1)
            # sometimes the start/end positions are outside our model inputs, we ignore these terms
            ignored_index = start_logits.shape[1]
            start_positions = start_positions.clamp(0, ignored_index)
            end_positions = end_positions.clamp(0, ignored_index)

            start_loss = F.binary_cross_entropy_with_logits(start_prob, start_positions)
            end_loss = F.binary_cross_entropy_with_logits(end_prob, end_positions)
            total_loss = (start_loss + end_loss) / 2

        if not return_dict:
            return tuple(
                i
                for i in [total_loss, start_prob, end_prob, result.hidden_states, result.attentions]
                if i is not None
            )

        return QuestionAnsweringModelOutput(
            loss=total_loss,
            start_logits=start_prob,
            end_logits=end_prob,
            hidden_states=result.hidden_states,
            attentions=result.attentions,
        )
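
To make the loss computation concrete, here is a minimal pure-Python sketch of per-token binary cross-entropy computed from logits. The helper names are hypothetical and the numbers are made up. The sketch assumes that a `..._with_logits` loss follows the usual convention of applying a sigmoid internally. Under that assumption, feeding it values that have already passed through a sigmoid (as the listing above does with start_prob and end_prob) gives a different number than feeding it the raw logits. The source does not say whether this is intended.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy; the sigmoid is applied inside the loss."""
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(logits)


# One sequence of four tokens; the span starts at token 1 (made-up values).
start_logits = [-2.0, 3.0, -1.0, -3.0]
start_targets = [0.0, 1.0, 0.0, 0.0]

loss_from_logits = bce_with_logits(start_logits, start_targets)

# Passing probabilities instead squashes them a second time inside the loss.
start_prob = [sigmoid(z) for z in start_logits]
loss_from_probs = bce_with_logits(start_prob, start_targets)

print(round(loss_from_logits, 4), round(loss_from_probs, 4))
```

In this toy case the loss computed from probabilities is larger, because the second sigmoid maps every value in (0, 1) to roughly 0.5 to 0.73, so no prediction can look confident.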

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForInformationExtraction.__init__(config)

Initializes a new instance of the ErnieMForInformationExtraction class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

An instance of the ErnieMConfig class containing the configuration parameters for the model.

RETURNS DESCRIPTION

None

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def __init__(self, config):
    """
    Initializes a new instance of the ErnieMForInformationExtraction class.

    Args:
        self: The instance of the class.
        config: An instance of the ErnieMConfig class containing the configuration parameters for the model.

    Returns:
        None

    Raises:
        None
    """
    super().__init__(config)
    self.ernie_m = ErnieMModel(config)
    self.linear_start = nn.Linear(config.hidden_size, 1)
    self.linear_end = nn.Linear(config.hidden_size, 1)
    self.sigmoid = nn.Sigmoid()
    self.post_init()

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForInformationExtraction.forward(input_ids=None, attention_mask=None, position_ids=None, head_mask=None, inputs_embeds=None, start_positions=None, end_positions=None, output_attentions=None, output_hidden_states=None, return_dict=True)

PARAMETER DESCRIPTION
start_positions

Labels for position (index) for computing the start_positions loss. Positions outside the sequence are not taken into account when computing the loss.

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

end_positions

Labels for position (index) for computing the end_positions loss. Positions outside the sequence are not taken into account when computing the loss.

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    start_positions: Optional[mindspore.Tensor] = None,
    end_positions: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = True,
) -> Union[Tuple[mindspore.Tensor], QuestionAnsweringModelOutput]:
    r"""
    Args:
        start_positions (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for position (index) for computing the start_positions loss. Positions outside the sequence
            are not taken into account when computing the loss.
        end_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for position (index) for computing the end_positions loss. Positions outside the sequence
            are not taken into account when computing the loss.
    """
    result = self.ernie_m(
        input_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )
    if return_dict:
        sequence_output = result.last_hidden_state
    elif not return_dict:
        sequence_output = result[0]

    start_logits = self.linear_start(sequence_output)
    start_logits = start_logits.squeeze(-1)
    start_prob = self.sigmoid(start_logits)
    end_logits = self.linear_end(sequence_output)
    end_logits = end_logits.squeeze(-1)
    end_prob = self.sigmoid(end_logits)

    total_loss = None
    if start_positions is not None and end_positions is not None:
        # On multi-GPU, splitting the batch can add a trailing dimension; squeeze it
        if len(start_positions.shape) > 1 and start_positions.shape[-1] == 1:
            start_positions = start_positions.squeeze(-1)
        if len(end_positions.shape) > 1 and end_positions.shape[-1] == 1:
            end_positions = end_positions.squeeze(-1)
        # sometimes the start/end positions are outside our model inputs, we ignore these terms
        ignored_index = start_logits.shape[1]
        start_positions = start_positions.clamp(0, ignored_index)
        end_positions = end_positions.clamp(0, ignored_index)

        start_loss = F.binary_cross_entropy_with_logits(start_prob, start_positions)
        end_loss = F.binary_cross_entropy_with_logits(end_prob, end_positions)
        total_loss = (start_loss + end_loss) / 2

    if not return_dict:
        return tuple(
            i
            for i in [total_loss, start_prob, end_prob, result.hidden_states, result.attentions]
            if i is not None
        )

    return QuestionAnsweringModelOutput(
        loss=total_loss,
        start_logits=start_prob,
        end_logits=end_prob,
        hidden_states=result.hidden_states,
        attentions=result.attentions,
    )

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForMultipleChoice

Bases: ErnieMPreTrainedModel

ErnieMForMultipleChoice is a multiple choice question answering model based on the ERNIE-M architecture. It inherits from ErnieMPreTrainedModel and implements methods for running the forward pass and computing the multiple choice classification loss.

ATTRIBUTE DESCRIPTION
ernie_m

The ERNIE-M model used for processing inputs.

TYPE: ErnieMModel

dropout

Dropout layer used in the classifier.

TYPE: Dropout

classifier

Dense layer for classification.

TYPE: Linear

METHOD DESCRIPTION
__init__

Initializes the ErnieMForMultipleChoice model with the given configuration.

forward

Runs the forward pass for multiple choice question answering and computes the classification loss when labels are provided.

The forward method takes various input tensors and parameters, processes them through the ERNIE-M model, applies dropout, and computes the classification logits. If labels are provided, it calculates the cross-entropy loss. The method returns the loss and model outputs based on the return_dict parameter.

This class is designed to be used for multiple choice question answering tasks with ERNIE-M models.

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
class ErnieMForMultipleChoice(ErnieMPreTrainedModel):

    """
    ErnieMForMultipleChoice is a multiple choice question answering model based on the ERNIE-M architecture.
    It inherits from ErnieMPreTrainedModel and implements methods for running the forward pass and computing
    the multiple choice classification loss.

    Attributes:
        ernie_m (ErnieMModel): The ERNIE-M model used for processing inputs.
        dropout (nn.Dropout): Dropout layer used in the classifier.
        classifier (nn.Linear): Dense layer for classification.

    Methods:
        __init__: Initializes the ErnieMForMultipleChoice model with the given configuration.
        forward: Constructs the model for multiple choice question answering and computes the classification loss.

    The forward method takes various input tensors and parameters, processes them through the ERNIE-M model,
    applies dropout, and computes the classification logits.
    If labels are provided, it calculates the cross-entropy loss. The method returns the loss and model outputs based on
    the return_dict parameter.

    This class is designed to be used for multiple choice question answering tasks with ERNIE-M models.
    """
    # Copied from transformers.models.bert.modeling_bert.BertForMultipleChoice.__init__ with Bert->ErnieM,bert->ernie_m
    def __init__(self, config):
        """
        Initializes an instance of the ErnieMForMultipleChoice class.

        Args:
            self: The object instance.
            config: An instance of the ErnieMConfig class containing the model configuration.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(config)

        self.ernie_m = ErnieMModel(config)
        classifier_dropout = (
            config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
        )
        self.dropout = nn.Dropout(p=classifier_dropout)
        self.classifier = nn.Linear(config.hidden_size, 1)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = True,
    ) -> Union[Tuple[mindspore.Tensor], MultipleChoiceModelOutput]:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for computing the multiple choice classification loss. Indices should be in `[0, ...,
                num_choices-1]` where `num_choices` is the size of the second dimension of the input tensors. (See
                `input_ids` above)
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]

        input_ids = input_ids.view(-1, input_ids.shape[-1]) if input_ids is not None else None
        attention_mask = attention_mask.view(-1, attention_mask.shape[-1]) if attention_mask is not None else None
        position_ids = position_ids.view(-1, position_ids.shape[-1]) if position_ids is not None else None
        inputs_embeds = (
            inputs_embeds.view(-1, inputs_embeds.shape[-2], inputs_embeds.shape[-1])
            if inputs_embeds is not None
            else None
        )

        outputs = self.ernie_m(
            input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        pooled_output = outputs[1]

        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)
        reshaped_logits = logits.view(-1, num_choices)

        loss = None
        if labels is not None:
            loss = F.cross_entropy(reshaped_logits, labels)

        if not return_dict:
            output = (reshaped_logits,) + outputs[2:]
            return ((loss,) + output) if loss is not None else output

        return MultipleChoiceModelOutput(
            loss=loss,
            logits=reshaped_logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )
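
The reshaping in forward() is easier to follow on plain lists. The sketch below is a hedged illustration, not the library code: the helper names are hypothetical, and a toy scorer stands in for the encoder, pooler and `Linear(hidden_size, 1)` classifier. It shows how inputs of shape (batch, num_choices, seq_len) are flattened so that every choice is scored independently, and how the scores are regrouped into (batch, num_choices) before the cross-entropy loss.

```python
import math


def flatten_choices(batch):
    """(batch, num_choices, seq_len) -> (batch * num_choices, seq_len)."""
    return [choice for example in batch for choice in example]


def reshape_logits(flat_logits, num_choices):
    """(batch * num_choices,) -> (batch, num_choices)."""
    return [flat_logits[i:i + num_choices] for i in range(0, len(flat_logits), num_choices)]


def cross_entropy(logits_row, label):
    """Softmax cross-entropy for one example."""
    m = max(logits_row)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits_row))
    return log_sum - logits_row[label]


# Two examples, three choices each, four token ids per choice (made-up ids).
input_ids = [
    [[1, 5, 6, 2], [1, 7, 8, 2], [1, 9, 3, 2]],
    [[1, 4, 4, 2], [1, 6, 5, 2], [1, 8, 9, 2]],
]
num_choices = len(input_ids[0])
flat = flatten_choices(input_ids)

# Hypothetical scorer: one scalar per flattened (example, choice) row.
flat_logits = [sum(seq) / 10.0 for seq in flat]
logits = reshape_logits(flat_logits, num_choices)

labels = [1, 2]  # index of the correct choice for each example
loss = sum(cross_entropy(row, y) for row, y in zip(logits, labels)) / len(labels)
print(len(flat), logits, round(loss, 4))
```

Scoring each choice with a single output unit and applying the softmax across choices means the same model handles any number of choices per example.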

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForMultipleChoice.__init__(config)

Initializes an instance of the ErnieMForMultipleChoice class.

PARAMETER DESCRIPTION
self

The object instance.

config

An instance of the ErnieMConfig class containing the model configuration.

RETURNS DESCRIPTION

None.

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def __init__(self, config):
    """
    Initializes an instance of the ErnieMForMultipleChoice class.

    Args:
        self: The object instance.
        config: An instance of the ErnieMConfig class containing the model configuration.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(config)

    self.ernie_m = ErnieMModel(config)
    classifier_dropout = (
        config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
    )
    self.dropout = nn.Dropout(p=classifier_dropout)
    self.classifier = nn.Linear(config.hidden_size, 1)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForMultipleChoice.forward(input_ids=None, attention_mask=None, position_ids=None, head_mask=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=True)

PARAMETER DESCRIPTION
labels

Labels for computing the multiple choice classification loss. Indices should be in [0, ..., num_choices-1] where num_choices is the size of the second dimension of the input tensors. (See input_ids above)

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = True,
) -> Union[Tuple[mindspore.Tensor], MultipleChoiceModelOutput]:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the multiple choice classification loss. Indices should be in `[0, ...,
            num_choices-1]` where `num_choices` is the size of the second dimension of the input tensors. (See
            `input_ids` above)
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict
    num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]

    input_ids = input_ids.view(-1, input_ids.shape[-1]) if input_ids is not None else None
    attention_mask = attention_mask.view(-1, attention_mask.shape[-1]) if attention_mask is not None else None
    position_ids = position_ids.view(-1, position_ids.shape[-1]) if position_ids is not None else None
    inputs_embeds = (
        inputs_embeds.view(-1, inputs_embeds.shape[-2], inputs_embeds.shape[-1])
        if inputs_embeds is not None
        else None
    )

    outputs = self.ernie_m(
        input_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    pooled_output = outputs[1]

    pooled_output = self.dropout(pooled_output)
    logits = self.classifier(pooled_output)
    reshaped_logits = logits.view(-1, num_choices)

    loss = None
    if labels is not None:
        loss = F.cross_entropy(reshaped_logits, labels)

    if not return_dict:
        output = (reshaped_logits,) + outputs[2:]
        return ((loss,) + output) if loss is not None else output

    return MultipleChoiceModelOutput(
        loss=loss,
        logits=reshaped_logits,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForQuestionAnswering

Bases: ErnieMPreTrainedModel

ErnieMForQuestionAnswering is a class that represents a fine-tuned ErnieM model for question answering tasks. It is a subclass of ErnieMPreTrainedModel.

This class extends the functionality of the base ErnieM model by adding a question answering head on top of it. It takes as input the configuration of the model and initializes the necessary components. The class provides a method called 'forward' which performs the forward pass of the model for question answering.

The 'forward' method takes several input tensors such as 'input_ids', 'attention_mask', 'position_ids', 'head_mask', and 'inputs_embeds'. It also supports optional inputs like 'start_positions', 'end_positions', 'output_attentions', 'output_hidden_states', and 'return_dict'. The method returns the question answering model output, which includes the start and end logits, hidden states, attentions, and an optional total loss.

The 'forward' method internally calls the 'ernie_m' submodule (the base ErnieM model) to obtain the sequence output. It then passes the sequence output through a dense layer 'qa_outputs' to get the logits for the start and end positions. The logits are then processed to obtain the final start and end logits. If 'start_positions' and 'end_positions' are provided, the method calculates the cross-entropy loss for the predicted logits and the provided positions. The total loss is computed as the average of the start and end losses.

The 'forward' method returns the model output in a structured manner based on the 'return_dict' parameter.

  • If 'return_dict' is False, the method returns a tuple containing the total loss, start logits, end logits, and any additional hidden states or attentions.
  • If 'return_dict' is True, the method returns an instance of the 'QuestionAnsweringModelOutput' class, which encapsulates the output elements as attributes.
Note
  • If 'start_positions' and 'end_positions' are not provided, the total loss will be None.
  • The start and end positions are clamped to the length of the sequence and positions outside the sequence are ignored for computing the loss.
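
The clamping and ignoring described in the note can be sketched as follows. This is a pure-Python illustration under stated assumptions, not the library code: it assumes the BERT-style convention in which 'qa_outputs' yields two logits per token and the ignored index equals the sequence length, and all helper names and numbers are hypothetical.

```python
import math


def split_start_end(qa_logits):
    """Split per-token [start, end] pairs into two per-token lists."""
    start = [pair[0] for pair in qa_logits]
    end = [pair[1] for pair in qa_logits]
    return start, end


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def cross_entropy_ignore(logits, target, ignore_index):
    """Softmax cross-entropy over positions; returns None for ignored targets."""
    if target == ignore_index:
        return None
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[target]


# One sequence of four tokens with made-up [start, end] logits per token.
qa_logits = [[0.1, -1.0], [2.0, 0.3], [-0.5, 1.5], [0.0, 0.2]]
start_logits, end_logits = split_start_end(qa_logits)

seq_len = len(start_logits)
ignored_index = seq_len  # one past the last valid position (assumed convention)

in_range = clamp(1, 0, ignored_index)
out_of_range = clamp(9, 0, ignored_index)  # lands exactly on ignored_index

print(cross_entropy_ignore(start_logits, in_range, ignored_index))
print(cross_entropy_ignore(start_logits, out_of_range, ignored_index))
```

Clamping maps every out-of-range label onto the single ignored index, so one comparison is enough to drop such labels from the loss.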
Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
class ErnieMForQuestionAnswering(ErnieMPreTrainedModel):

    """
    ErnieMForQuestionAnswering is a class that represents a fine-tuned ErnieM model for question answering tasks.
    It is a subclass of ErnieMPreTrainedModel.

    This class extends the functionality of the base ErnieM model by adding a question answering head on top of it.
    It takes as input the configuration of the model and initializes the necessary components.
    The class provides a method called 'forward' which performs the forward pass of the model for question answering.

    The 'forward' method takes several input tensors such as 'input_ids', 'attention_mask', 'position_ids',
    'head_mask', and 'inputs_embeds'. It also supports optional inputs like 'start_positions', 'end_positions',
    'output_attentions', 'output_hidden_states', and 'return_dict'.
    The method returns the question answering model output, which includes the start and end logits, hidden states,
    attentions, and an optional total loss.

    The 'forward' method internally calls the 'ernie_m' submodule (the base ErnieM model) to obtain the sequence output.
    It then passes the sequence output through a dense layer 'qa_outputs' and splits the result along the last axis
    into the start and end logits. If 'start_positions' and 'end_positions' are provided, the method calculates the
    cross-entropy loss between the predicted logits and the provided positions.
    The total loss is computed as the average of the start and end losses.

    The 'forward' method returns the model output in a structured manner based on the 'return_dict' parameter.

    - If 'return_dict' is False, the method returns a tuple containing the start logits, end logits, and any
    additional hidden states or attentions, preceded by the total loss when it was computed.
    - If 'return_dict' is True, the method returns an instance of the 'QuestionAnsweringModelOutput' class, which
    encapsulates the output elements as attributes.

    Note:
        - If 'start_positions' and 'end_positions' are not provided, the total loss will be None.
        - The start and end positions are clamped to the range [0, sequence_length]; positions at or beyond the
        sequence length are ignored when computing the loss.

    """
    # Copied from transformers.models.bert.modeling_bert.BertForQuestionAnswering.__init__ with Bert->ErnieM,bert->ernie_m
    def __init__(self, config):
        """Initializes a new instance of the ErnieMForQuestionAnswering class.

        Args:
            self: The object itself.
            config: An instance of the ErnieMConfig class containing the model configuration.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(config)
        self.num_labels = config.num_labels

        self.ernie_m = ErnieMModel(config, add_pooling_layer=False)
        self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        start_positions: Optional[mindspore.Tensor] = None,
        end_positions: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = True,
    ) -> Union[Tuple[mindspore.Tensor], QuestionAnsweringModelOutput]:
        r"""
        Args:
            start_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for position (index) of the start of the labelled span for computing the token classification loss.
                Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
                are not taken into account for computing the loss.
            end_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for position (index) of the end of the labelled span for computing the token classification loss.
                Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
                are not taken into account for computing the loss.
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.ernie_m(
            input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        sequence_output = outputs[0]

        logits = self.qa_outputs(sequence_output)
        start_logits, end_logits = logits.split(1, axis=-1)
        start_logits = start_logits.squeeze(-1)
        end_logits = end_logits.squeeze(-1)

        total_loss = None
        if start_positions is not None and end_positions is not None:
            # On multi-GPU, splitting the batch can add a dimension; squeeze it away
            if len(start_positions.shape) > 1:
                start_positions = start_positions.squeeze(-1)
            if len(end_positions.shape) > 1:
                end_positions = end_positions.squeeze(-1)
            # Start/end positions sometimes fall outside the model inputs; those terms are ignored
            ignored_index = start_logits.shape[1]
            start_positions = start_positions.clamp(0, ignored_index)
            end_positions = end_positions.clamp(0, ignored_index)

            start_loss = F.cross_entropy(start_logits, start_positions, ignore_index=ignored_index)
            end_loss = F.cross_entropy(end_logits, end_positions, ignore_index=ignored_index)
            total_loss = (start_loss + end_loss) / 2

        if not return_dict:
            output = (start_logits, end_logits) + outputs[2:]
            return ((total_loss,) + output) if total_loss is not None else output

        return QuestionAnsweringModelOutput(
            loss=total_loss,
            start_logits=start_logits,
            end_logits=end_logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForQuestionAnswering.__init__(config)

Initializes a new instance of the ErnieMForQuestionAnswering class.

PARAMETER DESCRIPTION
self

The object itself.

config

An instance of the ErnieMConfig class containing the model configuration.

RETURNS DESCRIPTION

None.

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def __init__(self, config):
    """Initializes a new instance of the ErnieMForQuestionAnswering class.

    Args:
        self: The object itself.
        config: An instance of the ErnieMConfig class containing the model configuration.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(config)
    self.num_labels = config.num_labels

    self.ernie_m = ErnieMModel(config, add_pooling_layer=False)
    self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForQuestionAnswering.forward(input_ids=None, attention_mask=None, position_ids=None, head_mask=None, inputs_embeds=None, start_positions=None, end_positions=None, output_attentions=None, output_hidden_states=None, return_dict=True)

PARAMETER DESCRIPTION
start_positions

Labels for position (index) of the start of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None

end_positions

Labels for position (index) of the end of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None
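The start and end positions index into per-token logits that the forward pass derives by splitting the output of `qa_outputs` along its last axis. A minimal sketch of that split, using nested lists in place of tensors: `split_start_end` is a hypothetical helper for illustration, and it assumes the last axis has exactly two entries (that is, `config.num_labels == 2`, which the two-way unpacking in the source implies but does not state).

```python
def split_start_end(logits):
    """Split logits of shape (batch, seq_len, 2) into two (batch, seq_len) lists.

    Mirrors `logits.split(1, axis=-1)` followed by `squeeze(-1)` on each half.
    """
    start_logits = [[pair[0] for pair in sequence] for sequence in logits]
    end_logits = [[pair[1] for pair in sequence] for sequence in logits]
    return start_logits, end_logits


# One sequence of three tokens; each token carries a (start, end) pair of scores.
logits = [[[1.0, -1.0], [2.0, -2.0], [3.0, -3.0]]]
start_logits, end_logits = split_start_end(logits)
```

Each value in `start_positions` is then a class index into the corresponding row of `start_logits`, and likewise for `end_positions` and `end_logits`.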

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    start_positions: Optional[mindspore.Tensor] = None,
    end_positions: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = True,
) -> Union[Tuple[mindspore.Tensor], QuestionAnsweringModelOutput]:
    r"""
    Args:
        start_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for position (index) of the start of the labelled span for computing the token classification loss.
            Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
            are not taken into account for computing the loss.
        end_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for position (index) of the end of the labelled span for computing the token classification loss.
            Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
            are not taken into account for computing the loss.
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    outputs = self.ernie_m(
        input_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    sequence_output = outputs[0]

    logits = self.qa_outputs(sequence_output)
    start_logits, end_logits = logits.split(1, axis=-1)
    start_logits = start_logits.squeeze(-1)
    end_logits = end_logits.squeeze(-1)

    total_loss = None
    if start_positions is not None and end_positions is not None:
        # On multi-GPU, splitting the batch can add a dimension; squeeze it away
        if len(start_positions.shape) > 1:
            start_positions = start_positions.squeeze(-1)
        if len(end_positions.shape) > 1:
            end_positions = end_positions.squeeze(-1)
        # Start/end positions sometimes fall outside the model inputs; those terms are ignored
        ignored_index = start_logits.shape[1]
        start_positions = start_positions.clamp(0, ignored_index)
        end_positions = end_positions.clamp(0, ignored_index)

        start_loss = F.cross_entropy(start_logits, start_positions, ignore_index=ignored_index)
        end_loss = F.cross_entropy(end_logits, end_positions, ignore_index=ignored_index)
        total_loss = (start_loss + end_loss) / 2

    if not return_dict:
        output = (start_logits, end_logits) + outputs[2:]
        return ((total_loss,) + output) if total_loss is not None else output

    return QuestionAnsweringModelOutput(
        loss=total_loss,
        start_logits=start_logits,
        end_logits=end_logits,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForSequenceClassification

Bases: ErnieMPreTrainedModel

ErnieMForSequenceClassification is a class that represents a fine-tuned ErnieM model for sequence classification tasks. It inherits from ErnieMPreTrainedModel and implements methods for initializing the model and forwarding predictions.

ATTRIBUTE DESCRIPTION
num_labels

Number of labels for sequence classification.

config

Configuration object for the model.

ernie_m

ErnieMModel instance for processing input sequences.

dropout

Dropout layer for regularization.

classifier

Dense layer for classification predictions.

METHOD DESCRIPTION
__init__

Initializes the ErnieMForSequenceClassification instance with the provided configuration.

forward

Runs the forward pass on the input sequences and, when labels are provided, computes the loss.

Args:

  • input_ids (Optional[mindspore.Tensor]): Tensor of input token IDs.
  • attention_mask (Optional[mindspore.Tensor]): Tensor of attention masks.
  • position_ids (Optional[mindspore.Tensor]): Tensor of position IDs.
  • head_mask (Optional[mindspore.Tensor]): Tensor of head masks.
  • inputs_embeds (Optional[mindspore.Tensor]): Tensor of input embeddings.
  • past_key_values (Optional[List[mindspore.Tensor]]): List of past key values for caching.
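  • use_cache (Optional[bool]): Flag for using caching. It is accepted by the signature but not forwarded to the underlying ErnieMModel in this implementation.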
  • output_hidden_states (Optional[bool]): Flag for outputting hidden states.
  • output_attentions (Optional[bool]): Flag for outputting attentions.
  • return_dict (Optional[bool]): Flag for returning output in a dictionary format.
  • labels (Optional[mindspore.Tensor]): Tensor of target labels for computing loss.

Returns:

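  • Union[Tuple[mindspore.Tensor], SequenceClassifierOutput]: A SequenceClassifierOutput when return_dict is True, otherwise a tuple. Either form holds the logits, any requested hidden states and attentions, and the loss when labels are provided.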

Raises:

  • ValueError: If the provided labels are not in the expected format or number.
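When `config.problem_type` is unset, the forward pass infers it from `num_labels` and the label dtype before picking a loss. The selection rule can be sketched as a small standalone function. `infer_problem_type` and its `labels_are_integer` flag are hypothetical names for this illustration; the flag stands in for the check that the label dtype is `mindspore.int64` or `mindspore.int32`.

```python
def infer_problem_type(num_labels, labels_are_integer, problem_type=None):
    """Return the problem type the forward pass would settle on."""
    if problem_type is not None:
        return problem_type  # an explicit setting is never overridden
    if num_labels == 1:
        return "regression"
    if num_labels > 1 and labels_are_integer:
        return "single_label_classification"
    return "multi_label_classification"


kinds = [
    infer_problem_type(1, labels_are_integer=False),
    infer_problem_type(3, labels_are_integer=True),
    infer_problem_type(3, labels_are_integer=False),
]
```

Note that in the source the inferred value is written back to `self.config.problem_type`, so the inference happens only on the first call that supplies labels.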
Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
class ErnieMForSequenceClassification(ErnieMPreTrainedModel):

    """
    ErnieMForSequenceClassification is a class that represents a fine-tuned ErnieM model for sequence classification tasks.
    It inherits from ErnieMPreTrainedModel and implements methods for initializing the model and forwarding predictions.

    Attributes:
        num_labels: Number of labels for sequence classification.
        config: Configuration object for the model.
        ernie_m: ErnieMModel instance for processing input sequences.
        dropout: Dropout layer for regularization.
        classifier: Dense layer for classification predictions.

    Methods:
        __init__: Initializes the ErnieMForSequenceClassification instance with the provided configuration.
        forward:
            Runs the forward pass on the input sequences and, when labels are provided, computes the loss.

            Args:

            - input_ids (Optional[mindspore.Tensor]): Tensor of input token IDs.
            - attention_mask (Optional[mindspore.Tensor]): Tensor of attention masks.
            - position_ids (Optional[mindspore.Tensor]): Tensor of position IDs.
            - head_mask (Optional[mindspore.Tensor]): Tensor of head masks.
            - inputs_embeds (Optional[mindspore.Tensor]): Tensor of input embeddings.
            - past_key_values (Optional[List[mindspore.Tensor]]): List of past key values for caching.
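            - use_cache (Optional[bool]): Flag for using caching. It is accepted by the signature but not forwarded
            to the underlying ErnieMModel in this implementation.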
            - output_hidden_states (Optional[bool]): Flag for outputting hidden states.
            - output_attentions (Optional[bool]): Flag for outputting attentions.
            - return_dict (Optional[bool]): Flag for returning output in a dictionary format.
            - labels (Optional[mindspore.Tensor]): Tensor of target labels for computing loss.

            Returns:

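            - Union[Tuple[mindspore.Tensor], SequenceClassifierOutput]: A SequenceClassifierOutput when return_dict
            is True, otherwise a tuple. Either form holds the logits, any requested hidden states and attentions,
            and the loss when labels are provided.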

            Raises:

            - ValueError: If the provided labels are not in the expected format or number.
    """
    # Copied from transformers.models.bert.modeling_bert.BertForSequenceClassification.__init__ with Bert->ErnieM,bert->ernie_m
    def __init__(self, config):
        """
        Initializes an instance of the ErnieMForSequenceClassification class.

        Args:
            self: The instance of the class.
            config (object): The configuration object containing settings for the model initialization.
                It must have the following attributes:

                - num_labels (int): The number of labels for classification.
                - classifier_dropout (float, optional): The dropout probability for the classifier layer.
                If not provided, it defaults to the hidden dropout probability.
                - hidden_dropout_prob (float): The default hidden dropout probability.

        Returns:
            None.

        Raises:
            AttributeError: If the config object is missing a required attribute such as num_labels.
            TypeError: If the config object does not have the expected attributes or if their types are incorrect.
        """
        super().__init__(config)
        self.num_labels = config.num_labels
        self.config = config

        self.ernie_m = ErnieMModel(config)
        classifier_dropout = (
            config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
        )
        self.dropout = nn.Dropout(p=classifier_dropout)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[List[mindspore.Tensor]] = None,
        use_cache: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        return_dict: Optional[bool] = True,
        labels: Optional[mindspore.Tensor] = None,
    ) -> Union[Tuple[mindspore.Tensor], SequenceClassifierOutput]:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
                config.num_labels - 1]`. If `config.num_labels == 1`, a regression loss is computed (mean-squared error);
                if `config.num_labels > 1`, a classification loss is computed (cross-entropy).
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.ernie_m(
            input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            past_key_values=past_key_values,
            output_hidden_states=output_hidden_states,
            output_attentions=output_attentions,
            return_dict=return_dict,
        )

        pooled_output = outputs[1]

        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)

        loss = None
        if labels is not None:
            if self.config.problem_type is None:
                if self.num_labels == 1:
                    self.config.problem_type = "regression"
                elif self.num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
                    self.config.problem_type = "single_label_classification"
                else:
                    self.config.problem_type = "multi_label_classification"

            if self.config.problem_type == "regression":
                if self.num_labels == 1:
                    loss = F.mse_loss(logits.squeeze(), labels.squeeze())
                else:
                    loss = F.mse_loss(logits, labels)
            elif self.config.problem_type == "single_label_classification":
                loss = F.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))
            elif self.config.problem_type == "multi_label_classification":
                loss = F.binary_cross_entropy_with_logits(logits, labels)
        if not return_dict:
            output = (logits,) + outputs[2:]
            return ((loss,) + output) if loss is not None else output

        return SequenceClassifierOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForSequenceClassification.__init__(config)

Initializes an instance of the ErnieMForSequenceClassification class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

The configuration object containing settings for the model initialization. It must have the following attributes:

  • num_labels (int): The number of labels for classification.
  • classifier_dropout (float, optional): The dropout probability for the classifier layer. If not provided, it defaults to the hidden dropout probability.
  • hidden_dropout_prob (float): The default hidden dropout probability.

TYPE: object

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
AttributeError

If the config object is missing a required attribute such as num_labels.

TypeError

If the config object does not have the expected attributes or if their types are incorrect.
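The dropout fallback described for `config` can be shown in isolation. This is a sketch under assumptions: `resolve_classifier_dropout` is a hypothetical helper, and `SimpleNamespace` stands in for a real `ErnieMConfig` so the example runs without the library.

```python
from types import SimpleNamespace


def resolve_classifier_dropout(config):
    """Pick the classifier dropout, falling back to the hidden dropout when unset."""
    if config.classifier_dropout is not None:
        return config.classifier_dropout
    return config.hidden_dropout_prob


unset = SimpleNamespace(classifier_dropout=None, hidden_dropout_prob=0.1)
explicit_zero = SimpleNamespace(classifier_dropout=0.0, hidden_dropout_prob=0.1)

fallback_rate = resolve_classifier_dropout(unset)
explicit_rate = resolve_classifier_dropout(explicit_zero)
```

Because the check is `is not None` rather than a truthiness test, an explicit `classifier_dropout` of 0.0 is honoured instead of being replaced by `hidden_dropout_prob`.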

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def __init__(self, config):
    """
    Initializes an instance of the ErnieMForSequenceClassification class.

    Args:
        self: The instance of the class.
        config (object): The configuration object containing settings for the model initialization.
            It must have the following attributes:

            - num_labels (int): The number of labels for classification.
            - classifier_dropout (float, optional): The dropout probability for the classifier layer.
            If not provided, it defaults to the hidden dropout probability.
            - hidden_dropout_prob (float): The default hidden dropout probability.

    Returns:
        None.

    Raises:
        AttributeError: If the config object is missing a required attribute such as num_labels.
        TypeError: If the config object does not have the expected attributes or if their types are incorrect.
    """
    super().__init__(config)
    self.num_labels = config.num_labels
    self.config = config

    self.ernie_m = ErnieMModel(config)
    classifier_dropout = (
        config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
    )
    self.dropout = nn.Dropout(p=classifier_dropout)
    self.classifier = nn.Linear(config.hidden_size, config.num_labels)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForSequenceClassification.forward(input_ids=None, attention_mask=None, position_ids=None, head_mask=None, inputs_embeds=None, past_key_values=None, use_cache=None, output_hidden_states=None, output_attentions=None, return_dict=True, labels=None)

PARAMETER DESCRIPTION
labels

Labels for computing the sequence classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels == 1, a regression loss is computed (mean-squared error); if config.num_labels > 1, a classification loss is computed (cross-entropy).

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None
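The three losses that `labels` can trigger (mean-squared error, cross-entropy, and binary cross-entropy with logits) can be written out in plain Python to make the arithmetic concrete. These are hypothetical reference helpers over lists, not the `F.mse_loss`, `F.cross_entropy`, or `F.binary_cross_entropy_with_logits` implementations; mean reduction is assumed throughout.

```python
import math


def mse_loss(predictions, targets):
    """Mean of squared differences (regression case)."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


def cross_entropy(logits, targets):
    """Mean of -log softmax(logits)[target] (single-label case)."""
    total = 0.0
    for row, target in zip(logits, targets):
        total += math.log(sum(math.exp(x) for x in row)) - row[target]
    return total / len(logits)


def bce_with_logits(logits, targets):
    """Mean over all elements of -[y*log(s) + (1-y)*log(1-s)], s = sigmoid(x) (multi-label case).

    Uses the algebraically equivalent form log(1 + exp(-x)) + (1 - y) * x.
    """
    pairs = [(x, y) for row_x, row_y in zip(logits, targets) for x, y in zip(row_x, row_y)]
    return sum(math.log1p(math.exp(-x)) + (1 - y) * x for x, y in pairs) / len(pairs)


regression = mse_loss([1.0, 3.0], [0.0, 1.0])
single_label = cross_entropy([[0.0, 0.0]], [1])
multi_label = bce_with_logits([[0.0, 0.0]], [[1.0, 0.0]])
```

The single-label case takes integer class indices, while the multi-label case takes a float target per class, which is why the label dtype drives the choice between them.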

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    past_key_values: Optional[List[mindspore.Tensor]] = None,
    use_cache: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    output_attentions: Optional[bool] = None,
    return_dict: Optional[bool] = True,
    labels: Optional[mindspore.Tensor] = None,
) -> Union[Tuple[mindspore.Tensor], SequenceClassifierOutput]:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
            config.num_labels - 1]`. If `config.num_labels == 1`, a regression loss is computed (mean-squared error);
            if `config.num_labels > 1`, a classification loss is computed (cross-entropy).
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    outputs = self.ernie_m(
        input_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        past_key_values=past_key_values,
        output_hidden_states=output_hidden_states,
        output_attentions=output_attentions,
        return_dict=return_dict,
    )

    pooled_output = outputs[1]

    pooled_output = self.dropout(pooled_output)
    logits = self.classifier(pooled_output)

    loss = None
    if labels is not None:
        if self.config.problem_type is None:
            if self.num_labels == 1:
                self.config.problem_type = "regression"
            elif self.num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
                self.config.problem_type = "single_label_classification"
            else:
                self.config.problem_type = "multi_label_classification"

        if self.config.problem_type == "regression":
            if self.num_labels == 1:
                loss = F.mse_loss(logits.squeeze(), labels.squeeze())
            else:
                loss = F.mse_loss(logits, labels)
        elif self.config.problem_type == "single_label_classification":
            loss = F.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))
        elif self.config.problem_type == "multi_label_classification":
            loss = F.binary_cross_entropy_with_logits(logits, labels)
    if not return_dict:
        output = (logits,) + outputs[2:]
        return ((loss,) + output) if loss is not None else output

    return SequenceClassifierOutput(
        loss=loss,
        logits=logits,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForTokenClassification

Bases: ErnieMPreTrainedModel

This class represents a fine-tuned ErnieM model for token classification tasks. It inherits from the ErnieMPreTrainedModel class.

The ErnieMForTokenClassification class implements the necessary methods and attributes for token classification tasks. It takes a configuration object as input during initialization and sets up the model architecture accordingly. The model consists of an ErnieMModel instance, a dropout layer, and a classifier layer.

METHOD DESCRIPTION
__init__

Initializes the ErnieMForTokenClassification instance with the given configuration. It sets the number of labels, creates an ErnieMModel object, initializes the dropout layer, and creates the classifier layer.

forward

Constructs the forward pass of the model. It takes various input tensors and returns the token classification output. Optionally, it can also compute the token classification loss if labels are provided.

ATTRIBUTE DESCRIPTION
num_labels

The number of possible labels for the token classification task.

Example
>>> config = ErnieMConfig()
>>> model = ErnieMForTokenClassification(config)
>>> input_ids = ...
>>> attention_mask = ...
>>> output = model.forward(input_ids=input_ids, attention_mask=attention_mask)
Note

Tensors such as input_ids and attention_mask are expected to have shape (batch_size, sequence_length); labels, when provided, must have the same shape.
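When labels are supplied, the token classification loss is computed by flattening the logits to `(batch_size * sequence_length, num_labels)` and the labels to `(batch_size * sequence_length,)` before applying cross-entropy. A plain-Python sketch of that reshaping follows; `token_classification_loss` is a hypothetical helper over nested lists, and mean reduction is an assumption of this example.

```python
import math


def token_classification_loss(logits, labels):
    """Cross-entropy averaged over every token in the batch.

    logits: nested list of shape (batch, seq_len, num_labels)
    labels: nested list of shape (batch, seq_len) holding class indices
    """
    # Flatten the batch and sequence axes, as `view(-1, num_labels)` and `view(-1)` do.
    flat_logits = [token for sequence in logits for token in sequence]
    flat_labels = [label for sequence in labels for label in sequence]
    if len(flat_logits) != len(flat_labels):
        raise ValueError("logits and labels must cover the same tokens")

    total = 0.0
    for row, target in zip(flat_logits, flat_labels):
        total += math.log(sum(math.exp(x) for x in row)) - row[target]
    return total / len(flat_labels)


# One sequence, two tokens, three candidate labels per token.
token_loss = token_classification_loss(
    logits=[[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]],
    labels=[[0, 2]],
)
```

The shape check in the sketch makes the requirement explicit: a label tensor that does not match `(batch_size, sequence_length)` cannot be paired token by token with the logits.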

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
class ErnieMForTokenClassification(ErnieMPreTrainedModel):

    """
    This class represents a fine-tuned ErnieM model for token classification tasks. It inherits from the ErnieMPreTrainedModel class.

    The ErnieMForTokenClassification class implements the necessary methods and attributes for token classification tasks.
    It takes a configuration object as input during initialization and sets up the model architecture accordingly.
    The model consists of an ErnieMModel instance, a dropout layer, and a classifier layer.

    Methods:
        __init__: Initializes the ErnieMForTokenClassification instance with the given configuration.
            It sets the number of labels, creates an ErnieMModel object, initializes the dropout layer, and
            creates the classifier layer.

        forward: Constructs the forward pass of the model. It takes various input tensors and returns the token
            classification output. Optionally, it can also compute the token classification loss if labels are provided.

    Attributes:
        num_labels: The number of possible labels for the token classification task.

    Example:
        ```python
        >>> config = ErnieMConfig()
        >>> model = ErnieMForTokenClassification(config)
        >>> input_ids = ...
        >>> attention_mask = ...
        >>> output = model.forward(input_ids=input_ids, attention_mask=attention_mask)
        ```

    Note:
        Tensors such as input_ids and attention_mask are expected to have shape (batch_size, sequence_length);
        labels, when provided, must have the same shape.
    """
    # Copied from transformers.models.bert.modeling_bert.BertForTokenClassification.__init__ with Bert->ErnieM,bert->ernie_m
    def __init__(self, config):
        """
        Initializes an instance of the ErnieMForTokenClassification class.

        Args:
            self: The instance of the ErnieMForTokenClassification class.
            config: An instance of the configuration class containing the model configuration settings.

        Returns:
            None

        Raises:
            None.
        """
        super().__init__(config)
        self.num_labels = config.num_labels

        self.ernie_m = ErnieMModel(config, add_pooling_layer=False)
        classifier_dropout = (
            config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
        )
        self.dropout = nn.Dropout(p=classifier_dropout)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[List[mindspore.Tensor]] = None,
        output_hidden_states: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        return_dict: Optional[bool] = True,
        labels: Optional[mindspore.Tensor] = None,
    ) -> Union[Tuple[mindspore.Tensor], TokenClassifierOutput]:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`.
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.ernie_m(
            input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            past_key_values=past_key_values,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        sequence_output = outputs[0]

        sequence_output = self.dropout(sequence_output)
        logits = self.classifier(sequence_output)

        loss = None
        if labels is not None:
            loss = F.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))

        if not return_dict:
            output = (logits,) + outputs[2:]
            return ((loss,) + output) if loss is not None else output

        return TokenClassifierOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForTokenClassification.__init__(config)

Initializes an instance of the ErnieMForTokenClassification class.

PARAMETER DESCRIPTION
self

The instance of the ErnieMForTokenClassification class.

config

An instance of the configuration class containing the model configuration settings.

RETURNS DESCRIPTION

None
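The dropout rate used before the classifier is chosen with a small fallback rule. The sketch below illustrates that rule in isolation; `DemoConfig` and `resolve_classifier_dropout` are hypothetical names introduced for illustration and are not part of mindnlp.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DemoConfig:
    """Hypothetical stand-in for ErnieMConfig; only the two fields used here."""
    hidden_dropout_prob: float = 0.1
    classifier_dropout: Optional[float] = None


def resolve_classifier_dropout(config: DemoConfig) -> float:
    # Same rule as in __init__: use classifier_dropout when it is set,
    # otherwise fall back to hidden_dropout_prob.
    if config.classifier_dropout is not None:
        return config.classifier_dropout
    return config.hidden_dropout_prob


default_rate = resolve_classifier_dropout(DemoConfig())
explicit_rate = resolve_classifier_dropout(DemoConfig(classifier_dropout=0.0))
```

Because the check is `is not None` rather than a truthiness test, an explicit `classifier_dropout` of `0.0` is respected instead of being replaced by `hidden_dropout_prob`.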

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def __init__(self, config):
    """
    Initializes an instance of the ErnieMForTokenClassification class.

    Args:
        self: The instance of the ErnieMForTokenClassification class.
        config: An instance of the configuration class containing the model configuration settings.

    Returns:
        None

    Raises:
        None.
    """
    super().__init__(config)
    self.num_labels = config.num_labels

    self.ernie_m = ErnieMModel(config, add_pooling_layer=False)
    classifier_dropout = (
        config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
    )
    self.dropout = nn.Dropout(p=classifier_dropout)
    self.classifier = nn.Linear(config.hidden_size, config.num_labels)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForTokenClassification.forward(input_ids=None, attention_mask=None, position_ids=None, head_mask=None, inputs_embeds=None, past_key_values=None, output_hidden_states=None, output_attentions=None, return_dict=True, labels=None)

PARAMETER DESCRIPTION
labels

Labels for computing the token classification loss. Indices should be in [0, ..., config.num_labels - 1].

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None
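When `labels` is given, the logits and labels are flattened so that every token counts as one classification example. The following is a minimal pure-Python sketch of that computation, assuming mean reduction and no ignored label indices; `token_classification_loss` is a hypothetical helper, not the `F.cross_entropy` call used by the library.

```python
import math


def token_classification_loss(logits, labels):
    """Mean cross-entropy over all tokens.

    logits: nested list of shape (batch_size, sequence_length, num_labels)
    labels: nested list of shape (batch_size, sequence_length)
    """
    # Flatten to (batch_size * sequence_length, num_labels) and (batch_size * sequence_length,).
    flat_logits = [row for sequence in logits for row in sequence]
    flat_labels = [label for sequence in labels for label in sequence]

    total = 0.0
    for row, label in zip(flat_logits, flat_labels):
        # Numerically stable log-sum-exp of the logits for this token.
        peak = max(row)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[label]
    return total / len(flat_labels)


# batch_size=2, sequence_length=3, num_labels=4, all logits equal.
logits = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]
labels = [[0, 1, 2], [3, 0, 1]]
loss = token_classification_loss(logits, labels)
```

With uniform logits every label has probability `1 / num_labels`, so the loss equals `log(num_labels)`, which is a convenient sanity check for an untrained classifier head.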

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    past_key_values: Optional[List[mindspore.Tensor]] = None,
    output_hidden_states: Optional[bool] = None,
    output_attentions: Optional[bool] = None,
    return_dict: Optional[bool] = True,
    labels: Optional[mindspore.Tensor] = None,
) -> Union[Tuple[mindspore.Tensor], TokenClassifierOutput]:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`.
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    outputs = self.ernie_m(
        input_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        past_key_values=past_key_values,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    sequence_output = outputs[0]

    sequence_output = self.dropout(sequence_output)
    logits = self.classifier(sequence_output)

    loss = None
    if labels is not None:
        loss = F.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))

    if not return_dict:
        output = (logits,) + outputs[2:]
        return ((loss,) + output) if loss is not None else output

    return TokenClassifierOutput(
        loss=loss,
        logits=logits,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMModel

Bases: ErnieMPreTrainedModel

This class implements the bare ERNIE-M model, a multilingual member of the ERNIE (Enhanced Representation through kNowledge Integration) family, for pre-training and fine-tuning on downstream tasks. It combines ERNIE-M embeddings, an encoder, and an optional pooling layer. The class provides methods for initializing, getting and setting input embeddings, pruning attention heads, and running the forward pass with various input and output options. It inherits from ErnieMPreTrainedModel.

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
class ErnieMModel(ErnieMPreTrainedModel):

    """
    This class implements the bare ERNIE-M model, a multilingual member of the ERNIE (Enhanced Representation
    through kNowledge Integration) family, for pre-training and fine-tuning on downstream tasks. It combines
    ERNIE-M embeddings, an encoder, and an optional pooling layer. The class provides methods for initializing,
    getting and setting input embeddings, pruning attention heads, and running the forward pass with various
    input and output options. It inherits from ErnieMPreTrainedModel.
    """
    def __init__(self, config, add_pooling_layer=True):
        """
        Initializes the ErnieMModel.

        Args:
            self: The instance of the class.
            config (object): The configuration object containing model settings.
            add_pooling_layer (bool): A flag indicating whether to add a pooling layer to the model.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(config)
        self.initializer_range = config.initializer_range
        self.embeddings = ErnieMEmbeddings(config)
        self.encoder = ErnieMEncoder(config)
        self.pooler = ErnieMPooler(config) if add_pooling_layer else None
        self.post_init()

    def get_input_embeddings(self):
        """
        This method returns the input embeddings from the ErnieMModel.

        Args:
            self: ErnieMModel object. The instance of the ErnieMModel class.

        Returns:
            word_embeddings: The method returns the input embeddings from the ErnieMModel.

        Raises:
            None.
        """
        return self.embeddings.word_embeddings

    def set_input_embeddings(self, value):
        """
        Set the input embeddings for the ErnieMModel.

        Args:
            self (ErnieMModel): The instance of the ErnieMModel class.
            value: The input embeddings value to be set. It should be a tensor representing the input embeddings.

        Returns:
            None.

        Raises:
            None.
        """
        self.embeddings.word_embeddings = value

    def _prune_heads(self, heads_to_prune):
        """
        Prunes heads of the model. heads_to_prune: dict of {layer_num: list of heads to prune in this layer} See base
        class PreTrainedModel
        """
        for layer, heads in heads_to_prune.items():
            self.encoder.layers[layer].self_attn.prune_heads(heads)

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
        use_cache: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple[mindspore.Tensor], BaseModelOutputWithPoolingAndCrossAttentions]:
        """
        Constructs the ERNIE-M model.

        Args:
            self: The object instance.
            input_ids (Optional[mindspore.Tensor]): The input tensor of token indices. Default is None.
            position_ids (Optional[mindspore.Tensor]): The tensor indicating the position of tokens. Default is None.
            attention_mask (Optional[mindspore.Tensor]):
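                Mask over input positions. A 2-D mask, as produced by the tokenizer, uses 1 for tokens to attend to
                and 0 for tokens to ignore, and is converted internally to an additive mask. If None, positions where
                `input_ids` equals 0 are masked. Default is None.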
            head_mask (Optional[mindspore.Tensor]):
                The tensor indicating the heads in the multi-head attention layer to be masked. Default is None.
            inputs_embeds (Optional[mindspore.Tensor]): The input embeddings. Default is None.
            past_key_values (Optional[Tuple[Tuple[mindspore.Tensor]]]): The previous key values. Default is None.
            use_cache (Optional[bool]): Whether to use the cache. Default is None.
            output_hidden_states (Optional[bool]): Whether to output the hidden states. Default is None.
            output_attentions (Optional[bool]): Whether to output the attentions. Default is None.
            return_dict (Optional[bool]): Whether to return a dictionary. Default is None.

        Returns:
            Union[Tuple[mindspore.Tensor], BaseModelOutputWithPoolingAndCrossAttentions]:
                Depending on the value of `return_dict`, returns a tuple of tensors including the last hidden state and
                the pooler output, or a BaseModelOutputWithPoolingAndCrossAttentions object.

        Raises:
            ValueError: If both `input_ids` and `inputs_embeds` are specified.
        """
        if input_ids is not None and inputs_embeds is not None:
            raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time.")

        # init the default bool value
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.return_dict

        head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)

        past_key_values_length = 0
        if past_key_values is not None:
            past_key_values_length = past_key_values[0][0].shape[2]

        # Adapted from paddlenlp.transformers.ernie_m.ErnieMModel
        if attention_mask is None:
            attention_mask = (input_ids == 0).to(self.dtype)
            attention_mask *= mindspore.tensor(np.finfo(mindspore.dtype_to_nptype(attention_mask.dtype)).min, attention_mask.dtype)
            if past_key_values is not None:
                batch_size = past_key_values[0][0].shape[0]
                past_mask = ops.zeros([batch_size, 1, 1, past_key_values_length], dtype=attention_mask.dtype)
                attention_mask = ops.concat([past_mask, attention_mask], dim=-1)
        # For 2D attention_mask from tokenizer
        elif attention_mask.ndim == 2:
            attention_mask = attention_mask.to(self.dtype)
            attention_mask = 1.0 - attention_mask
            attention_mask *= mindspore.tensor(np.finfo(mindspore.dtype_to_nptype(attention_mask.dtype)).min, attention_mask.dtype)

        extended_attention_mask = attention_mask.unsqueeze(1).unsqueeze(1)

        embedding_output = self.embeddings(
            input_ids=input_ids,
            position_ids=position_ids,
            inputs_embeds=inputs_embeds,
            past_key_values_length=past_key_values_length,
        )
        encoder_outputs = self.encoder(
            embedding_output,
            attention_mask=extended_attention_mask,
            head_mask=head_mask,
            past_key_values=past_key_values,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        if not return_dict:
            sequence_output = encoder_outputs[0]
            pooler_output = self.pooler(sequence_output) if self.pooler is not None else None
            return (sequence_output, pooler_output) + encoder_outputs[1:]

        sequence_output = encoder_outputs["last_hidden_state"]
        pooler_output = self.pooler(sequence_output) if self.pooler is not None else None
        hidden_states = None if not output_hidden_states else encoder_outputs["hidden_states"]
        attentions = None if not output_attentions else encoder_outputs["attentions"]

        return BaseModelOutputWithPoolingAndCrossAttentions(
            last_hidden_state=sequence_output,
            pooler_output=pooler_output,
            hidden_states=hidden_states,
            attentions=attentions,
        )

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMModel.__init__(config, add_pooling_layer=True)

Initializes the ErnieMModel.

PARAMETER DESCRIPTION
self

The instance of the class.

config

The configuration object containing model settings.

TYPE: object

add_pooling_layer

A flag indicating whether to add a pooling layer to the model.

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION

None.

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def __init__(self, config, add_pooling_layer=True):
    """
    Initializes the ErnieMModel.

    Args:
        self: The instance of the class.
        config (object): The configuration object containing model settings.
        add_pooling_layer (bool): A flag indicating whether to add a pooling layer to the model.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(config)
    self.initializer_range = config.initializer_range
    self.embeddings = ErnieMEmbeddings(config)
    self.encoder = ErnieMEncoder(config)
    self.pooler = ErnieMPooler(config) if add_pooling_layer else None
    self.post_init()

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMModel.forward(input_ids=None, position_ids=None, attention_mask=None, head_mask=None, inputs_embeds=None, past_key_values=None, use_cache=None, output_hidden_states=None, output_attentions=None, return_dict=None)

Constructs the ERNIE-M model.

PARAMETER DESCRIPTION
self

The object instance.

input_ids

The input tensor of token indices. Default is None.

TYPE: Optional[Tensor] DEFAULT: None

position_ids

The tensor indicating the position of tokens. Default is None.

TYPE: Optional[Tensor] DEFAULT: None

attention_mask

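Mask over input positions. A 2-D mask, as produced by the tokenizer, uses 1 for tokens to attend to and 0 for tokens to ignore, and is converted internally to an additive mask. If None, positions where input_ids equals 0 are masked. Default is None.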

TYPE: Optional[Tensor] DEFAULT: None

head_mask

The tensor indicating the heads in the multi-head attention layer to be masked. Default is None.

TYPE: Optional[Tensor] DEFAULT: None

inputs_embeds

The input embeddings. Default is None.

TYPE: Optional[Tensor] DEFAULT: None

past_key_values

Cached key and value states from previous steps. The cached sequence length is read from past_key_values[0][0].shape[2]. Default is None.

TYPE: Optional[Tuple[Tuple[Tensor]]] DEFAULT: None

use_cache

Whether to use the cache. The forward body shown on this page accepts this argument but does not reference it. Default is None.

TYPE: Optional[bool] DEFAULT: None

output_hidden_states

Whether to output the hidden states. Default is None.

TYPE: Optional[bool] DEFAULT: None

output_attentions

Whether to output the attentions. Default is None.

TYPE: Optional[bool] DEFAULT: None

return_dict

Whether to return a dictionary. Default is None.

TYPE: Optional[bool] DEFAULT: None

RETURNS DESCRIPTION
Union[Tuple[Tensor], BaseModelOutputWithPoolingAndCrossAttentions]

If return_dict is False, a tuple whose first two elements are the last hidden state and the pooler output (None when the model was built with add_pooling_layer=False), followed by the remaining encoder outputs. Otherwise, a BaseModelOutputWithPoolingAndCrossAttentions object.

RAISES DESCRIPTION
ValueError

If both input_ids and inputs_embeds are specified.
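The attention mask handling in `forward` turns a keep/ignore mask into an additive mask that is broadcast over heads and query positions. Below is a minimal NumPy sketch of that conversion, assuming a 2-D tokenizer-style mask or, when no mask is given, padding token id 0. `to_additive_mask` is a hypothetical helper written for illustration; it does not cover the `past_key_values` branch.

```python
import numpy as np


def to_additive_mask(attention_mask=None, input_ids=None, dtype=np.float32):
    """Build a (batch, 1, 1, seq_len) additive mask: 0 to keep, dtype-min to ignore."""
    min_value = np.finfo(dtype).min
    if attention_mask is None:
        # No mask given: treat token id 0 as padding.
        ignore = (input_ids == 0).astype(dtype)
    else:
        # Tokenizer-style 2-D mask: 1 = attend, 0 = ignore.
        ignore = 1.0 - attention_mask.astype(dtype)
    additive = ignore * min_value
    # Add two singleton axes so the mask broadcasts over heads and query positions.
    return additive[:, None, None, :]


mask = np.array([[1, 1, 0]])
from_mask = to_additive_mask(attention_mask=mask)
from_ids = to_additive_mask(input_ids=np.array([[5, 7, 0]]))
```

Adding the most negative representable value to the attention scores of ignored positions drives their softmax weight to effectively zero, which is why the mask is additive rather than multiplicative.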

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    past_key_values: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
    use_cache: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    output_attentions: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple[mindspore.Tensor], BaseModelOutputWithPoolingAndCrossAttentions]:
    """
    Constructs the ERNIE-M model.

    Args:
        self: The object instance.
        input_ids (Optional[mindspore.Tensor]): The input tensor of token indices. Default is None.
        position_ids (Optional[mindspore.Tensor]): The tensor indicating the position of tokens. Default is None.
        attention_mask (Optional[mindspore.Tensor]):
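            Mask over input positions. A 2-D mask, as produced by the tokenizer, uses 1 for tokens to attend to
            and 0 for tokens to ignore, and is converted internally to an additive mask. If None, positions where
            `input_ids` equals 0 are masked. Default is None.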
        head_mask (Optional[mindspore.Tensor]):
            The tensor indicating the heads in the multi-head attention layer to be masked. Default is None.
        inputs_embeds (Optional[mindspore.Tensor]): The input embeddings. Default is None.
        past_key_values (Optional[Tuple[Tuple[mindspore.Tensor]]]): The previous key values. Default is None.
        use_cache (Optional[bool]): Whether to use the cache. Default is None.
        output_hidden_states (Optional[bool]): Whether to output the hidden states. Default is None.
        output_attentions (Optional[bool]): Whether to output the attentions. Default is None.
        return_dict (Optional[bool]): Whether to return a dictionary. Default is None.

    Returns:
        Union[Tuple[mindspore.Tensor], BaseModelOutputWithPoolingAndCrossAttentions]:
            Depending on the value of `return_dict`, returns a tuple of tensors including the last hidden state and
            the pooler output, or a BaseModelOutputWithPoolingAndCrossAttentions object.

    Raises:
        ValueError: If both `input_ids` and `inputs_embeds` are specified.
    """
    if input_ids is not None and inputs_embeds is not None:
        raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time.")

    # init the default bool value
    output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
    output_hidden_states = (
        output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    )
    return_dict = return_dict if return_dict is not None else self.config.return_dict

    head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)

    past_key_values_length = 0
    if past_key_values is not None:
        past_key_values_length = past_key_values[0][0].shape[2]

    # Adapted from paddlenlp.transformers.ernie_m.ErnieMModel
    if attention_mask is None:
        attention_mask = (input_ids == 0).to(self.dtype)
        attention_mask *= mindspore.tensor(np.finfo(mindspore.dtype_to_nptype(attention_mask.dtype)).min, attention_mask.dtype)
        if past_key_values is not None:
            batch_size = past_key_values[0][0].shape[0]
            past_mask = ops.zeros([batch_size, 1, 1, past_key_values_length], dtype=attention_mask.dtype)
            attention_mask = ops.concat([past_mask, attention_mask], dim=-1)
    # For 2D attention_mask from tokenizer
    elif attention_mask.ndim == 2:
        attention_mask = attention_mask.to(self.dtype)
        attention_mask = 1.0 - attention_mask
        attention_mask *= mindspore.tensor(np.finfo(mindspore.dtype_to_nptype(attention_mask.dtype)).min, attention_mask.dtype)

    extended_attention_mask = attention_mask.unsqueeze(1).unsqueeze(1)

    embedding_output = self.embeddings(
        input_ids=input_ids,
        position_ids=position_ids,
        inputs_embeds=inputs_embeds,
        past_key_values_length=past_key_values_length,
    )
    encoder_outputs = self.encoder(
        embedding_output,
        attention_mask=extended_attention_mask,
        head_mask=head_mask,
        past_key_values=past_key_values,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    if not return_dict:
        sequence_output = encoder_outputs[0]
        pooler_output = self.pooler(sequence_output) if self.pooler is not None else None
        return (sequence_output, pooler_output) + encoder_outputs[1:]

    sequence_output = encoder_outputs["last_hidden_state"]
    pooler_output = self.pooler(sequence_output) if self.pooler is not None else None
    hidden_states = None if not output_hidden_states else encoder_outputs["hidden_states"]
    attentions = None if not output_attentions else encoder_outputs["attentions"]

    return BaseModelOutputWithPoolingAndCrossAttentions(
        last_hidden_state=sequence_output,
        pooler_output=pooler_output,
        hidden_states=hidden_states,
        attentions=attentions,
    )

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMModel.get_input_embeddings()

This method returns the input embeddings from the ErnieMModel.

PARAMETER DESCRIPTION
self

ErnieMModel object. The instance of the ErnieMModel class.

RETURNS DESCRIPTION
word_embeddings

The word embedding object stored at self.embeddings.word_embeddings.

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def get_input_embeddings(self):
    """
    This method returns the input embeddings from the ErnieMModel.

    Args:
        self: ErnieMModel object. The instance of the ErnieMModel class.

    Returns:
        word_embeddings: The method returns the input embeddings from the ErnieMModel.

    Raises:
        None.
    """
    return self.embeddings.word_embeddings

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMModel.set_input_embeddings(value)

Set the input embeddings for the ErnieMModel.

PARAMETER DESCRIPTION
self

The instance of the ErnieMModel class.

TYPE: ErnieMModel

value

The input embeddings value to be set. It replaces self.embeddings.word_embeddings, so it should be an object of the same kind as the one returned by get_input_embeddings().

RETURNS DESCRIPTION

None.

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def set_input_embeddings(self, value):
    """
    Set the input embeddings for the ErnieMModel.

    Args:
        self (ErnieMModel): The instance of the ErnieMModel class.
        value: The input embeddings value to be set. It should be a tensor representing the input embeddings.

    Returns:
        None.

    Raises:
        None.
    """
    self.embeddings.word_embeddings = value

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMPooler

Bases: Module

This class implements the pooler of the ERNIE-M model. It reduces the hidden states to a single representation of the input sequence by taking the hidden state of the first token and passing it through a dense layer and a tanh activation.

Inherits from

nn.Module

ATTRIBUTE DESCRIPTION
dense

A fully connected layer mapping hidden_size to hidden_size, applied to the hidden state of the first token.

TYPE: Linear

activation

The activation function applied to the projected hidden states.

TYPE: Tanh

METHOD DESCRIPTION
__init__

Initializes the ErnieMPooler module.

forward

Computes the pooled output from the hidden states.

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
class ErnieMPooler(nn.Module):
    """
    This class implements the pooler of the ERNIE-M model. It reduces the hidden states to a single representation
    of the input sequence by taking the hidden state of the first token and passing it through a dense layer and a
    tanh activation.

    Inherits from:
        nn.Module

    Attributes:
        dense (nn.Linear): A fully connected layer mapping hidden_size to hidden_size, applied to the hidden state
            of the first token.
        activation (nn.Tanh): The activation function applied to the projected hidden states.

    Methods:
        __init__(config): Initializes the ErnieMPooler module.
        forward(hidden_states): Computes the pooled output from the hidden states.

    """
    def __init__(self, config):
        """
        Initializes a new instance of the ErnieMPooler class.

        Args:
            self: The object instance.
            config: An instance of the configuration class used to configure the ErnieMPooler.
                It provides various settings and parameters for the ErnieMPooler's behavior. This parameter is required.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.activation = nn.Tanh()

    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
        """
        Constructs the pooled output tensor for the ERNIE model.

        Args:
            self (ErnieMPooler): An instance of the ErnieMPooler class.
            hidden_states (mindspore.Tensor): A tensor containing the hidden states from the ERNIE model.
                It should have the shape (batch_size, sequence_length, hidden_size) where:

                - batch_size: The number of sequences in the batch.
                - sequence_length: The length of each input sequence.
                - hidden_size: The size of the hidden state vectors.

        Returns:
            mindspore.Tensor: A tensor representing the pooled output of the ERNIE model.
                The pooled output is obtained by applying dense and activation layers to the first token tensor
                extracted from the hidden states tensor.

        Raises:
            None
        """
        # We "pool" the model by simply taking the hidden state corresponding
        # to the first token.
        first_token_tensor = hidden_states[:, 0]
        pooled_output = self.dense(first_token_tensor)
        pooled_output = self.activation(pooled_output)
        return pooled_output

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMPooler.__init__(config)

Initializes a new instance of the ErnieMPooler class.

PARAMETER DESCRIPTION
self

The object instance.

config

An instance of the configuration class used to configure the ErnieMPooler. It provides various settings and parameters for the ErnieMPooler's behavior. This parameter is required.

RETURNS DESCRIPTION

None.

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def __init__(self, config):
    """
    Initializes a new instance of the ErnieMPooler class.

    Args:
        self: The object instance.
        config: An instance of the configuration class used to configure the ErnieMPooler.
            It provides various settings and parameters for the ErnieMPooler's behavior. This parameter is required.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.dense = nn.Linear(config.hidden_size, config.hidden_size)
    self.activation = nn.Tanh()

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMPooler.forward(hidden_states)

Constructs the pooled output tensor for the ERNIE model.

PARAMETER DESCRIPTION
self

An instance of the ErnieMPooler class.

TYPE: ErnieMPooler

hidden_states

A tensor containing the hidden states from the ERNIE model. It should have the shape (batch_size, sequence_length, hidden_size) where:

  • batch_size: The number of sequences in the batch.
  • sequence_length: The length of each input sequence.
  • hidden_size: The size of the hidden state vectors.

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: A tensor representing the pooled output of the ERNIE model. The pooled output is obtained by applying dense and activation layers to the first token tensor extracted from the hidden states tensor.
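The pooling step described above can be sketched in NumPy as follows. This is an illustrative approximation under stated assumptions, not the mindnlp implementation: `pool_first_token`, `weight`, and `bias` are hypothetical names, and the dense layer is assumed to compute `x @ weight.T + bias`.

```python
import numpy as np


def pool_first_token(hidden_states, weight, bias):
    """Take the first token's hidden state, apply a dense layer, then tanh."""
    first_token = hidden_states[:, 0]            # (batch_size, hidden_size)
    return np.tanh(first_token @ weight.T + bias)


rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(2, 5, 8))       # (batch_size, sequence_length, hidden_size)
weight = rng.normal(size=(8, 8))                 # hypothetical dense weight, (out, in)
bias = np.zeros(8)
pooled = pool_first_token(hidden_states, weight, bias)
```

Only position 0 of each sequence contributes to the result, so the pooled output has shape `(batch_size, hidden_size)` regardless of `sequence_length`.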

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
    """
    Constructs the pooled output tensor for the ERNIE model.

    Args:
        self (ErnieMPooler): An instance of the ErnieMPooler class.
        hidden_states (mindspore.Tensor): A tensor containing the hidden states from the ERNIE model.
            It should have the shape (batch_size, sequence_length, hidden_size) where:

            - batch_size: The number of sequences in the batch.
            - sequence_length: The length of each input sequence.
            - hidden_size: The size of the hidden state vectors.

    Returns:
        mindspore.Tensor: A tensor representing the pooled output of the ERNIE model.
            The pooled output is obtained by applying dense and activation layers to the first token tensor
            extracted from the hidden states tensor.

    Raises:
        None
    """
    # We "pool" the model by simply taking the hidden state corresponding
    # to the first token.
    first_token_tensor = hidden_states[:, 0]
    pooled_output = self.dense(first_token_tensor)
    pooled_output = self.activation(pooled_output)
    return pooled_output
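
The pooling step above can be illustrated without MindSpore. The following is a minimal pure-Python sketch, not the library implementation: `pool_first_token` and the identity weight matrix are hypothetical names and values chosen for readability, standing in for `nn.Linear` followed by `nn.Tanh`.

```python
import math


def pool_first_token(hidden_states, weight, bias):
    """Apply a dense layer and tanh to the first token of every sequence.

    hidden_states: nested list of shape (batch, seq_len, hidden)
    weight: nested list of shape (hidden, hidden), laid out as (out, in)
    bias: list of length hidden
    """
    pooled = []
    for sequence in hidden_states:
        first_token = sequence[0]  # same selection as hidden_states[:, 0]
        row = []
        for out_idx, w_row in enumerate(weight):
            pre_activation = sum(w * x for w, x in zip(w_row, first_token)) + bias[out_idx]
            row.append(math.tanh(pre_activation))
        pooled.append(row)
    return pooled


# Toy input: batch of 2 sequences, 3 tokens each, hidden size 2.
hidden_states = [
    [[1.0, 0.0], [5.0, 5.0], [9.0, 9.0]],
    [[0.0, 2.0], [7.0, 7.0], [8.0, 8.0]],
]
identity = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical weights, chosen for readability
pooled = pool_first_token(hidden_states, identity, [0.0, 0.0])
```

Only the first token of each sequence contributes to `pooled`; the remaining tokens are ignored, which is why the result has shape (batch_size, hidden_size).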

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMPreTrainedModel

Bases: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
class ErnieMPreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """
    config_class = ErnieMConfig
    base_model_prefix = "ernie_m"

    def _init_weights(self, cell):
        """Initialize the weights"""
        if isinstance(cell, nn.Linear):
            # Slightly different from the TF version which uses truncated_normal for initialization
            # cf https://github.com/pytorch/pytorch/pull/5617
            cell.weight.set_data(initializer(Normal(self.config.initializer_range),
                                                    cell.weight.shape, cell.weight.dtype))
            if cell.bias is not None:
                cell.bias.set_data(initializer('zeros', cell.bias.shape, cell.bias.dtype))
        elif isinstance(cell, nn.Embedding):
            weight = np.random.normal(0.0, self.config.initializer_range, cell.weight.shape)
            if cell.padding_idx is not None:
                weight[cell.padding_idx] = 0

            cell.weight.set_data(Tensor(weight, cell.weight.dtype))
        elif isinstance(cell, nn.LayerNorm):
            cell.weight.set_data(initializer('ones', cell.weight.shape, cell.weight.dtype))
            cell.bias.set_data(initializer('zeros', cell.bias.shape, cell.bias.dtype))
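
As a rough illustration of the embedding branch of this initialization scheme, the sketch below draws weights from a normal distribution and zeroes the padding row. It uses only the standard library; `init_embedding`, its arguments and the seed are hypothetical and do not belong to MindNLP. The sketch compares `padding_idx` against `None` so that a padding index of 0 is handled as well.

```python
import random


def init_embedding(num_embeddings, dim, std, padding_idx=None, seed=0):
    """Return a (num_embeddings x dim) table drawn from N(0, std^2)."""
    rng = random.Random(seed)
    weight = [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(num_embeddings)]
    # Compare against None explicitly: a bare truthiness test would skip padding_idx == 0.
    if padding_idx is not None:
        weight[padding_idx] = [0.0] * dim
    return weight


table = init_embedding(num_embeddings=4, dim=3, std=0.02, padding_idx=0)
```

The padding row stays at zero so that padded positions contribute nothing to the summed embeddings at the start of training.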

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMSelfAttention

Bases: Module

A module that implements the self-attention mechanism used in the Ernie-M model.

It is a subclass of nn.Module that projects the hidden states into queries, keys and values, computes the attention scores, and produces the context layer.

ATTRIBUTE DESCRIPTION
num_attention_heads

The number of attention heads in the self-attention mechanism.

TYPE: int

attention_head_size

The size of each attention head.

TYPE: int

all_head_size

The total size of all attention heads combined.

TYPE: int

q_proj

The projection layer for the query tensor.

TYPE: Linear

k_proj

The projection layer for the key tensor.

TYPE: Linear

v_proj

The projection layer for the value tensor.

TYPE: Linear

dropout

The dropout layer applied to the attention probabilities.

TYPE: Dropout

position_embedding_type

The type of position embedding used in the attention mechanism.

TYPE: str

distance_embedding

The embedding layer for computing relative positions in the attention scores.

TYPE: Embedding

is_decoder

Whether the self-attention mechanism is used in a decoder module.

TYPE: bool

METHOD DESCRIPTION
transpose_for_scores

Reshapes the input tensor for calculating attention scores.

forward

Runs the self-attention forward pass: computes the attention scores and produces the context layer.

Example
>>> config = ErnieMConfig(hidden_size=768, num_attention_heads=12, attention_probs_dropout_prob=0.1)
>>> self_attention = ErnieMSelfAttention(config)
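
The relationship between hidden_size, num_attention_heads, attention_head_size and all_head_size can be checked by hand. The sketch below is a simplified stand-in for the constructor's arithmetic; `head_dimensions` is a hypothetical helper, and it omits the constructor's extra `embedding_size` condition.

```python
def head_dimensions(hidden_size, num_attention_heads):
    """Return (attention_head_size, all_head_size) for a given configuration."""
    if hidden_size % num_attention_heads != 0:
        raise ValueError(
            f"The hidden size ({hidden_size}) is not a multiple of the number of "
            f"attention heads ({num_attention_heads})"
        )
    attention_head_size = hidden_size // num_attention_heads
    all_head_size = num_attention_heads * attention_head_size
    return attention_head_size, all_head_size


# Values taken from the example configuration above.
head_size, all_size = head_dimensions(768, 12)
```

With the default configuration, each of the 12 heads works on a 64-dimensional slice, and the slices together span the full 768-dimensional hidden state.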
Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
class ErnieMSelfAttention(nn.Module):
    """
    A module that implements the self-attention mechanism used in ERNIE model.

    This module contains the `ErnieMSelfAttention` class, which represents the self-attention mechanism used in the
    ERNIE model. It is a subclass of `nn.Module` and is responsible for calculating the attention scores and producing
    the context layer.

    Attributes:
        num_attention_heads (int): The number of attention heads in the self-attention mechanism.
        attention_head_size (int): The size of each attention head.
        all_head_size (int): The total size of all attention heads combined.
        q_proj (nn.Linear): The projection layer for the query tensor.
        k_proj (nn.Linear): The projection layer for the key tensor.
        v_proj (nn.Linear): The projection layer for the value tensor.
        dropout (nn.Dropout): The dropout layer applied to the attention probabilities.
        position_embedding_type (str): The type of position embedding used in the attention mechanism.
        distance_embedding (nn.Embedding): The embedding layer for computing relative positions in the attention scores.
        is_decoder (bool): Whether the self-attention mechanism is used in a decoder module.

    Methods:
        transpose_for_scores:
            Reshapes the input tensor for calculating attention scores.

        forward:
            Constructs the self-attention mechanism by calculating attention scores and producing the context layer.

    Example:
        ```python
        >>> config = ErnieMConfig(hidden_size=768, num_attention_heads=12, attention_probs_dropout_prob=0.1)
        >>> self_attention = ErnieMSelfAttention(config)
        ```
        """
    def __init__(self, config, position_embedding_type=None):
        """
        Initializes the ErnieMSelfAttention class.

        Args:
            self: The object itself.
            config (object): An object containing configuration parameters for the self-attention mechanism.
            position_embedding_type (str, optional): The type of position embedding to use. Defaults to None.

        Returns:
            None.

        Raises:
            ValueError: If the hidden size is not a multiple of the number of attention heads.
        """
        super().__init__()
        if config.hidden_size % config.num_attention_heads != 0 and not hasattr(config, "embedding_size"):
            raise ValueError(
                f"The hidden size ({config.hidden_size}) is not a multiple of the number of attention "
                f"heads ({config.num_attention_heads})"
            )

        self.num_attention_heads = config.num_attention_heads
        self.attention_head_size = int(config.hidden_size / config.num_attention_heads)
        self.all_head_size = self.num_attention_heads * self.attention_head_size

        self.q_proj = nn.Linear(config.hidden_size, self.all_head_size)
        self.k_proj = nn.Linear(config.hidden_size, self.all_head_size)
        self.v_proj = nn.Linear(config.hidden_size, self.all_head_size)

        self.dropout = nn.Dropout(p=config.attention_probs_dropout_prob)
        self.position_embedding_type = position_embedding_type or getattr(
            config, "position_embedding_type", "absolute"
        )
        if self.position_embedding_type in ('relative_key', 'relative_key_query'):
            self.max_position_embeddings = config.max_position_embeddings
            self.distance_embedding = nn.Embedding(2 * config.max_position_embeddings - 1, self.attention_head_size)

        self.is_decoder = config.is_decoder

    def transpose_for_scores(self, x: mindspore.Tensor) -> mindspore.Tensor:
        """
        Transposes the input tensor for calculating attention scores in the ErnieMSelfAttention class.

        Args:
            self (ErnieMSelfAttention): The instance of the ErnieMSelfAttention class.
            x (mindspore.Tensor): The input tensor to be transposed.
                It should have a shape of (batch_size, sequence_length, hidden_size).

        Returns:
            mindspore.Tensor:
                The transposed tensor with shape (batch_size, num_attention_heads, sequence_length, attention_head_size).

        Raises:
            None.
        """
        new_x_shape = x.shape[:-1] + (self.num_attention_heads, self.attention_head_size)
        x = x.view(new_x_shape)
        return x.permute(0, 2, 1, 3)

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        encoder_hidden_states: Optional[mindspore.Tensor] = None,
        encoder_attention_mask: Optional[mindspore.Tensor] = None,
        past_key_value: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
        output_attentions: Optional[bool] = False,
    ) -> Tuple[mindspore.Tensor]:
        """
        This method forwards the self-attention mechanism for the ErnieMSelfAttention class.

        Args:
            self: The instance of the class.
            hidden_states (mindspore.Tensor): The input tensor representing the hidden states.
            attention_mask (Optional[mindspore.Tensor]):
                Optional tensor for masking attention scores. Defaults to None.
            head_mask (Optional[mindspore.Tensor]): Optional tensor for masking attention heads. Defaults to None.
            encoder_hidden_states (Optional[mindspore.Tensor]):
                Optional tensor representing hidden states from an encoder. Defaults to None.
            encoder_attention_mask (Optional[mindspore.Tensor]):
                Optional tensor for masking encoder attention scores. Defaults to None.
            past_key_value (Optional[Tuple[Tuple[mindspore.Tensor]]]):
                Optional tuple of past key and value tensors. Defaults to None.
            output_attentions (Optional[bool]):
                Flag indicating whether to output attentions. Defaults to False.

        Returns:
            Tuple[mindspore.Tensor]:
                A tuple containing the context layer tensor, followed by the attention probabilities tensor
                when output_attentions is True and by the cached (key, value) pair when is_decoder is True.

        Raises:
            None explicitly. Shape or dtype mismatches surface as errors from the underlying operations,
            and an unrecognized position_embedding_type is not rejected: the relative-position terms
            are simply skipped.
        """
        mixed_query_layer = self.q_proj(hidden_states)

        # If this is instantiated as a cross-attention module, the keys
        # and values come from an encoder; the attention mask needs to be
        # such that the encoder's padding tokens are not attended to.
        is_cross_attention = encoder_hidden_states is not None

        if is_cross_attention and past_key_value is not None:
            # reuse k,v, cross_attentions
            key_layer = past_key_value[0]
            value_layer = past_key_value[1]
            attention_mask = encoder_attention_mask
        elif is_cross_attention:
            key_layer = self.transpose_for_scores(self.k_proj(encoder_hidden_states))
            value_layer = self.transpose_for_scores(self.v_proj(encoder_hidden_states))
            attention_mask = encoder_attention_mask
        elif past_key_value is not None:
            key_layer = self.transpose_for_scores(self.k_proj(hidden_states))
            value_layer = self.transpose_for_scores(self.v_proj(hidden_states))
            key_layer = ops.cat([past_key_value[0], key_layer], dim=2)
            value_layer = ops.cat([past_key_value[1], value_layer], dim=2)
        else:
            key_layer = self.transpose_for_scores(self.k_proj(hidden_states))
            value_layer = self.transpose_for_scores(self.v_proj(hidden_states))

        query_layer = self.transpose_for_scores(mixed_query_layer)

        use_cache = past_key_value is not None
        if self.is_decoder:
            # if cross_attention save Tuple(mindspore.Tensor, mindspore.Tensor) of all cross attention key/value_states.
            # Further calls to cross_attention layer can then reuse all cross-attention
            # key/value_states (first "if" case)
            # if uni-directional self-attention (decoder) save Tuple(mindspore.Tensor, mindspore.Tensor) of
            # all previous decoder key/value_states. Further calls to uni-directional self-attention
            # can concat previous decoder key/value_states to current projected key/value_states (third "elif" case)
            # if encoder bi-directional self-attention `past_key_value` is always `None`
            past_key_value = (key_layer, value_layer)

        # Take the dot product between "query" and "key" to get the raw attention scores.
        attention_scores = ops.matmul(query_layer, key_layer.swapaxes(-1, -2))

        if self.position_embedding_type in ('relative_key', 'relative_key_query'):
            query_length, key_length = query_layer.shape[2], key_layer.shape[2]
            if use_cache:
                position_ids_l = mindspore.tensor(key_length - 1, dtype=mindspore.int64).view(
                    -1, 1
                )
            else:
                position_ids_l = ops.arange(query_length, dtype=mindspore.int64).view(-1, 1)
            position_ids_r = ops.arange(key_length, dtype=mindspore.int64).view(1, -1)
            distance = position_ids_l - position_ids_r

            positional_embedding = self.distance_embedding(distance + self.max_position_embeddings - 1)
            positional_embedding = positional_embedding.to(dtype=query_layer.dtype)  # fp16 compatibility

            if self.position_embedding_type == "relative_key":
                relative_position_scores = ops.einsum("bhld,lrd->bhlr", query_layer, positional_embedding)
                attention_scores = attention_scores + relative_position_scores
            elif self.position_embedding_type == "relative_key_query":
                relative_position_scores_query = ops.einsum("bhld,lrd->bhlr", query_layer, positional_embedding)
                relative_position_scores_key = ops.einsum("bhrd,lrd->bhlr", key_layer, positional_embedding)
                attention_scores = attention_scores + relative_position_scores_query + relative_position_scores_key

        attention_scores = attention_scores / math.sqrt(self.attention_head_size)
        if attention_mask is not None:
            # Apply the attention mask (precomputed for all layers in the ErnieMModel forward() function)
            attention_scores = attention_scores + attention_mask

        # Normalize the attention scores to probabilities.
        attention_probs = ops.softmax(attention_scores, dim=-1)

        # This is actually dropping out entire tokens to attend to, which might
        # seem a bit unusual, but is taken from the original Transformer paper.
        attention_probs = self.dropout(attention_probs)

        # Mask heads if we want to
        if head_mask is not None:
            attention_probs = attention_probs * head_mask

        context_layer = ops.matmul(attention_probs, value_layer)

        context_layer = context_layer.permute(0, 2, 1, 3)
        new_context_layer_shape = context_layer.shape[:-2] + (self.all_head_size,)
        context_layer = context_layer.view(new_context_layer_shape)

        outputs = (context_layer, attention_probs) if output_attentions else (context_layer,)

        if self.is_decoder:
            outputs = outputs + (past_key_value,)
        return outputs
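
The cached-decoding branch of the listing above concatenates past and current keys along the sequence axis. The following minimal sketch shows how the key length grows by one per step and why the single new query sits at position `key_length - 1`. It uses nested lists with the batch axis dropped; `append_to_cache` is a hypothetical helper, not a MindNLP function.

```python
def append_to_cache(past_keys, new_keys):
    """Concatenate per-head key lists along the sequence axis.

    Both arguments have shape (num_heads, seq_len, head_size) as nested lists.
    """
    return [past + new for past, new in zip(past_keys, new_keys)]


# Two heads, head size 1; the cache starts with three positions.
cache = [[[0.1], [0.2], [0.3]], [[1.1], [1.2], [1.3]]]
step_key = [[[0.4]], [[1.4]]]  # one new token per decoding step
cache = append_to_cache(cache, step_key)

key_length = len(cache[0])
query_position = key_length - 1  # position id used for the single new query
```

The same concatenation applies to the value tensors, so keys and values always share the same sequence length.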

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMSelfAttention.__init__(config, position_embedding_type=None)

Initializes the ErnieMSelfAttention class.

PARAMETER DESCRIPTION
self

The object itself.

config

An object containing configuration parameters for the self-attention mechanism.

TYPE: object

position_embedding_type

The type of position embedding to use. When None, falls back to config.position_embedding_type if the config defines it, otherwise to "absolute".

TYPE: str DEFAULT: None

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
ValueError

If the hidden size is not a multiple of the number of attention heads and the config has no embedding_size attribute.

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def __init__(self, config, position_embedding_type=None):
    """
    Initializes the ErnieMSelfAttention class.

    Args:
        self: The object itself.
        config (object): An object containing configuration parameters for the self-attention mechanism.
        position_embedding_type (str, optional): The type of position embedding to use. Defaults to None.

    Returns:
        None.

    Raises:
        ValueError: If the hidden size is not a multiple of the number of attention heads.
    """
    super().__init__()
    if config.hidden_size % config.num_attention_heads != 0 and not hasattr(config, "embedding_size"):
        raise ValueError(
            f"The hidden size ({config.hidden_size}) is not a multiple of the number of attention "
            f"heads ({config.num_attention_heads})"
        )

    self.num_attention_heads = config.num_attention_heads
    self.attention_head_size = int(config.hidden_size / config.num_attention_heads)
    self.all_head_size = self.num_attention_heads * self.attention_head_size

    self.q_proj = nn.Linear(config.hidden_size, self.all_head_size)
    self.k_proj = nn.Linear(config.hidden_size, self.all_head_size)
    self.v_proj = nn.Linear(config.hidden_size, self.all_head_size)

    self.dropout = nn.Dropout(p=config.attention_probs_dropout_prob)
    self.position_embedding_type = position_embedding_type or getattr(
        config, "position_embedding_type", "absolute"
    )
    if self.position_embedding_type in ('relative_key', 'relative_key_query'):
        self.max_position_embeddings = config.max_position_embeddings
        self.distance_embedding = nn.Embedding(2 * config.max_position_embeddings - 1, self.attention_head_size)

    self.is_decoder = config.is_decoder
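
The distance embedding has 2 * max_position_embeddings - 1 rows because a signed distance between two positions ranges from -(max_position_embeddings - 1) to +(max_position_embeddings - 1). The sketch below shows how a distance is shifted into a valid row index; `relative_position_index` is a hypothetical helper that mirrors the index arithmetic used in the forward pass.

```python
def relative_position_index(query_length, key_length, max_position_embeddings):
    """Return the (query_length x key_length) table of embedding row indices."""
    offset = max_position_embeddings - 1
    return [
        [(l - r) + offset for r in range(key_length)]
        for l in range(query_length)
    ]


index = relative_position_index(query_length=3, key_length=3, max_position_embeddings=4)
# index[l][r] is the row of the distance embedding used for query l and key r.
```

With max_position_embeddings=4 the table has 7 rows, and every index produced here falls within 0..6.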

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMSelfAttention.forward(hidden_states, attention_mask=None, head_mask=None, encoder_hidden_states=None, encoder_attention_mask=None, past_key_value=None, output_attentions=False)

Runs the forward pass of the self-attention mechanism.

PARAMETER DESCRIPTION
self

The instance of the class.

hidden_states

The input tensor representing the hidden states.

TYPE: Tensor

attention_mask

Optional tensor for masking attention scores. Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

head_mask

Optional tensor for masking attention heads. Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

encoder_hidden_states

Optional tensor representing hidden states from an encoder. Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

encoder_attention_mask

Optional tensor for masking encoder attention scores. Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

past_key_value

Optional tuple of past key and value tensors. Defaults to None.

TYPE: Optional[Tuple[Tuple[Tensor]]] DEFAULT: None

output_attentions

Flag indicating whether to output attentions. Defaults to False.

TYPE: Optional[bool] DEFAULT: False

RETURNS DESCRIPTION
Tuple[Tensor]

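Tuple[mindspore.Tensor]: A tuple containing the context layer tensor, followed by the attention probabilities tensor when output_attentions is True and by the cached (key, value) pair when the module is configured as a decoder.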

RAISES DESCRIPTION

The method has no explicit raise statements. Shape or dtype mismatches surface as errors from the underlying operations, and an unrecognized position_embedding_type is not rejected: the relative-position terms are simply skipped.

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def forward(
    self,
    hidden_states: mindspore.Tensor,
    attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    encoder_hidden_states: Optional[mindspore.Tensor] = None,
    encoder_attention_mask: Optional[mindspore.Tensor] = None,
    past_key_value: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
    output_attentions: Optional[bool] = False,
) -> Tuple[mindspore.Tensor]:
    """
    This method forwards the self-attention mechanism for the ErnieMSelfAttention class.

    Args:
        self: The instance of the class.
        hidden_states (mindspore.Tensor): The input tensor representing the hidden states.
        attention_mask (Optional[mindspore.Tensor]):
            Optional tensor for masking attention scores. Defaults to None.
        head_mask (Optional[mindspore.Tensor]): Optional tensor for masking attention heads. Defaults to None.
        encoder_hidden_states (Optional[mindspore.Tensor]):
            Optional tensor representing hidden states from an encoder. Defaults to None.
        encoder_attention_mask (Optional[mindspore.Tensor]):
            Optional tensor for masking encoder attention scores. Defaults to None.
        past_key_value (Optional[Tuple[Tuple[mindspore.Tensor]]]):
            Optional tuple of past key and value tensors. Defaults to None.
        output_attentions (Optional[bool]):
            Flag indicating whether to output attentions. Defaults to False.

    Returns:
        Tuple[mindspore.Tensor]:
            A tuple containing the context layer tensor, followed by the attention probabilities tensor
            when output_attentions is True and by the cached (key, value) pair when is_decoder is True.

    Raises:
        None explicitly. Shape or dtype mismatches surface as errors from the underlying operations,
        and an unrecognized position_embedding_type is not rejected: the relative-position terms
        are simply skipped.
    """
    mixed_query_layer = self.q_proj(hidden_states)

    # If this is instantiated as a cross-attention module, the keys
    # and values come from an encoder; the attention mask needs to be
    # such that the encoder's padding tokens are not attended to.
    is_cross_attention = encoder_hidden_states is not None

    if is_cross_attention and past_key_value is not None:
        # reuse k,v, cross_attentions
        key_layer = past_key_value[0]
        value_layer = past_key_value[1]
        attention_mask = encoder_attention_mask
    elif is_cross_attention:
        key_layer = self.transpose_for_scores(self.k_proj(encoder_hidden_states))
        value_layer = self.transpose_for_scores(self.v_proj(encoder_hidden_states))
        attention_mask = encoder_attention_mask
    elif past_key_value is not None:
        key_layer = self.transpose_for_scores(self.k_proj(hidden_states))
        value_layer = self.transpose_for_scores(self.v_proj(hidden_states))
        key_layer = ops.cat([past_key_value[0], key_layer], dim=2)
        value_layer = ops.cat([past_key_value[1], value_layer], dim=2)
    else:
        key_layer = self.transpose_for_scores(self.k_proj(hidden_states))
        value_layer = self.transpose_for_scores(self.v_proj(hidden_states))

    query_layer = self.transpose_for_scores(mixed_query_layer)

    use_cache = past_key_value is not None
    if self.is_decoder:
        # if cross_attention save Tuple(mindspore.Tensor, mindspore.Tensor) of all cross attention key/value_states.
        # Further calls to cross_attention layer can then reuse all cross-attention
        # key/value_states (first "if" case)
        # if uni-directional self-attention (decoder) save Tuple(mindspore.Tensor, mindspore.Tensor) of
        # all previous decoder key/value_states. Further calls to uni-directional self-attention
        # can concat previous decoder key/value_states to current projected key/value_states (third "elif" case)
        # if encoder bi-directional self-attention `past_key_value` is always `None`
        past_key_value = (key_layer, value_layer)

    # Take the dot product between "query" and "key" to get the raw attention scores.
    attention_scores = ops.matmul(query_layer, key_layer.swapaxes(-1, -2))

    if self.position_embedding_type in ('relative_key', 'relative_key_query'):
        query_length, key_length = query_layer.shape[2], key_layer.shape[2]
        if use_cache:
            position_ids_l = mindspore.tensor(key_length - 1, dtype=mindspore.int64).view(
                -1, 1
            )
        else:
            position_ids_l = ops.arange(query_length, dtype=mindspore.int64).view(-1, 1)
        position_ids_r = ops.arange(key_length, dtype=mindspore.int64).view(1, -1)
        distance = position_ids_l - position_ids_r

        positional_embedding = self.distance_embedding(distance + self.max_position_embeddings - 1)
        positional_embedding = positional_embedding.to(dtype=query_layer.dtype)  # fp16 compatibility

        if self.position_embedding_type == "relative_key":
            relative_position_scores = ops.einsum("bhld,lrd->bhlr", query_layer, positional_embedding)
            attention_scores = attention_scores + relative_position_scores
        elif self.position_embedding_type == "relative_key_query":
            relative_position_scores_query = ops.einsum("bhld,lrd->bhlr", query_layer, positional_embedding)
            relative_position_scores_key = ops.einsum("bhrd,lrd->bhlr", key_layer, positional_embedding)
            attention_scores = attention_scores + relative_position_scores_query + relative_position_scores_key

    attention_scores = attention_scores / math.sqrt(self.attention_head_size)
    if attention_mask is not None:
        # Apply the attention mask (precomputed for all layers in the ErnieMModel forward() function)
        attention_scores = attention_scores + attention_mask

    # Normalize the attention scores to probabilities.
    attention_probs = ops.softmax(attention_scores, dim=-1)

    # This is actually dropping out entire tokens to attend to, which might
    # seem a bit unusual, but is taken from the original Transformer paper.
    attention_probs = self.dropout(attention_probs)

    # Mask heads if we want to
    if head_mask is not None:
        attention_probs = attention_probs * head_mask

    context_layer = ops.matmul(attention_probs, value_layer)

    context_layer = context_layer.permute(0, 2, 1, 3)
    new_context_layer_shape = context_layer.shape[:-2] + (self.all_head_size,)
    context_layer = context_layer.view(new_context_layer_shape)

    outputs = (context_layer, attention_probs) if output_attentions else (context_layer,)

    if self.is_decoder:
        outputs = outputs + (past_key_value,)
    return outputs
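
The core of the forward pass is scaled dot-product attention with an additive mask. The following is a minimal single-head sketch in pure Python under simplifying assumptions: no batch axis, no dropout, no head mask, and no relative position terms. `softmax` and `single_head_attention` are hypothetical helper names, not MindSpore functions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def single_head_attention(queries, keys, values, additive_mask=None):
    """queries/keys/values: lists of vectors for one head; mask: one value per key."""
    head_size = len(queries[0])
    context, probabilities = [], []
    for q in queries:
        # Dot product of the query with every key, scaled by sqrt(head_size).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(head_size) for k in keys]
        if additive_mask is not None:
            scores = [s + m for s, m in zip(scores, additive_mask)]
        probs = softmax(scores)
        probabilities.append(probs)
        # Weighted sum of the value vectors.
        context.append([
            sum(p * v[d] for p, v in zip(probs, values))
            for d in range(len(values[0]))
        ])
    return context, probabilities


queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
mask = [0.0, 0.0, -1e9]  # large negative value hides the third key (e.g. padding)
context, probs = single_head_attention(queries, keys, values, mask)
```

Because the mask is added before the softmax, a large negative entry drives the corresponding probability to zero, so the masked value vector does not influence the context.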

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMSelfAttention.transpose_for_scores(x)

Transposes the input tensor for calculating attention scores in the ErnieMSelfAttention class.

PARAMETER DESCRIPTION
self

The instance of the ErnieMSelfAttention class.

TYPE: ErnieMSelfAttention

x

The input tensor to be transposed. It should have a shape of (batch_size, sequence_length, hidden_size).

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: The transposed tensor with shape (batch_size, num_attention_heads, sequence_length, attention_head_size).

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def transpose_for_scores(self, x: mindspore.Tensor) -> mindspore.Tensor:
    """
    Transposes the input tensor for calculating attention scores in the ErnieMSelfAttention class.

    Args:
        self (ErnieMSelfAttention): The instance of the ErnieMSelfAttention class.
        x (mindspore.Tensor): The input tensor to be transposed.
            It should have a shape of (batch_size, sequence_length, hidden_size).

    Returns:
        mindspore.Tensor:
            The transposed tensor with shape (batch_size, num_attention_heads, sequence_length, attention_head_size).

    Raises:
        None.
    """
    new_x_shape = x.shape[:-1] + (self.num_attention_heads, self.attention_head_size)
    x = x.view(new_x_shape)
    return x.permute(0, 2, 1, 3)
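
The reshape-then-permute step can be traced on a tiny input. This sketch reproduces the same index mapping with nested lists instead of tensors; `split_heads` is a hypothetical name used for illustration only.

```python
def split_heads(x, num_heads):
    """x: nested list (batch, seq_len, hidden) -> (batch, num_heads, seq_len, head_size)."""
    head_size = len(x[0][0]) // num_heads
    return [
        [
            # Each head takes its own contiguous slice of every token vector.
            [token[h * head_size:(h + 1) * head_size] for token in sequence]
            for h in range(num_heads)
        ]
        for sequence in x
    ]


x = [[[1, 2, 3, 4], [5, 6, 7, 8]]]  # batch=1, seq_len=2, hidden=4
heads = split_heads(x, num_heads=2)
```

Head 0 receives the first half of every token vector and head 1 the second half, which is what the view followed by permute(0, 2, 1, 3) achieves on tensors.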

mindnlp.transformers.models.ernie_m.modeling_ernie_m.UIEM

Bases: ErnieMForInformationExtraction

UIEM model
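
The forward pass listed for this class scores every token twice, once as a possible span start and once as a possible span end, and squashes each score with a sigmoid. The sketch below illustrates that idea in pure Python. It assumes per-token 0/1 labels, which is an assumption about how such an extraction head is typically trained rather than something this page states; `sigmoid`, `binary_cross_entropy` and the logits are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(probabilities, targets, eps=1e-12):
    """Mean binary cross-entropy over per-token probabilities and 0/1 targets."""
    losses = [
        -(t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps)))
        for p, t in zip(probabilities, targets)
    ]
    return sum(losses) / len(losses)


start_logits = [-2.0, 3.0, -1.0]  # hypothetical per-token scores
end_logits = [-3.0, -1.0, 2.5]
start_prob = [sigmoid(z) for z in start_logits]
end_prob = [sigmoid(z) for z in end_logits]

start_targets = [0, 1, 0]  # span starts at token 1
end_targets = [0, 0, 1]    # span ends at token 2
total_loss = (
    binary_cross_entropy(start_prob, start_targets)
    + binary_cross_entropy(end_prob, end_targets)
) / 2
```

Averaging the two losses weights start and end predictions equally, matching the `(start_loss + end_loss) / 2` expression in the listing.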

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
class UIEM(ErnieMForInformationExtraction):
    """UIEM model"""
    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        start_positions: Optional[mindspore.Tensor] = None,
        end_positions: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = True,
    ) -> Union[Tuple[mindspore.Tensor], QuestionAnsweringModelOutput]:
        r"""
        Args:
            start_positions (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for position (index) for computing the start_positions loss. Positions outside of the
                sequence are not taken into account for computing the loss.
            end_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for position (index) for computing the end_positions loss. Positions outside of the
                sequence are not taken into account for computing the loss.
        """
        result = self.ernie_m(
            input_ids,
            # attention_mask=attention_mask,
            position_ids=position_ids,
            # head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
        if return_dict:
            sequence_output = result.last_hidden_state
        else:
            sequence_output = result[0]

        start_logits = self.linear_start(sequence_output)
        start_logits = start_logits.squeeze(-1)
        start_prob = self.sigmoid(start_logits)
        end_logits = self.linear_end(sequence_output)
        end_logits = end_logits.squeeze(-1)
        end_prob = self.sigmoid(end_logits)

        total_loss = None
        if start_positions is not None and end_positions is not None:
            # On multi-GPU, the split can add a trailing dimension; squeeze it away
            if len(start_positions.shape) > 1 and start_positions.shape[-1] == 1:
                start_positions = start_positions.squeeze(-1)
            if len(end_positions.shape) > 1 and end_positions.shape[-1] == 1:
                end_positions = end_positions.squeeze(-1)
            # sometimes the start/end positions are outside our model inputs, we ignore these terms
            ignored_index = start_logits.shape[1]
            start_positions = start_positions.clamp(0, ignored_index)
            end_positions = end_positions.clamp(0, ignored_index)

            start_loss = F.binary_cross_entropy_with_logits(start_prob, start_positions)
            end_loss = F.binary_cross_entropy_with_logits(end_prob, end_positions)
            total_loss = (start_loss + end_loss) / 2

        if not return_dict:
            return tuple(
                i
                for i in [total_loss, start_prob, end_prob, result.hidden_states, result.attentions]
                if i is not None
            )

        return QuestionAnsweringModelOutput(
            loss=total_loss,
            start_logits=start_prob,
            end_logits=end_prob,
            hidden_states=result.hidden_states,
            attentions=result.attentions,
        )
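
When `return_dict` is false, the tuple branch above drops every entry that is `None`, so the position of each remaining item depends on which inputs were supplied. The sketch below reproduces only that filtering step with placeholder values; `pack_outputs` is a hypothetical name used for illustration.

```python
def pack_outputs(total_loss, start_prob, end_prob, hidden_states, attentions):
    """Drop None entries, as the tuple branch of forward does."""
    return tuple(
        item
        for item in [total_loss, start_prob, end_prob, hidden_states, attentions]
        if item is not None
    )


# Placeholder strings stand in for tensors.
without_loss = pack_outputs(None, "start", "end", None, None)
with_loss = pack_outputs(0.5, "start", "end", None, None)
```

Without labels the start probabilities sit at index 0; with labels the loss takes index 0 and everything else shifts by one. Callers that index into the tuple need to account for this.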

mindnlp.transformers.models.ernie_m.modeling_ernie_m.UIEM.forward(input_ids=None, attention_mask=None, position_ids=None, head_mask=None, inputs_embeds=None, start_positions=None, end_positions=None, output_attentions=None, output_hidden_states=None, return_dict=True)

PARAMETER DESCRIPTION
start_positions

Labels for the start position (index), used to compute the start-position loss. Positions outside of the sequence are not taken into account when computing the loss.

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

end_positions

Labels for the end position (index), used to compute the end-position loss. Positions outside of the sequence are not taken into account when computing the loss.

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None
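
Note that the `start_logits` and `end_logits` fields returned by this method carry the sigmoid outputs (`start_prob`, `end_prob`) rather than raw logits. One way such per-token probabilities could be turned into spans is sketched below. The 0.5 threshold and the nearest-end pairing rule are assumptions made for illustration; neither is taken from the mindnlp source.

```python
def positions_above(probs, threshold=0.5):
    """Return the token indices whose probability exceeds the threshold."""
    return [i for i, p in enumerate(probs) if p > threshold]


def pair_spans(starts, ends):
    """Pair each start index with the nearest end index at or after it."""
    spans = []
    for s in starts:
        candidates = [e for e in ends if e >= s]
        if candidates:
            spans.append((s, min(candidates)))
    return spans


# Hypothetical probabilities for a six-token sequence.
start_prob = [0.1, 0.9, 0.2, 0.1, 0.8, 0.05]
end_prob = [0.0, 0.1, 0.95, 0.1, 0.1, 0.7]
spans = pair_spans(positions_above(start_prob), positions_above(end_prob))
```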

Source code in mindnlp\transformers\models\ernie_m\modeling_ernie_m.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    start_positions: Optional[mindspore.Tensor] = None,
    end_positions: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = True,
) -> Union[Tuple[mindspore.Tensor], QuestionAnsweringModelOutput]:
    r"""
    Args:
        start_positions (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for the start position (index), used to compute the start-position loss. Positions outside
            of the sequence are not taken into account when computing the loss.
        end_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for the end position (index), used to compute the end-position loss. Positions outside
            of the sequence are not taken into account when computing the loss.
    """
    result = self.ernie_m(
        input_ids,
        # attention_mask=attention_mask,
        position_ids=position_ids,
        # head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )
    if return_dict:
        sequence_output = result.last_hidden_state
    else:
        sequence_output = result[0]

    start_logits = self.linear_start(sequence_output)
    start_logits = start_logits.squeeze(-1)
    start_prob = self.sigmoid(start_logits)
    end_logits = self.linear_end(sequence_output)
    end_logits = end_logits.squeeze(-1)
    end_prob = self.sigmoid(end_logits)

    total_loss = None
    if start_positions is not None and end_positions is not None:
        # On multi-GPU, the split can add a trailing dimension; squeeze it away
        if len(start_positions.shape) > 1 and start_positions.shape[-1] == 1:
            start_positions = start_positions.squeeze(-1)
        if len(end_positions.shape) > 1 and end_positions.shape[-1] == 1:
            end_positions = end_positions.squeeze(-1)
        # sometimes the start/end positions are outside our model inputs, we ignore these terms
        ignored_index = start_logits.shape[1]
        start_positions = start_positions.clamp(0, ignored_index)
        end_positions = end_positions.clamp(0, ignored_index)

        start_loss = F.binary_cross_entropy_with_logits(start_prob, start_positions)
        end_loss = F.binary_cross_entropy_with_logits(end_prob, end_positions)
        total_loss = (start_loss + end_loss) / 2

    if not return_dict:
        return tuple(
            i
            for i in [total_loss, start_prob, end_prob, result.hidden_states, result.attentions]
            if i is not None
        )

    return QuestionAnsweringModelOutput(
        loss=total_loss,
        start_logits=start_prob,
        end_logits=end_prob,
        hidden_states=result.hidden_states,
        attentions=result.attentions,
    )
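
The loss above is the mean of a start-position and an end-position binary cross-entropy term. The following pure-Python sketch shows that arithmetic for one sequence. It applies the sigmoid exactly once before the cross-entropy; the source instead passes the already-sigmoided `start_prob` and `end_prob` to `F.binary_cross_entropy_with_logits`, which in PyTorch-style APIs applies a sigmoid internally, so this sketch reflects an assumption about the intended computation rather than a line-by-line reproduction.

```python
import math


def sigmoid(z):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def bce(prob, target, eps=1e-12):
    """Binary cross-entropy for one probability/label pair."""
    prob = min(max(prob, eps), 1.0 - eps)  # guard against log(0)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def mean_bce(logits, labels):
    """Mean binary cross-entropy over the tokens of one sequence."""
    return sum(bce(sigmoid(z), y) for z, y in zip(logits, labels)) / len(logits)


def span_loss(start_logits, end_logits, start_labels, end_labels):
    """Average of the start and end losses, as in total_loss above."""
    return (mean_bce(start_logits, start_labels) + mean_bce(end_logits, end_labels)) / 2


# With all logits at zero every probability is 0.5, so the loss is ln 2.
loss = span_loss([0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0])
```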

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m

MindSpore ErnieM model.

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMAttention

Bases: Module

This class represents an attention module for MSErnieM model, which includes self-attention mechanism and projection layers. It inherits from nn.Module and provides methods to initialize the attention module, prune attention heads, and perform attention computation. The attention module consists of self-attention mechanism with configurable position embedding type and projection layers for output transformation. The 'prune_heads' method allows pruning specific attention heads based on provided indices. The 'forward' method computes the attention output given input hidden states, optional masks, and other optional inputs.

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
class MSErnieMAttention(nn.Module):

    """
    This class represents an attention module for MSErnieM model, which includes self-attention mechanism and projection
    layers.
    It inherits from nn.Module and provides methods to initialize the attention module, prune attention heads, and perform
    attention computation.
    The attention module consists of self-attention mechanism with configurable position embedding type and projection
    layers for output transformation.
    The 'prune_heads' method allows pruning specific attention heads based on provided indices.
    The 'forward' method computes the attention output given input hidden states, optional masks, and other optional
    inputs.
    """
    def __init__(self, config, position_embedding_type=None):
        """
        Initializes an instance of the MSErnieMAttention class.

        Args:
            self: The instance of the class.
            config (object): An object that contains the configuration settings for the attention layer.
            position_embedding_type (str, optional): The type of position embedding to use. Defaults to None.

        Returns:
            None

        Raises:
            None
        """
        super().__init__()
        self.self_attn = MSErnieMSelfAttention(config, position_embedding_type=position_embedding_type)
        self.out_proj = nn.Linear(config.hidden_size, config.hidden_size)
        self.pruned_heads = set()

    def prune_heads(self, heads):
        """
        Prunes the given heads from the attention mechanism.

        Args:
            self (object): The instance of the class.
            heads (list): A list of integers representing the indices of heads to be pruned from the attention mechanism.

        Returns:
            None: The module is pruned in place. If 'heads' is empty, the method returns immediately without changes.

        Raises:
            TypeError: If the 'heads' parameter is not a list of integers.
            IndexError: If the indices in 'heads' exceed the available attention heads in the mechanism.
        """
        if len(heads) == 0:
            return
        heads, index = find_pruneable_heads_and_indices(
            heads, self.self_attn.num_attention_heads, self.self_attn.attention_head_size, self.pruned_heads
        )

        # Prune linear layers
        self.self_attn.q_proj = prune_linear_layer(self.self_attn.q_proj, index)
        self.self_attn.k_proj = prune_linear_layer(self.self_attn.k_proj, index)
        self.self_attn.v_proj = prune_linear_layer(self.self_attn.v_proj, index)
        self.out_proj = prune_linear_layer(self.out_proj, index, dim=1)

        # Update hyper params and store pruned heads
        self.self_attn.num_attention_heads = self.self_attn.num_attention_heads - len(heads)
        self.self_attn.all_head_size = self.self_attn.attention_head_size * self.self_attn.num_attention_heads
        self.pruned_heads = self.pruned_heads.union(heads)

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        encoder_hidden_states: Optional[mindspore.Tensor] = None,
        encoder_attention_mask: Optional[mindspore.Tensor] = None,
        past_key_value: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
        output_attentions: Optional[bool] = False,
    ) -> Tuple[mindspore.Tensor]:
        """
        Computes the attention output: runs self-attention on the hidden states, then applies the output projection.

        Args:
            self (MSErnieMAttention): The instance of the MSErnieMAttention class.
            hidden_states (mindspore.Tensor): The input hidden states of the model.
                Shape: (batch_size, seq_length, hidden_size).
            attention_mask (Optional[mindspore.Tensor], optional):
                The attention mask tensor, indicating which tokens should be attended to and which should not.
                Shape: (batch_size, seq_length). Defaults to None.
            head_mask (Optional[mindspore.Tensor], optional):
                The head mask tensor, indicating which heads should be masked out.
                Shape: (num_heads, seq_length, seq_length). Defaults to None.
            encoder_hidden_states (Optional[mindspore.Tensor], optional):
                The hidden states of the encoder. Shape: (batch_size, seq_length, hidden_size). Defaults to None.
            encoder_attention_mask (Optional[mindspore.Tensor], optional):
                The attention mask tensor for the encoder, indicating which tokens should be attended to and which
                should not. Shape: (batch_size, seq_length). Defaults to None.
            past_key_value (Optional[Tuple[Tuple[mindspore.Tensor]]], optional):
                The tuple of past key and value tensors for keeping the previous attention weights.
                Shape: ((batch_size, num_heads, seq_length, hidden_size),
                (batch_size, num_heads, seq_length, hidden_size)). Defaults to None.
            output_attentions (Optional[bool], optional): Whether to output attention weights. Defaults to False.

        Returns:
            Tuple[mindspore.Tensor]: A tuple containing the attention output tensor and other optional outputs.

        Raises:
            None.
        """
        self_outputs = self.self_attn(
            hidden_states,
            attention_mask,
            head_mask,
            encoder_hidden_states,
            encoder_attention_mask,
            past_key_value,
            output_attentions,
        )
        attention_output = self.out_proj(self_outputs[0])
        outputs = (attention_output,) + self_outputs[1:]  # add attentions if we output them
        return outputs

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMAttention.__init__(config, position_embedding_type=None)

Initializes an instance of the MSErnieMAttention class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

An object that contains the configuration settings for the attention layer.

TYPE: object

position_embedding_type

The type of position embedding to use. Defaults to None.

TYPE: str DEFAULT: None

RETURNS DESCRIPTION

None

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def __init__(self, config, position_embedding_type=None):
    """
    Initializes an instance of the MSErnieMAttention class.

    Args:
        self: The instance of the class.
        config (object): An object that contains the configuration settings for the attention layer.
        position_embedding_type (str, optional): The type of position embedding to use. Defaults to None.

    Returns:
        None

    Raises:
        None
    """
    super().__init__()
    self.self_attn = MSErnieMSelfAttention(config, position_embedding_type=position_embedding_type)
    self.out_proj = nn.Linear(config.hidden_size, config.hidden_size)
    self.pruned_heads = set()

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMAttention.forward(hidden_states, attention_mask=None, head_mask=None, encoder_hidden_states=None, encoder_attention_mask=None, past_key_value=None, output_attentions=False)

Computes the attention output: runs self-attention on the hidden states, then applies the output projection.

PARAMETER DESCRIPTION
self

The instance of the MSErnieMAttention class.

TYPE: MSErnieMAttention

hidden_states

The input hidden states of the model. Shape: (batch_size, seq_length, hidden_size).

TYPE: Tensor

attention_mask

The attention mask tensor, indicating which tokens should be attended to and which should not. Shape: (batch_size, seq_length). Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

head_mask

The head mask tensor, indicating which heads should be masked out. Shape: (num_heads, seq_length, seq_length). Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

encoder_hidden_states

The hidden states of the encoder. Shape: (batch_size, seq_length, hidden_size). Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

encoder_attention_mask

The attention mask tensor for the encoder, indicating which tokens should be attended to and which should not. Shape: (batch_size, seq_length). Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

past_key_value

The tuple of past key and value tensors for keeping the previous attention weights. Shape: ((batch_size, num_heads, seq_length, hidden_size), (batch_size, num_heads, seq_length, hidden_size)). Defaults to None.

TYPE: Optional[Tuple[Tuple[Tensor]]] DEFAULT: None

output_attentions

Whether to output attention weights. Defaults to False.

TYPE: Optional[bool] DEFAULT: False

RETURNS DESCRIPTION
Tuple[Tensor]

Tuple[mindspore.Tensor]: A tuple containing the attention output tensor and other optional outputs.

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def forward(
    self,
    hidden_states: mindspore.Tensor,
    attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    encoder_hidden_states: Optional[mindspore.Tensor] = None,
    encoder_attention_mask: Optional[mindspore.Tensor] = None,
    past_key_value: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
    output_attentions: Optional[bool] = False,
) -> Tuple[mindspore.Tensor]:
    """
    Computes the attention output: runs self-attention on the hidden states, then applies the output projection.

    Args:
        self (MSErnieMAttention): The instance of the MSErnieMAttention class.
        hidden_states (mindspore.Tensor): The input hidden states of the model.
            Shape: (batch_size, seq_length, hidden_size).
        attention_mask (Optional[mindspore.Tensor], optional):
            The attention mask tensor, indicating which tokens should be attended to and which should not.
            Shape: (batch_size, seq_length). Defaults to None.
        head_mask (Optional[mindspore.Tensor], optional):
            The head mask tensor, indicating which heads should be masked out.
            Shape: (num_heads, seq_length, seq_length). Defaults to None.
        encoder_hidden_states (Optional[mindspore.Tensor], optional):
            The hidden states of the encoder. Shape: (batch_size, seq_length, hidden_size). Defaults to None.
        encoder_attention_mask (Optional[mindspore.Tensor], optional):
            The attention mask tensor for the encoder, indicating which tokens should be attended to and which
            should not. Shape: (batch_size, seq_length). Defaults to None.
        past_key_value (Optional[Tuple[Tuple[mindspore.Tensor]]], optional):
            The tuple of past key and value tensors for keeping the previous attention weights.
            Shape: ((batch_size, num_heads, seq_length, hidden_size),
            (batch_size, num_heads, seq_length, hidden_size)). Defaults to None.
        output_attentions (Optional[bool], optional): Whether to output attention weights. Defaults to False.

    Returns:
        Tuple[mindspore.Tensor]: A tuple containing the attention output tensor and other optional outputs.

    Raises:
        None.
    """
    self_outputs = self.self_attn(
        hidden_states,
        attention_mask,
        head_mask,
        encoder_hidden_states,
        encoder_attention_mask,
        past_key_value,
        output_attentions,
    )
    attention_output = self.out_proj(self_outputs[0])
    outputs = (attention_output,) + self_outputs[1:]  # add attentions if we output them
    return outputs
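
The last three lines of `forward` project only the first element returned by self-attention and pass any further elements (such as attention weights) through unchanged. A minimal sketch of that composition, using a stand-in function in place of the linear projection, might look as follows; `wrap_attention_outputs` and `double` are hypothetical names.

```python
def wrap_attention_outputs(self_outputs, project):
    """Project the first element and append the rest unchanged."""
    attention_output = project(self_outputs[0])
    return (attention_output,) + tuple(self_outputs[1:])


def double(values):
    """Stand-in for the out_proj linear layer."""
    return [2 * v for v in values]


# Without attention weights the result has one element, with them two.
only_context = wrap_attention_outputs(([1, 2],), double)
with_weights = wrap_attention_outputs(([1, 2], "attn_probs"), double)
```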

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMAttention.prune_heads(heads)

Prunes the given heads from the attention mechanism.

PARAMETER DESCRIPTION
self

The instance of the class.

TYPE: object

heads

A list of integers representing the indices of heads to be pruned from the attention mechanism.

TYPE: list

RETURNS DESCRIPTION
None

The module is pruned in place. If 'heads' is empty, the method returns immediately without changes.

RAISES DESCRIPTION
TypeError

If the 'heads' parameter is not a list of integers.

IndexError

If the indices in 'heads' exceed the available attention heads in the mechanism.

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def prune_heads(self, heads):
    """
    Prunes the given heads from the attention mechanism.

    Args:
        self (object): The instance of the class.
        heads (list): A list of integers representing the indices of heads to be pruned from the attention mechanism.

    Returns:
        None: The module is pruned in place. If 'heads' is empty, the method returns immediately without changes.

    Raises:
        TypeError: If the 'heads' parameter is not a list of integers.
        IndexError: If the indices in 'heads' exceed the available attention heads in the mechanism.
    """
    if len(heads) == 0:
        return
    heads, index = find_pruneable_heads_and_indices(
        heads, self.self_attn.num_attention_heads, self.self_attn.attention_head_size, self.pruned_heads
    )

    # Prune linear layers
    self.self_attn.q_proj = prune_linear_layer(self.self_attn.q_proj, index)
    self.self_attn.k_proj = prune_linear_layer(self.self_attn.k_proj, index)
    self.self_attn.v_proj = prune_linear_layer(self.self_attn.v_proj, index)
    self.out_proj = prune_linear_layer(self.out_proj, index, dim=1)

    # Update hyper params and store pruned heads
    self.self_attn.num_attention_heads = self.self_attn.num_attention_heads - len(heads)
    self.self_attn.all_head_size = self.self_attn.attention_head_size * self.self_attn.num_attention_heads
    self.pruned_heads = self.pruned_heads.union(heads)
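
The `index` passed to `prune_linear_layer` lists the rows (or columns) of the projection weights that survive pruning. The pure-Python sketch below is modelled on the Hugging Face Transformers helper `find_pruneable_heads_and_indices`; it is assumed, not confirmed, that the mindnlp helper of the same name behaves the same way. `find_pruneable_indices` is a hypothetical name.

```python
def find_pruneable_indices(heads, n_heads, head_size, already_pruned):
    """Return the heads still to prune and the flat indices to keep.

    n_heads is the current head count, i.e. after earlier pruning.
    """
    heads = set(heads) - set(already_pruned)
    keep = [True] * (n_heads * head_size)
    for head in heads:
        # Shift the original head number down by the count of
        # lower-numbered heads that were removed earlier.
        shifted = head - sum(1 for h in already_pruned if h < head)
        for offset in range(head_size):
            keep[shifted * head_size + offset] = False
    index = [i for i, kept in enumerate(keep) if kept]
    return heads, index


# First pass: 4 heads of size 2, prune head 1.
first_heads, first_index = find_pruneable_indices([1], 4, 2, set())
# Second pass: 3 heads remain, prune original head 3.
second_heads, second_index = find_pruneable_indices([3], 3, 2, {1})
```

Because head numbers always refer to the original layout, the shift by previously pruned heads is what keeps repeated calls consistent.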

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMEmbeddings

Bases: Module

Construct the embeddings from word and position embeddings.

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
class MSErnieMEmbeddings(nn.Module):
    """Construct the embeddings from word and position embeddings."""
    def __init__(self, config):
        """
        Initializes an instance of the MSErnieMEmbeddings class.

        Args:
            self: The object instance.
            config (object):
                A configuration object containing various parameters.

                - hidden_size (int): The size of the hidden state.
                - vocab_size (int): The size of the vocabulary.
                - pad_token_id (int): The ID of the padding token.
                - max_position_embeddings (int): The maximum number of positional embeddings.
                - layer_norm_eps (float): The epsilon value for layer normalization.
                - hidden_dropout_prob (float): The dropout probability for the hidden state.

        Returns:
            None

        Raises:
            None
        """
        super().__init__()
        self.hidden_size = config.hidden_size
        self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=config.pad_token_id)
        self.position_embeddings = nn.Embedding(
            config.max_position_embeddings, config.hidden_size, padding_idx=config.pad_token_id
        )
        self.layer_norm = nn.LayerNorm([config.hidden_size], eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(p=config.hidden_dropout_prob)
        self.padding_idx = config.pad_token_id

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        past_key_values_length: int = 0,
    ) -> mindspore.Tensor:
        """
        Constructs the embeddings for MSErnieM model.

        Args:
            self (MSErnieMEmbeddings): The MSErnieMEmbeddings instance.
            input_ids (Optional[mindspore.Tensor]):
                The input tensor containing the indices of input tokens. Default is None.
            position_ids (Optional[mindspore.Tensor]):
                The input tensor containing the indices of position tokens. Default is None.
            inputs_embeds (Optional[mindspore.Tensor]):
                The input tensor containing the embeddings of input tokens. Default is None.
            past_key_values_length (int): The length of past key values. Default is 0.

        Returns:
            mindspore.Tensor: The resulting embeddings tensor, after layer normalization and dropout.

        Raises:
            None explicitly. Arguments are not validated: a negative past_key_values_length is ignored,
            and passing neither input_ids nor inputs_embeds fails inside the word-embedding lookup.
        """
        if inputs_embeds is None:
            inputs_embeds = self.word_embeddings(input_ids)
        if position_ids is None:
            input_shape = inputs_embeds.shape[:-1]
            ones = ops.ones(input_shape, dtype=mindspore.int64)
            seq_length = ops.cumsum(ones, dim=1)
            position_ids = seq_length - ones

            if past_key_values_length > 0:
                position_ids = position_ids + past_key_values_length
        # to mimic paddlenlp implementation
        position_ids += 2
        position_embeddings = self.position_embeddings(position_ids)
        embeddings = inputs_embeds + position_embeddings
        embeddings = self.layer_norm(embeddings)
        embeddings = self.dropout(embeddings)

        return embeddings

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMEmbeddings.__init__(config)

Initializes an instance of the MSErnieMEmbeddings class.

PARAMETER DESCRIPTION
self

The object instance.

config

A configuration object containing various parameters.

  • hidden_size (int): The size of the hidden state.
  • vocab_size (int): The size of the vocabulary.
  • pad_token_id (int): The ID of the padding token.
  • max_position_embeddings (int): The maximum number of positional embeddings.
  • layer_norm_eps (float): The epsilon value for layer normalization.
  • hidden_dropout_prob (float): The dropout probability for the hidden state.

TYPE: object

RETURNS DESCRIPTION

None

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def __init__(self, config):
    """
    Initializes an instance of the MSErnieMEmbeddings class.

    Args:
        self: The object instance.
        config (object):
            A configuration object containing various parameters.

            - hidden_size (int): The size of the hidden state.
            - vocab_size (int): The size of the vocabulary.
            - pad_token_id (int): The ID of the padding token.
            - max_position_embeddings (int): The maximum number of positional embeddings.
            - layer_norm_eps (float): The epsilon value for layer normalization.
            - hidden_dropout_prob (float): The dropout probability for the hidden state.

    Returns:
        None

    Raises:
        None
    """
    super().__init__()
    self.hidden_size = config.hidden_size
    self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=config.pad_token_id)
    self.position_embeddings = nn.Embedding(
        config.max_position_embeddings, config.hidden_size, padding_idx=config.pad_token_id
    )
    self.layer_norm = nn.LayerNorm([config.hidden_size], eps=config.layer_norm_eps)
    self.dropout = nn.Dropout(p=config.hidden_dropout_prob)
    self.padding_idx = config.pad_token_id
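
Because the `forward` method of this class adds 2 to every position id before the lookup, the position table created here can serve fewer positions than `max_position_embeddings` suggests. The sketch below works out the resulting limit. `max_supported_length` and `check_length` are hypothetical helpers, and the value 16 is an arbitrary example rather than a library default.

```python
POSITION_OFFSET = 2  # matches `position_ids += 2` in forward


def max_supported_length(max_position_embeddings, past_key_values_length=0):
    """Longest sequence whose shifted position ids still fit the table."""
    # Largest id used is (seq_length - 1) + past + offset, and it must
    # not exceed max_position_embeddings - 1.
    return max_position_embeddings - POSITION_OFFSET - past_key_values_length


def check_length(seq_length, max_position_embeddings, past_key_values_length=0):
    """Raise ValueError if the sequence would index past the table."""
    limit = max_supported_length(max_position_embeddings, past_key_values_length)
    if seq_length > limit:
        raise ValueError(f"sequence length {seq_length} exceeds limit {limit}")
    return seq_length


limit = max_supported_length(16)
```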

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMEmbeddings.forward(input_ids=None, position_ids=None, inputs_embeds=None, past_key_values_length=0)

Constructs the embeddings for MSErnieM model.

PARAMETER DESCRIPTION
self

The MSErnieMEmbeddings instance.

TYPE: MSErnieMEmbeddings

input_ids

The input tensor containing the indices of input tokens. Default is None.

TYPE: Optional[Tensor] DEFAULT: None

position_ids

The input tensor containing the indices of position tokens. Default is None.

TYPE: Optional[Tensor] DEFAULT: None

inputs_embeds

The input tensor containing the embeddings of input tokens. Default is None.

TYPE: Optional[Tensor] DEFAULT: None

past_key_values_length

The length of past key values. Default is 0.

TYPE: int DEFAULT: 0

RETURNS DESCRIPTION
Tensor

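mindspore.Tensor: The resulting embeddings tensor, after layer normalization and dropout.

RAISES DESCRIPTION

None explicitly. Arguments are not validated: a negative past_key_values_length is ignored (the offset is applied only when the value is greater than 0), and passing neither input_ids nor inputs_embeds fails inside the word-embedding lookup rather than with a dedicated ValueError.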

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    past_key_values_length: int = 0,
) -> mindspore.Tensor:
    """
    Constructs the embeddings for MSErnieM model.

    Args:
        self (MSErnieMEmbeddings): The MSErnieMEmbeddings instance.
        input_ids (Optional[mindspore.Tensor]):
            The input tensor containing the indices of input tokens. Default is None.
        position_ids (Optional[mindspore.Tensor]):
            The input tensor containing the indices of position tokens. Default is None.
        inputs_embeds (Optional[mindspore.Tensor]):
            The input tensor containing the embeddings of input tokens. Default is None.
        past_key_values_length (int): The length of past key values. Default is 0.

    Returns:
        mindspore.Tensor: The resulting embeddings tensor, after layer normalization and dropout.

    Raises:
        None explicitly. Arguments are not validated: a negative past_key_values_length is ignored,
        and passing neither input_ids nor inputs_embeds fails inside the word-embedding lookup.
    """
    if inputs_embeds is None:
        inputs_embeds = self.word_embeddings(input_ids)
    if position_ids is None:
        input_shape = inputs_embeds.shape[:-1]
        ones = ops.ones(input_shape, dtype=mindspore.int64)
        seq_length = ops.cumsum(ones, dim=1)
        position_ids = seq_length - ones

        if past_key_values_length > 0:
            position_ids = position_ids + past_key_values_length
    # to mimic paddlenlp implementation
    position_ids += 2
    position_embeddings = self.position_embeddings(position_ids)
    embeddings = inputs_embeds + position_embeddings
    embeddings = self.layer_norm(embeddings)
    embeddings = self.dropout(embeddings)

    return embeddings
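
When `position_ids` is not supplied, the method derives it from a cumulative sum of ones, optionally shifts it by `past_key_values_length`, and then adds the fixed offset of 2. The list-based sketch below follows the same steps; `default_position_ids` is a hypothetical helper written for illustration.

```python
def default_position_ids(batch_size, seq_length, past_key_values_length=0, offset=2):
    """Build position ids the way forward does when none are passed."""
    rows = []
    for _ in range(batch_size):
        ones = [1] * seq_length
        # Running sum along the sequence axis: 1, 2, ..., seq_length.
        cumsum, total = [], 0
        for value in ones:
            total += value
            cumsum.append(total)
        # Subtracting the ones gives 0, 1, ..., seq_length - 1.
        ids = [c - o for c, o in zip(cumsum, ones)]
        if past_key_values_length > 0:
            ids = [i + past_key_values_length for i in ids]
        # Fixed shift applied to all position ids in forward.
        rows.append([i + offset for i in ids])
    return rows


position_ids = default_position_ids(batch_size=2, seq_length=3)
```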

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMEncoder

Bases: Module

This class represents an MSErnieMEncoder, which is a multi-layer transformer-based encoder model for natural language processing tasks.

The MSErnieMEncoder inherits from the nn.Module class and is designed to process input embeddings and generate hidden states, attentions, and last hidden state output.
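
A rough sketch of how such an encoder can thread the embeddings through its layers while collecting the optional outputs is given below. It is a simplified assumption about the control flow, not a copy of the mindnlp implementation: layers are modelled as plain callables that return an `(output, attention)` pair, and masks and caches are left out.

```python
def run_encoder(layers, input_embeds, output_hidden_states=False, output_attentions=False):
    """Apply each layer in turn and collect optional per-layer outputs."""
    hidden_states = () if output_hidden_states else None
    attentions = () if output_attentions else None

    output = input_embeds
    if hidden_states is not None:
        # The embeddings themselves are recorded as the first hidden state.
        hidden_states = hidden_states + (output,)
    for layer in layers:
        output, attn = layer(output)
        if hidden_states is not None:
            hidden_states = hidden_states + (output,)
        if attentions is not None:
            attentions = attentions + (attn,)
    return output, hidden_states, attentions


# Toy layers operating on integers instead of tensors.
toy_layers = [lambda x: (x + 1, "a0"), lambda x: (x + 10, "a1")]
full = run_encoder(toy_layers, 0, output_hidden_states=True, output_attentions=True)
minimal = run_encoder(toy_layers, 0)
```

With both flags enabled, the hidden-state tuple has one more entry than there are layers, because it includes the input embeddings.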

ATTRIBUTE DESCRIPTION
config

The configuration object that contains the model's hyperparameters and settings.

TYPE: object

layers

A list of MSErnieMEncoderLayer instances that make up the layers of the encoder.

TYPE: ModuleList

METHOD DESCRIPTION
__init__

Initializes a new MSErnieMEncoder instance with the given configuration.

forward

Constructs the MSErnieMEncoder model by processing the input embeddings and generating the desired outputs.

Args:

  • input_embeds (mindspore.Tensor): The input embeddings for the model.
  • attention_mask (Optional[mindspore.Tensor], optional): The attention mask tensor to mask certain positions. Defaults to None.
  • head_mask (Optional[mindspore.Tensor], optional): The head mask tensor to mask certain heads. Defaults to None.
  • past_key_values (Optional[Tuple[Tuple[mindspore.Tensor]]], optional): The cached key-value tensors from previous decoding steps. Defaults to None.
  • output_attentions (Optional[bool], optional): Whether to output attention weights. Defaults to False.
  • output_hidden_states (Optional[bool], optional): Whether to output hidden states. Defaults to False.

Returns:

  • Tuple[mindspore.Tensor]: A tuple containing the last hidden state, hidden states, and attentions (if enabled).
Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
class MSErnieMEncoder(nn.Module):

    """
    This class represents an MSErnieMEncoder, which is a multi-layer transformer-based encoder model for
    natural language processing tasks.

    The MSErnieMEncoder inherits from the nn.Module class and is designed to process input embeddings and generate
    hidden states, attentions, and last hidden state output.

    Attributes:
        config (object): The configuration object that contains the model's hyperparameters and settings.
        layers (nn.ModuleList): A list of MSErnieMEncoderLayer instances that make up the layers of the encoder.

    Methods:
        __init__(self, config):
            Initializes a new MSErnieMEncoder instance with the given configuration.

        forward(self, input_embeds, attention_mask=None, head_mask=None, past_key_values=None, output_attentions=False, output_hidden_states=False):
            Constructs the MSErnieMEncoder model by processing the input embeddings and generating the desired outputs.

            Args:

            - input_embeds (mindspore.Tensor): The input embeddings for the model.
            - attention_mask (Optional[mindspore.Tensor], optional): The attention mask tensor to mask
            certain positions. Defaults to None.
            - head_mask (Optional[mindspore.Tensor], optional): The head mask tensor to mask certain heads.
            Defaults to None.
            - past_key_values (Optional[Tuple[Tuple[mindspore.Tensor]]], optional): The cached key-value tensors
            from previous decoding steps. Defaults to None.
            - output_attentions (Optional[bool], optional): Whether to output attention weights. Defaults to False.
            - output_hidden_states (Optional[bool], optional): Whether to output hidden states. Defaults to False.

            Returns:

            - Tuple[mindspore.Tensor]: A tuple containing the last hidden state, hidden states, and attentions (if enabled).

        """
    def __init__(self, config):
        """
        Initializes the MSErnieMEncoder class.

        Args:
            self: The object itself.
            config (object): An object containing the configuration parameters for the MSErnieMEncoder.
                The config object should have the following attributes:

                - num_hidden_layers (int): The number of hidden layers in the encoder.
                - other attributes specific to the MSErnieMEncoderLayer.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.config = config
        self.layers = nn.ModuleList([MSErnieMEncoderLayer(config) for _ in range(config.num_hidden_layers)])

    def forward(
        self,
        input_embeds: mindspore.Tensor,
        attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
        output_attentions: Optional[bool] = False,
        output_hidden_states: Optional[bool] = False,
    ) -> Tuple[mindspore.Tensor]:
        """
        This method forwards the MSErnieMEncoder by processing the input embeddings and applying attention masks and
        head masks if provided.

        Args:
            self: The instance of the MSErnieMEncoder class.
            input_embeds (mindspore.Tensor): The input embeddings to be processed by the encoder.
            attention_mask (Optional[mindspore.Tensor]): An optional tensor representing the attention mask.
                If provided, it restricts the attention of the encoder.
            head_mask (Optional[mindspore.Tensor]): An optional tensor representing the head mask.
                If provided, it restricts the attention heads of the encoder.
            past_key_values (Optional[Tuple[Tuple[mindspore.Tensor]]]): An optional tuple of past key values,
                if provided, it allows the encoder to reuse previously computed key value states.
            output_attentions (Optional[bool]): An optional boolean indicating whether to output attentions.
                Default is False.
            output_hidden_states (Optional[bool]): An optional boolean indicating whether to output hidden states.
                Default is False.

        Returns:
            Tuple[mindspore.Tensor]: A tuple whose first element is the last hidden state. It is followed by
                a tuple of all hidden states (the input embeddings plus the output of each layer) if
                output_hidden_states is True, and then by a tuple of per-layer attention weights if
                output_attentions is True.

        Raises:
            None. The method performs no explicit type validation of its arguments.
        """
        hidden_states = () if output_hidden_states else None
        attentions = () if output_attentions else None

        output = input_embeds
        if output_hidden_states:
            hidden_states = hidden_states + (output,)
        for i, layer in enumerate(self.layers):
            layer_head_mask = head_mask[i] if head_mask is not None else None
            past_key_value = past_key_values[i] if past_key_values is not None else None

            output, opt_attn_weights = layer(
                hidden_states=output,
                attention_mask=attention_mask,
                head_mask=layer_head_mask,
                past_key_value=past_key_value,
            )

            if output_hidden_states:
                hidden_states = hidden_states + (output,)
            if output_attentions:
                attentions = attentions + (opt_attn_weights,)

        last_hidden_state = output
        return tuple(v for v in [last_hidden_state, hidden_states, attentions] if v is not None)

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMEncoder.__init__(config)

Initializes the MSErnieMEncoder class.

PARAMETER DESCRIPTION
self

The object itself.

config

An object containing the configuration parameters for the MSErnieMEncoder. The config object should have the following attributes:

  • num_hidden_layers (int): The number of hidden layers in the encoder.
  • other attributes specific to the MSErnieMEncoderLayer.

TYPE: object

RETURNS DESCRIPTION

None.

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def __init__(self, config):
    """
    Initializes the MSErnieMEncoder class.

    Args:
        self: The object itself.
        config (object): An object containing the configuration parameters for the MSErnieMEncoder.
            The config object should have the following attributes:

            - num_hidden_layers (int): The number of hidden layers in the encoder.
            - other attributes specific to the MSErnieMEncoderLayer.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.config = config
    self.layers = nn.ModuleList([MSErnieMEncoderLayer(config) for _ in range(config.num_hidden_layers)])

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMEncoder.forward(input_embeds, attention_mask=None, head_mask=None, past_key_values=None, output_attentions=False, output_hidden_states=False)

This method forwards the MSErnieMEncoder by processing the input embeddings and applying attention masks and head masks if provided.

PARAMETER DESCRIPTION
self

The instance of the MSErnieMEncoder class.

input_embeds

The input embeddings to be processed by the encoder.

TYPE: Tensor

attention_mask

An optional tensor representing the attention mask. If provided, it restricts the attention of the encoder.

TYPE: Optional[Tensor] DEFAULT: None

head_mask

An optional tensor representing the head mask. If provided, it restricts the attention heads of the encoder.

TYPE: Optional[Tensor] DEFAULT: None

past_key_values

An optional tuple of past key values, if provided, it allows the encoder to reuse previously computed key value states.

TYPE: Optional[Tuple[Tuple[Tensor]]] DEFAULT: None

output_attentions

An optional boolean indicating whether to output attentions. Default is False.

TYPE: Optional[bool] DEFAULT: False

output_hidden_states

An optional boolean indicating whether to output hidden states. Default is False.

TYPE: Optional[bool] DEFAULT: False

RETURNS DESCRIPTION
Tuple[Tensor]

A tuple whose first element is the last hidden state. It is followed by a tuple of all hidden states (the input embeddings plus the output of each layer) if output_hidden_states is True, and then by a tuple of per-layer attention weights if output_attentions is True. The method performs no explicit type validation of its arguments.

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def forward(
    self,
    input_embeds: mindspore.Tensor,
    attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    past_key_values: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
    output_attentions: Optional[bool] = False,
    output_hidden_states: Optional[bool] = False,
) -> Tuple[mindspore.Tensor]:
    """
    This method forwards the MSErnieMEncoder by processing the input embeddings and applying attention masks and
    head masks if provided.

    Args:
        self: The instance of the MSErnieMEncoder class.
        input_embeds (mindspore.Tensor): The input embeddings to be processed by the encoder.
        attention_mask (Optional[mindspore.Tensor]): An optional tensor representing the attention mask.
            If provided, it restricts the attention of the encoder.
        head_mask (Optional[mindspore.Tensor]): An optional tensor representing the head mask.
            If provided, it restricts the attention heads of the encoder.
        past_key_values (Optional[Tuple[Tuple[mindspore.Tensor]]]): An optional tuple of past key values,
            if provided, it allows the encoder to reuse previously computed key value states.
        output_attentions (Optional[bool]): An optional boolean indicating whether to output attentions.
            Default is False.
        output_hidden_states (Optional[bool]): An optional boolean indicating whether to output hidden states.
            Default is False.

    Returns:
        Tuple[mindspore.Tensor]: A tuple whose first element is the last hidden state. It is followed by
            a tuple of all hidden states (the input embeddings plus the output of each layer) if
            output_hidden_states is True, and then by a tuple of per-layer attention weights if
            output_attentions is True.

    Raises:
        None. The method performs no explicit type validation of its arguments.
    """
    hidden_states = () if output_hidden_states else None
    attentions = () if output_attentions else None

    output = input_embeds
    if output_hidden_states:
        hidden_states = hidden_states + (output,)
    for i, layer in enumerate(self.layers):
        layer_head_mask = head_mask[i] if head_mask is not None else None
        past_key_value = past_key_values[i] if past_key_values is not None else None

        output, opt_attn_weights = layer(
            hidden_states=output,
            attention_mask=attention_mask,
            head_mask=layer_head_mask,
            past_key_value=past_key_value,
        )

        if output_hidden_states:
            hidden_states = hidden_states + (output,)
        if output_attentions:
            attentions = attentions + (opt_attn_weights,)

    last_hidden_state = output
    return tuple(v for v in [last_hidden_state, hidden_states, attentions] if v is not None)
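
Because unrequested entries are filtered out of the returned tuple, the position of each output depends on the flags. The sketch below reproduces only that bookkeeping with toy stand-in layers; `run_encoder` and `toy_layers` are hypothetical names, and integers stand in for tensors.

```python
def run_encoder(layers, input_embeds, output_attentions=False, output_hidden_states=False):
    """Reproduce the bookkeeping of the encoder loop with stand-in layers.

    Each stand-in layer is a callable returning (new_hidden, attn_weights).
    """
    hidden_states = () if output_hidden_states else None
    attentions = () if output_attentions else None

    output = input_embeds
    if output_hidden_states:
        hidden_states = hidden_states + (output,)
    for layer in layers:
        output, attn = layer(output)
        if output_hidden_states:
            hidden_states = hidden_states + (output,)
        if output_attentions:
            attentions = attentions + (attn,)
    # Entries that were not requested are dropped, so positions shift.
    return tuple(v for v in [output, hidden_states, attentions] if v is not None)


# Three toy "layers": each adds 1 and reports a placeholder attention value.
toy_layers = [lambda h, i=i: (h + 1, f"attn{i}") for i in range(3)]

only_last = run_encoder(toy_layers, 0)
with_hidden = run_encoder(toy_layers, 0, output_hidden_states=True)
with_attn = run_encoder(toy_layers, 0, output_attentions=True)
with_both = run_encoder(toy_layers, 0, output_attentions=True, output_hidden_states=True)

print(only_last)    # (3,)
print(with_hidden)  # (3, (0, 1, 2, 3))
print(with_attn)    # (3, ('attn0', 'attn1', 'attn2'))
print(with_both)    # (3, (0, 1, 2, 3), ('attn0', 'attn1', 'attn2'))
```

With N layers, the hidden-states tuple holds N + 1 entries, since the input embeddings are recorded before the first layer runs. When only output_attentions is set, the attentions sit at index 1 rather than index 2, so callers should not index the result by a fixed position.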

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMEncoderLayer

Bases: Module

This class represents an encoder layer for the MSErnieM model. It includes self-attention, linear transformations, dropout, layer normalization, and activation functions for processing input hidden states.

The MSErnieMEncoderLayer class inherits from nn.Module and consists of an __init__ method for initializing the layer's components and a forward method for performing the encoding operations on input tensors.

ATTRIBUTE DESCRIPTION
self_attn

Self-attention mechanism for capturing dependencies within the input hidden states.

TYPE: MSErnieMAttention

linear1

Linear transformation layer from hidden size to intermediate size.

TYPE: Linear

dropout

Dropout layer applied to the output of the activation function in the feed-forward block.

TYPE: Dropout

linear2

Linear transformation layer from intermediate size back to hidden size.

TYPE: Linear

norm1

Layer normalization applied after the self-attention residual connection.

TYPE: LayerNorm

norm2

Layer normalization applied after the feed-forward residual connection.

TYPE: LayerNorm

dropout1

Dropout layer applied to the self-attention output before the first residual connection.

TYPE: Dropout

dropout2

Dropout layer applied to the output of linear2 before the second residual connection.

TYPE: Dropout

activation

Activation function applied to the hidden states.

TYPE: function

METHOD DESCRIPTION
__init__

Constructor method for initializing the encoder layer with provided configuration settings.

forward

Method for processing input hidden states through the encoder layer's components.

The forward method performs a series of operations on the input hidden states, including self-attention, linear transformations, activation functions, dropout, and layer normalization. It returns the processed hidden states and optional attention outputs if specified.

Note

The MSErnieMEncoderLayer class is designed to be used within the MSErnieM model architecture for encoding input sequences.

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
class MSErnieMEncoderLayer(nn.Module):

    """
    This class represents an encoder layer for the MSErnieM model. It includes self-attention, linear transformations,
    dropout, layer normalization, and activation functions for processing input hidden states.

    The MSErnieMEncoderLayer class inherits from nn.Module and consists of an __init__ method for initializing the
    layer's components and a forward method for performing the encoding operations on input tensors.

    Attributes:
        self_attn (MSErnieMAttention): Self-attention mechanism for capturing dependencies within the input hidden states.
        linear1 (nn.Linear): Linear transformation layer from hidden size to intermediate size.
        dropout (nn.Dropout): Dropout layer applied to the output of the activation function in the feed-forward block.
        linear2 (nn.Linear): Linear transformation layer from intermediate size back to hidden size.
        norm1 (nn.LayerNorm): Layer normalization applied after the self-attention residual connection.
        norm2 (nn.LayerNorm): Layer normalization applied after the feed-forward residual connection.
        dropout1 (nn.Dropout): Dropout layer applied to the self-attention output before the first residual connection.
        dropout2 (nn.Dropout): Dropout layer applied to the output of linear2 before the second residual connection.
        activation (function): Activation function applied to the hidden states.

    Methods:
        __init__: Constructor method for initializing the encoder layer with provided configuration settings.
        forward: Method for processing input hidden states through the encoder layer's components.

    The forward method performs a series of operations on the input hidden states, including self-attention,
    linear transformations, activation functions, dropout, and layer normalization. It returns the processed hidden
    states and optional attention outputs if specified.

    Note:
        The MSErnieMEncoderLayer class is designed to be used within the MSErnieM model architecture for encoding input sequences.
    """
    def __init__(self, config):
        """
        Initializes a MSErnieMEncoderLayer object with the provided configuration.

        Args:
            self (object): The MSErnieMEncoderLayer instance itself.
            config (object): An object containing configuration parameters for the encoder layer.
                This object should have the following attributes:

                - hidden_dropout_prob (float, optional): The dropout probability for the hidden layers. Default is 0.1.
                - act_dropout (float, optional): The dropout probability for the activation layers.
                Default is the value of hidden_dropout_prob.
                - hidden_size (int): The size of the hidden layers.
                - intermediate_size (int): The size of the intermediate layers.
                - layer_norm_eps (float): The epsilon value for layer normalization.
                - hidden_act (str or function): The activation function to use.
                If str, it should be a key in the ACT2FN dictionary.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        # to mimic paddlenlp implementation
        dropout = 0.1 if config.hidden_dropout_prob is None else config.hidden_dropout_prob
        act_dropout = config.hidden_dropout_prob if config.act_dropout is None else config.act_dropout

        self.self_attn = MSErnieMAttention(config)
        self.linear1 = nn.Linear(config.hidden_size, config.intermediate_size)
        self.dropout = nn.Dropout(p=act_dropout)
        self.linear2 = nn.Linear(config.intermediate_size, config.hidden_size)
        self.norm1 = nn.LayerNorm([config.hidden_size], eps=config.layer_norm_eps)
        self.norm2 = nn.LayerNorm([config.hidden_size], eps=config.layer_norm_eps)
        self.dropout1 = nn.Dropout(p=dropout)
        self.dropout2 = nn.Dropout(p=dropout)
        if isinstance(config.hidden_act, str):
            self.activation = ACT2FN[config.hidden_act]
        else:
            self.activation = config.hidden_act

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        past_key_value: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
        output_attentions: Optional[bool] = True,
    ):
        """Constructs the MSErnieMEncoderLayer.

        This method applies the MSErnieMEncoderLayer to the input hidden states.

        Args:
            self (MSErnieMEncoderLayer): The instance of the MSErnieMEncoderLayer class.
            hidden_states (mindspore.Tensor): The input hidden states.
                It is a tensor of shape (batch_size, sequence_length, hidden_size).
            attention_mask (Optional[mindspore.Tensor]): The attention mask tensor.
                It is an optional tensor of shape (batch_size, sequence_length).
            head_mask (Optional[mindspore.Tensor]): The head mask tensor.
                It is an optional tensor of shape (num_heads, sequence_length, sequence_length).
            past_key_value (Optional[Tuple[Tuple[mindspore.Tensor]]]): The past key-value tensor.
                It is an optional tuple of tuple of tensors.
            output_attentions (Optional[bool]): Whether to return attentions as well. Defaults to True.

        Returns:
            mindspore.Tensor or Tuple[mindspore.Tensor]: The output hidden states.
                If `output_attentions` is True, returns a tuple containing the hidden states and attentions.
                Otherwise, only returns the hidden states.

        Raises:
            None
        """
        residual = hidden_states
        outputs = self.self_attn(
                hidden_states=hidden_states,
                attention_mask=attention_mask,
                head_mask=head_mask,
                past_key_value=past_key_value,
                output_attentions=output_attentions,
            )

        hidden_states = outputs[0]
        hidden_states = residual + self.dropout1(hidden_states)
        hidden_states = self.norm1(hidden_states)
        residual = hidden_states

        hidden_states = self.linear1(hidden_states)
        hidden_states = self.activation(hidden_states)
        hidden_states = self.dropout(hidden_states)
        hidden_states = self.linear2(hidden_states)
        hidden_states = residual + self.dropout2(hidden_states)
        hidden_states = self.norm2(hidden_states)

        if output_attentions:
            return (hidden_states,) + outputs[1:]
        return hidden_states

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMEncoderLayer.__init__(config)

Initializes a MSErnieMEncoderLayer object with the provided configuration.

PARAMETER DESCRIPTION
self

The MSErnieMEncoderLayer instance itself.

TYPE: object

config

An object containing configuration parameters for the encoder layer. This object should have the following attributes:

  • hidden_dropout_prob (float, optional): The dropout probability for the hidden layers. Default is 0.1.
  • act_dropout (float, optional): The dropout probability for the activation layers. Default is the value of hidden_dropout_prob.
  • hidden_size (int): The size of the hidden layers.
  • intermediate_size (int): The size of the intermediate layers.
  • layer_norm_eps (float): The epsilon value for layer normalization.
  • hidden_act (str or function): The activation function to use. If str, it should be a key in the ACT2FN dictionary.

TYPE: object

RETURNS DESCRIPTION

None.

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def __init__(self, config):
    """
    Initializes a MSErnieMEncoderLayer object with the provided configuration.

    Args:
        self (object): The MSErnieMEncoderLayer instance itself.
        config (object): An object containing configuration parameters for the encoder layer.
            This object should have the following attributes:

            - hidden_dropout_prob (float, optional): The dropout probability for the hidden layers. Default is 0.1.
            - act_dropout (float, optional): The dropout probability for the activation layers.
            Default is the value of hidden_dropout_prob.
            - hidden_size (int): The size of the hidden layers.
            - intermediate_size (int): The size of the intermediate layers.
            - layer_norm_eps (float): The epsilon value for layer normalization.
            - hidden_act (str or function): The activation function to use.
            If str, it should be a key in the ACT2FN dictionary.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    # to mimic paddlenlp implementation
    dropout = 0.1 if config.hidden_dropout_prob is None else config.hidden_dropout_prob
    act_dropout = config.hidden_dropout_prob if config.act_dropout is None else config.act_dropout

    self.self_attn = MSErnieMAttention(config)
    self.linear1 = nn.Linear(config.hidden_size, config.intermediate_size)
    self.dropout = nn.Dropout(p=act_dropout)
    self.linear2 = nn.Linear(config.intermediate_size, config.hidden_size)
    self.norm1 = nn.LayerNorm([config.hidden_size], eps=config.layer_norm_eps)
    self.norm2 = nn.LayerNorm([config.hidden_size], eps=config.layer_norm_eps)
    self.dropout1 = nn.Dropout(p=dropout)
    self.dropout2 = nn.Dropout(p=dropout)
    if isinstance(config.hidden_act, str):
        self.activation = ACT2FN[config.hidden_act]
    else:
        self.activation = config.hidden_act
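
The two dropout fallbacks at the top of the constructor are easy to misread. The sketch below isolates them using a `SimpleNamespace` as a stand-in config; `resolve_dropouts` is a hypothetical helper, not part of the library.

```python
from types import SimpleNamespace


def resolve_dropouts(config):
    """Return (dropout, act_dropout) using the same fallbacks as the listing."""
    dropout = 0.1 if config.hidden_dropout_prob is None else config.hidden_dropout_prob
    # Note: the fallback reads config.hidden_dropout_prob, not the resolved `dropout`.
    act_dropout = config.hidden_dropout_prob if config.act_dropout is None else config.act_dropout
    return dropout, act_dropout


explicit = resolve_dropouts(SimpleNamespace(hidden_dropout_prob=0.2, act_dropout=0.05))
inherited = resolve_dropouts(SimpleNamespace(hidden_dropout_prob=0.2, act_dropout=None))
both_missing = resolve_dropouts(SimpleNamespace(hidden_dropout_prob=None, act_dropout=None))

print(explicit)      # (0.2, 0.05)
print(inherited)     # (0.2, 0.2)
print(both_missing)  # (0.1, None)
```

In the last case act_dropout stays None rather than becoming 0.1. How nn.Dropout treats p=None is not shown in this section, so setting at least hidden_dropout_prob explicitly seems the safer choice.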

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMEncoderLayer.forward(hidden_states, attention_mask=None, head_mask=None, past_key_value=None, output_attentions=True)

Constructs the MSErnieMEncoderLayer.

This method applies the MSErnieMEncoderLayer to the input hidden states.

PARAMETER DESCRIPTION
self

The instance of the MSErnieMEncoderLayer class.

TYPE: MSErnieMEncoderLayer

hidden_states

The input hidden states. It is a tensor of shape (batch_size, sequence_length, hidden_size).

TYPE: Tensor

attention_mask

The attention mask tensor. It is an optional tensor of shape (batch_size, sequence_length).

TYPE: Optional[Tensor] DEFAULT: None

head_mask

The head mask tensor. It is an optional tensor of shape (num_heads, sequence_length, sequence_length).

TYPE: Optional[Tensor] DEFAULT: None

past_key_value

The past key-value tensor. It is an optional tuple of tuple of tensors.

TYPE: Optional[Tuple[Tuple[Tensor]]] DEFAULT: None

output_attentions

Whether to return attentions as well. Defaults to True.

TYPE: Optional[bool] DEFAULT: True

RETURNS DESCRIPTION

mindspore.Tensor or Tuple[mindspore.Tensor]: The output hidden states. If output_attentions is True, returns a tuple containing the hidden states and attentions. Otherwise, only returns the hidden states.

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def forward(
    self,
    hidden_states: mindspore.Tensor,
    attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    past_key_value: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
    output_attentions: Optional[bool] = True,
):
    """Constructs the MSErnieMEncoderLayer.

    This method applies the MSErnieMEncoderLayer to the input hidden states.

    Args:
        self (MSErnieMEncoderLayer): The instance of the MSErnieMEncoderLayer class.
        hidden_states (mindspore.Tensor): The input hidden states.
            It is a tensor of shape (batch_size, sequence_length, hidden_size).
        attention_mask (Optional[mindspore.Tensor]): The attention mask tensor.
            It is an optional tensor of shape (batch_size, sequence_length).
        head_mask (Optional[mindspore.Tensor]): The head mask tensor.
            It is an optional tensor of shape (num_heads, sequence_length, sequence_length).
        past_key_value (Optional[Tuple[Tuple[mindspore.Tensor]]]): The past key-value tensor.
            It is an optional tuple of tuple of tensors.
        output_attentions (Optional[bool]): Whether to return attentions as well. Defaults to True.

    Returns:
        mindspore.Tensor or Tuple[mindspore.Tensor]: The output hidden states.
            If `output_attentions` is True, returns a tuple containing the hidden states and attentions.
            Otherwise, only returns the hidden states.

    Raises:
        None
    """
    residual = hidden_states
    outputs = self.self_attn(
            hidden_states=hidden_states,
            attention_mask=attention_mask,
            head_mask=head_mask,
            past_key_value=past_key_value,
            output_attentions=output_attentions,
        )

    hidden_states = outputs[0]
    hidden_states = residual + self.dropout1(hidden_states)
    hidden_states = self.norm1(hidden_states)
    residual = hidden_states

    hidden_states = self.linear1(hidden_states)
    hidden_states = self.activation(hidden_states)
    hidden_states = self.dropout(hidden_states)
    hidden_states = self.linear2(hidden_states)
    hidden_states = residual + self.dropout2(hidden_states)
    hidden_states = self.norm2(hidden_states)

    if output_attentions:
        return (hidden_states,) + outputs[1:]
    return hidden_states
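
The forward pass above follows a post-normalization layout: each sub-layer output is added to its residual and then normalized. A minimal pure-Python sketch of that ordering follows. It assumes dropout is disabled, uses a layer norm without learned scale or bias, and substitutes hypothetical stand-in callables for the attention and feed-forward sub-layers.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def post_ln_block(x, attention, feed_forward):
    """Apply the two sub-layers in the order used by the listing (dropout omitted)."""
    # Sub-layer 1: attention, residual add, then normalization.
    x = layer_norm([r + a for r, a in zip(x, attention(x))])
    # Sub-layer 2: feed-forward, residual add, then normalization.
    x = layer_norm([r + f for r, f in zip(x, feed_forward(x))])
    return x


# Hypothetical stand-ins: identity "attention" and a ReLU "feed-forward".
out = post_ln_block(
    [1.0, 2.0, 3.0, 4.0],
    attention=lambda v: list(v),
    feed_forward=lambda v: [max(0.0, e) for e in v],
)
print(out)
```

Since normalization is the last step of each sub-layer, the output of the block has approximately zero mean and unit variance, whatever the scale of the sub-layer outputs.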

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMForInformationExtraction

Bases: MSErnieMPreTrainedModel

The 'MSErnieMForInformationExtraction' class is a model for information extraction tasks built on the multilingual Ernie-M model. It extends the 'MSErnieMPreTrainedModel' class.

This class initializes the Ernie-M model and implements the forward pass for information extraction tasks, which computes the start and end position probabilities together with the total loss.

The 'MSErnieMForInformationExtraction' class inherits the configuration parameters and methods from 'MSErnieMPreTrainedModel' and extends it to support information extraction tasks. The class is designed to handle input tensors for input_ids, attention_mask, position_ids, head_mask, and inputs_embeds, and provides output in the form of a tuple containing total loss, start probability, end probability, and additional model outputs.

The class is suitable for tasks such as named entity recognition, question answering, and other information extraction tasks where start and end positions within a sequence need to be identified and predicted.
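
As an illustration of how start and end probabilities might be turned into spans, the sketch below applies a simple threshold and pairs each start with the nearest end at or after it. This is an assumed post-processing step, not something this class is documented to do: `extract_spans`, the 0.5 threshold, and the pairing rule are all hypothetical.

```python
def extract_spans(start_prob, end_prob, threshold=0.5):
    """Pair each confident start index with the nearest confident end at or after it.

    Hypothetical post-processing helper; the threshold and pairing rule are assumptions.
    """
    starts = [i for i, p in enumerate(start_prob) if p > threshold]
    ends = [i for i, p in enumerate(end_prob) if p > threshold]
    spans = []
    for s in starts:
        # First end position that does not precede the start, if any.
        e = next((e for e in ends if e >= s), None)
        if e is not None:
            spans.append((s, e))
    return spans


start_prob = [0.1, 0.9, 0.2, 0.1, 0.8, 0.1]
end_prob = [0.1, 0.1, 0.7, 0.1, 0.9, 0.2]
spans = extract_spans(start_prob, end_prob)
print(spans)  # [(1, 2), (4, 4)]
```

Each span is an inclusive (start, end) pair of token indices; a single-token span has equal start and end.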

This class is a part of the MindSpore library and is designed to provide a high-level interface for utilizing the MSERNIE-M model for information extraction tasks.
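
As a rough illustration of the head described above, the following pure-Python sketch turns per-token logits into start and end probabilities with a sigmoid and averages two binary cross-entropy losses. It assumes mean reduction for the loss and uses plain nested lists in place of tensors. The names `span_head` and `binary_cross_entropy` are illustrative, not the library's API.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean binary cross-entropy over all (batch, position) entries."""
    total, count = 0.0, 0
    for prob_row, target_row in zip(probs, targets):
        for p, t in zip(prob_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
            total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
            count += 1
    return total / count

def span_head(start_logits, end_logits, start_labels=None, end_labels=None):
    """Return (total_loss, start_prob, end_prob); the loss is None without labels."""
    start_prob = [[sigmoid(z) for z in row] for row in start_logits]
    end_prob = [[sigmoid(z) for z in row] for row in end_logits]
    total_loss = None
    if start_labels is not None and end_labels is not None:
        start_loss = binary_cross_entropy(start_prob, start_labels)
        end_loss = binary_cross_entropy(end_prob, end_labels)
        total_loss = (start_loss + end_loss) / 2
    return total_loss, start_prob, end_prob

# One sequence of three tokens with all-zero logits, so every probability is 0.5.
zeros = [[0.0, 0.0, 0.0]]
loss, start_prob, end_prob = span_head(
    zeros, zeros, start_labels=[[1.0, 0.0, 0.0]], end_labels=[[0.0, 0.0, 1.0]]
)
```

Since each token gets an independent sigmoid score, several spans in one sequence can be marked at once, unlike a softmax over positions.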

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
class MSErnieMForInformationExtraction(MSErnieMPreTrainedModel):

    """
    The 'MSErnieMForInformationExtraction' class is a model for information extraction tasks using the MSERNIE-M
    (multi-lingual) model. It extends the 'MSErnieMPreTrainedModel' class.

    This class initializes the MSERNIE-M model and includes methods for forwarding the model for information
    extraction tasks, such as computing start and end position losses and probabilities. It also provides functionality
    for calculating the total loss, start probability, and end probability.

    The 'MSErnieMForInformationExtraction' class inherits the configuration parameters and methods from
    'MSErnieMPreTrainedModel' and extends it to support information extraction tasks. The class is designed to handle
    input tensors for input_ids, attention_mask, position_ids, head_mask, and inputs_embeds, and provides output in the
    form of a tuple containing total loss, start probability, end probability, and additional model outputs.

    The class is suitable for tasks such as named entity recognition, question answering, and other information
    extraction tasks where start and end positions within a sequence need to be identified and predicted.

    This class is part of the MindNLP library, which is built on MindSpore, and provides a high-level interface for
    using the ERNIE-M model for information extraction tasks.
    """
    def __init__(self, config):
        """
        Initializes an instance of the MSErnieMForInformationExtraction class.

        Args:
            self (MSErnieMForInformationExtraction): The instance of the MSErnieMForInformationExtraction class.
            config (object): The configuration object for the model.

        Returns:
            None.

        Raises:
            TypeError: If the config parameter is not of the expected type.
            ValueError: If the config parameter does not contain the required attributes.
        """
        super().__init__(config)
        self.ernie_m = MSErnieMModel(config)
        self.linear_start = nn.Linear(config.hidden_size, 1)
        self.linear_end = nn.Linear(config.hidden_size, 1)
        self.sigmoid = nn.Sigmoid()
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        start_positions: Optional[mindspore.Tensor] = None,
        end_positions: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
    ) -> Tuple[mindspore.Tensor]:
        r"""
        Args:
            start_positions (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for position (index) for computing the start_positions loss. Positions outside the sequence are
                not taken into account for computing the loss.
            end_positions (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for position (index) for computing the end_positions loss. Positions outside the sequence are not
                taken into account for computing the loss.
        """
        result = self.ernie_m(
            input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
        )

        sequence_output = result[0]

        start_logits = self.linear_start(sequence_output)
        start_logits = start_logits.squeeze(-1)
        start_prob = self.sigmoid(start_logits)
        end_logits = self.linear_end(sequence_output)
        end_logits = end_logits.squeeze(-1)
        end_prob = self.sigmoid(end_logits)

        total_loss = None
        if start_positions is not None and end_positions is not None:
            # On multi-GPU, splitting the batch can add a trailing dimension; squeeze it away
            if len(start_positions.shape) > 1 and start_positions.shape[-1] == 1:
                start_positions = start_positions.squeeze(-1)
            if len(end_positions.shape) > 1 and end_positions.shape[-1] == 1:
                end_positions = end_positions.squeeze(-1)
            # sometimes the start/end positions are outside our model inputs, we ignore these terms
            ignored_index = start_logits.shape[1]
            start_positions = start_positions.clamp(0, ignored_index)
            end_positions = end_positions.clamp(0, ignored_index)

            start_loss = ops.binary_cross_entropy(start_prob, start_positions)
            end_loss = ops.binary_cross_entropy(end_prob, end_positions)
            total_loss = (start_loss + end_loss) / 2

        return (total_loss, start_prob, end_prob) + result[1:]

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMForInformationExtraction.__init__(config)

Initializes an instance of the MSErnieMForInformationExtraction class.

PARAMETER DESCRIPTION
self

The instance of the MSErnieMForInformationExtraction class.

TYPE: MSErnieMForInformationExtraction

config

The configuration object for the model.

TYPE: object

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
TypeError

If the config parameter is not of the expected type.

ValueError

If the config parameter does not contain the required attributes.

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def __init__(self, config):
    """
    Initializes an instance of the MSErnieMForInformationExtraction class.

    Args:
        self (MSErnieMForInformationExtraction): The instance of the MSErnieMForInformationExtraction class.
        config (object): The configuration object for the model.

    Returns:
        None.

    Raises:
        TypeError: If the config parameter is not of the expected type.
        ValueError: If the config parameter does not contain the required attributes.
    """
    super().__init__(config)
    self.ernie_m = MSErnieMModel(config)
    self.linear_start = nn.Linear(config.hidden_size, 1)
    self.linear_end = nn.Linear(config.hidden_size, 1)
    self.sigmoid = nn.Sigmoid()
    self.post_init()

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMForInformationExtraction.forward(input_ids=None, attention_mask=None, position_ids=None, head_mask=None, inputs_embeds=None, start_positions=None, end_positions=None, output_attentions=None, output_hidden_states=None)

PARAMETER DESCRIPTION
start_positions

Labels for position (index) for computing the start_positions loss. Positions outside the sequence are not taken into account for computing the loss.

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

end_positions

Labels for position (index) for computing the end_positions loss. Positions outside the sequence are not taken into account for computing the loss.

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None
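
Because the loss is a binary cross-entropy against per-token probabilities, the label tensors need one value per token. The sketch below shows one plausible way to build such labels from span indices. It assumes 0/1 float labels and inclusive end indices. The helper `spans_to_labels` is hypothetical and not part of the library.

```python
def spans_to_labels(spans, seq_len):
    """Build per-token start/end label vectors from (start, end) index pairs.

    Assumes inclusive end indices; spans that fall outside the sequence are skipped.
    """
    start_labels = [0.0] * seq_len
    end_labels = [0.0] * seq_len
    for start, end in spans:
        if 0 <= start <= end < seq_len:
            start_labels[start] = 1.0
            end_labels[end] = 1.0
    return start_labels, end_labels

# Two valid spans and one that runs past the end of a six-token sequence.
start_labels, end_labels = spans_to_labels([(1, 2), (4, 4), (5, 9)], seq_len=6)
```

Stacking one such pair of vectors per example gives label arrays of shape (batch_size, sequence_length).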

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    start_positions: Optional[mindspore.Tensor] = None,
    end_positions: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
) -> Tuple[mindspore.Tensor]:
    r"""
    Args:
        start_positions (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for position (index) for computing the start_positions loss. Positions outside the sequence are
            not taken into account for computing the loss.
        end_positions (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for position (index) for computing the end_positions loss. Positions outside the sequence are not
            taken into account for computing the loss.
    """
    result = self.ernie_m(
        input_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
    )

    sequence_output = result[0]

    start_logits = self.linear_start(sequence_output)
    start_logits = start_logits.squeeze(-1)
    start_prob = self.sigmoid(start_logits)
    end_logits = self.linear_end(sequence_output)
    end_logits = end_logits.squeeze(-1)
    end_prob = self.sigmoid(end_logits)

    total_loss = None
    if start_positions is not None and end_positions is not None:
        # On multi-GPU, splitting the batch can add a trailing dimension; squeeze it away
        if len(start_positions.shape) > 1 and start_positions.shape[-1] == 1:
            start_positions = start_positions.squeeze(-1)
        if len(end_positions.shape) > 1 and end_positions.shape[-1] == 1:
            end_positions = end_positions.squeeze(-1)
        # sometimes the start/end positions are outside our model inputs, we ignore these terms
        ignored_index = start_logits.shape[1]
        start_positions = start_positions.clamp(0, ignored_index)
        end_positions = end_positions.clamp(0, ignored_index)

        start_loss = ops.binary_cross_entropy(start_prob, start_positions)
        end_loss = ops.binary_cross_entropy(end_prob, end_positions)
        total_loss = (start_loss + end_loss) / 2

    return (total_loss, start_prob, end_prob) + result[1:]

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMForMultipleChoice

Bases: MSErnieMPreTrainedModel

This class is a multiple-choice classification model based on the MSErnieM architecture. It inherits from MSErnieMPreTrainedModel and targets multiple-choice question answering tasks.

The forward method accepts input_ids, attention_mask, position_ids, head_mask, inputs_embeds, and labels, plus the output_attentions and output_hidden_states options. Of these, input_ids, attention_mask, position_ids, and inputs_embeds carry a num_choices dimension after the batch dimension.

The forward method flattens the choices into the batch dimension, runs the MSErnieM encoder, applies dropout and a linear classifier to the pooled output to obtain one score per choice, and reshapes the scores to (batch_size, num_choices). When labels are given, it also computes the cross-entropy loss over the choices.
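
The flatten-score-reshape pattern used for multiple choice can be sketched in pure Python as follows. This is an illustration with nested lists instead of tensors. `toy_score` is a hypothetical stand-in for the encoder, dropout, and classifier, and the helper names are not part of the library.

```python
import math

def flatten_choices(batch):
    """(batch_size, num_choices, seq_len) -> (batch_size * num_choices, seq_len)."""
    return [sequence for example in batch for sequence in example]

def reshape_scores(flat_scores, num_choices):
    """(batch_size * num_choices,) -> (batch_size, num_choices)."""
    return [flat_scores[i:i + num_choices] for i in range(0, len(flat_scores), num_choices)]

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy; each label is the index of the correct choice."""
    losses = []
    for row, label in zip(logits, labels):
        top = max(row)  # subtract the max for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(v - top) for v in row))
        losses.append(log_sum_exp - row[label])
    return sum(losses) / len(losses)

def toy_score(sequence):
    """Hypothetical scorer: one number per flattened sequence."""
    return float(sum(sequence))

# Two examples, two choices each, three token ids per choice.
input_ids = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [1, 1, 1]],
]
num_choices = len(input_ids[0])
flat = flatten_choices(input_ids)
reshaped_logits = reshape_scores([toy_score(s) for s in flat], num_choices)
loss = cross_entropy(reshaped_logits, labels=[1, 0])
```

Flattening lets the encoder treat every choice as an ordinary sequence, and the reshape restores the grouping so that the softmax compares only the choices of the same example.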

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
class MSErnieMForMultipleChoice(MSErnieMPreTrainedModel):

    """
    This class represents a Multiple Choice classification model based on the MSErnieM architecture.
    It inherits from the MSErnieMPreTrainedModel and is designed to facilitate multiple choice question answering tasks.

    The class implements the initialization method to set up the model and a forward method to process input data and
    produce classification predictions. The forward method handles input tensors for input_ids, attention_mask,
    position_ids, head_mask, inputs_embeds, and labels, and provides options for output_attentions and output_hidden_states.

    The forward method computes the multiple choice classification loss based on the input data and generates reshaped
    logits for each choice. It utilizes the MSErnieM model to process the input data and applies dropout and dense layers
    for classification. Additionally, it handles the cross-entropy loss calculation for training the model.

    Overall, the MSErnieMForMultipleChoice class encapsulates the functionality for performing multiple choice
    classification using the MSErnieM architecture and provides flexibility for processing various input tensors and
    generating classification predictions.
    """
    # Copied from transformers.models.bert.modeling_bert.BertForMultipleChoice.__init__ with Bert->ErnieM,bert->ernie_m
    def __init__(self, config):
        """
        Initializes an instance of MSErnieMForMultipleChoice.

        Args:
            self (object): The instance of the class.
            config (object): The configuration object containing various parameters for the model initialization.

        Returns:
            None.

        Raises:
            ValueError: If the provided configuration object is invalid or missing required parameters.
            TypeError: If the configuration parameters are of incorrect type.
        """
        super().__init__(config)

        self.ernie_m = MSErnieMModel(config)
        classifier_dropout = (
            config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
        )
        self.dropout = nn.Dropout(p=classifier_dropout)
        self.classifier = nn.Linear(config.hidden_size, 1)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
    ) -> Tuple[mindspore.Tensor]:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for computing the multiple choice classification loss. Indices should be in `[0, ...,
                num_choices-1]` where `num_choices` is the size of the second dimension of the input tensors. (See
                `input_ids` above)
        """
        num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]

        input_ids = input_ids.view(-1, input_ids.shape[-1]) if input_ids is not None else None
        attention_mask = attention_mask.view(-1, attention_mask.shape[-1]) if attention_mask is not None else None
        position_ids = position_ids.view(-1, position_ids.shape[-1]) if position_ids is not None else None
        inputs_embeds = (
            inputs_embeds.view(-1, inputs_embeds.shape[-2], inputs_embeds.shape[-1])
            if inputs_embeds is not None
            else None
        )

        outputs = self.ernie_m(
            input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
        )

        pooled_output = outputs[1]

        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)
        reshaped_logits = logits.view(-1, num_choices)

        loss = None
        if labels is not None:
            loss = F.cross_entropy(reshaped_logits, labels)

        output = (reshaped_logits,) + outputs[2:]
        return ((loss,) + output) if loss is not None else output

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMForMultipleChoice.__init__(config)

Initializes an instance of MSErnieMForMultipleChoice.

PARAMETER DESCRIPTION
self

The instance of the class.

TYPE: object

config

The configuration object containing various parameters for the model initialization.

TYPE: object

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
ValueError

If the provided configuration object is invalid or missing required parameters.

TypeError

If the configuration parameters are of incorrect type.

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def __init__(self, config):
    """
    Initializes an instance of MSErnieMForMultipleChoice.

    Args:
        self (object): The instance of the class.
        config (object): The configuration object containing various parameters for the model initialization.

    Returns:
        None.

    Raises:
        ValueError: If the provided configuration object is invalid or missing required parameters.
        TypeError: If the configuration parameters are of incorrect type.
    """
    super().__init__(config)

    self.ernie_m = MSErnieMModel(config)
    classifier_dropout = (
        config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
    )
    self.dropout = nn.Dropout(p=classifier_dropout)
    self.classifier = nn.Linear(config.hidden_size, 1)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMForMultipleChoice.forward(input_ids=None, attention_mask=None, position_ids=None, head_mask=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None)

PARAMETER DESCRIPTION
labels

Labels for computing the multiple choice classification loss. Indices should be in [0, ..., num_choices-1] where num_choices is the size of the second dimension of the input tensors. (See input_ids above)

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
) -> Tuple[mindspore.Tensor]:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the multiple choice classification loss. Indices should be in `[0, ...,
            num_choices-1]` where `num_choices` is the size of the second dimension of the input tensors. (See
            `input_ids` above)
    """
    num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]

    input_ids = input_ids.view(-1, input_ids.shape[-1]) if input_ids is not None else None
    attention_mask = attention_mask.view(-1, attention_mask.shape[-1]) if attention_mask is not None else None
    position_ids = position_ids.view(-1, position_ids.shape[-1]) if position_ids is not None else None
    inputs_embeds = (
        inputs_embeds.view(-1, inputs_embeds.shape[-2], inputs_embeds.shape[-1])
        if inputs_embeds is not None
        else None
    )

    outputs = self.ernie_m(
        input_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
    )

    pooled_output = outputs[1]

    pooled_output = self.dropout(pooled_output)
    logits = self.classifier(pooled_output)
    reshaped_logits = logits.view(-1, num_choices)

    loss = None
    if labels is not None:
        loss = F.cross_entropy(reshaped_logits, labels)

    output = (reshaped_logits,) + outputs[2:]
    return ((loss,) + output) if loss is not None else output

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMForQuestionAnswering

Bases: MSErnieMPreTrainedModel

MSErnieMForQuestionAnswering represents a model for question answering tasks using the MSErnieM architecture. This class inherits from MSErnieMPreTrainedModel and implements methods for initializing the model and forwarding outputs for question answering.

ATTRIBUTE DESCRIPTION
num_labels

The number of labels for token classification.

TYPE: int

ernie_m

The MSErnieMModel instance used for processing inputs.

TYPE: MSErnieMModel

qa_outputs

Dense layer for outputting logits for question answering.

TYPE: Linear

METHOD DESCRIPTION
__init__

Initializes the MSErnieMForQuestionAnswering instance with the provided configuration.

forward

Constructs the question answering outputs based on the input tensors and labels provided.

Note

The start_positions and end_positions parameters are used for computing the token classification loss by providing labels for the start and end positions of the labelled span in the input sequence. Position indices are clamped to the length of the sequence and positions outside of the sequence are not considered for loss computation.
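
The clamping described in the note can be sketched as follows: positions are clamped to [0, sequence_length], and the value sequence_length then serves as an ignore index, so out-of-range targets are skipped by the loss. This is a pure-Python illustration, not the library's code. Returning NaN when every target is ignored is an assumption of this sketch.

```python
import math

def cross_entropy(logits, targets, ignore_index):
    """Mean softmax cross-entropy over rows whose target is not ignore_index."""
    losses = []
    for row, target in zip(logits, targets):
        if target == ignore_index:
            continue  # this example contributes nothing to the loss
        top = max(row)  # subtract the max for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(v - top) for v in row))
        losses.append(log_sum_exp - row[target])
    # Assumption: NaN when nothing is left to average.
    return sum(losses) / len(losses) if losses else float("nan")

def qa_loss(start_logits, end_logits, start_positions, end_positions):
    """Average of the start and end losses after clamping the positions."""
    ignored_index = len(start_logits[0])  # equals sequence_length
    clamp = lambda p: min(max(p, 0), ignored_index)
    start_positions = [clamp(p) for p in start_positions]
    end_positions = [clamp(p) for p in end_positions]
    start_loss = cross_entropy(start_logits, start_positions, ignored_index)
    end_loss = cross_entropy(end_logits, end_positions, ignored_index)
    return (start_loss + end_loss) / 2

# Batch of two sequences of length 4 with uniform logits; start position 9 is out of range.
uniform = [[0.0] * 4, [0.0] * 4]
total_loss = qa_loss(uniform, uniform, start_positions=[1, 9], end_positions=[2, 3])
```

Note that the clamp maps negative positions to 0, which is a valid index, so only positions at or beyond the sequence length are actually ignored.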

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
class MSErnieMForQuestionAnswering(MSErnieMPreTrainedModel):

    """
    MSErnieMForQuestionAnswering represents a model for question answering tasks using the MSErnieM architecture.
    This class inherits from MSErnieMPreTrainedModel and implements methods for initializing the model and forwarding
    outputs for question answering.

    Attributes:
        num_labels (int): The number of labels for token classification.
        ernie_m (MSErnieMModel): The MSErnieMModel instance used for processing inputs.
        qa_outputs (nn.Linear): Dense layer for outputting logits for question answering.

    Methods:
        __init__: Initializes the MSErnieMForQuestionAnswering instance with the provided configuration.
        forward:
            Constructs the question answering outputs based on the input tensors and labels provided.

        Note:
            The start_positions and end_positions parameters are used for computing the token classification loss by
            providing labels for the start and end positions of the labelled span in the input sequence.
            Position indices are clamped to the length of the sequence and positions outside of the sequence
            are not considered for loss computation.
    """
    # Copied from transformers.models.bert.modeling_bert.BertForQuestionAnswering.__init__ with Bert->ErnieM,bert->ernie_m
    def __init__(self, config):
        """
        Initializes an instance of the MSErnieMForQuestionAnswering class.

        Args:
            self: The instance of the class.
            config:
                An instance of the configuration class containing the model configuration.

                - Type: object
                - Purpose: To provide the configuration settings for the model initialization.
                - Restrictions: Must be a valid configuration object.

        Returns:
            None

        Raises:
            TypeError: If the provided config parameter is not of the expected type.
            ValueError: If the config parameter is missing essential attributes.
            RuntimeError: If an error occurs during initialization or post-initialization steps.
        """
        super().__init__(config)
        self.num_labels = config.num_labels

        self.ernie_m = MSErnieMModel(config, add_pooling_layer=False)
        self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        start_positions: Optional[mindspore.Tensor] = None,
        end_positions: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
    ) -> Tuple[mindspore.Tensor]:
        r"""
        Args:
            start_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for position (index) of the start of the labelled span for computing the token classification loss.
                Positions are clamped to the length of the sequence (`sequence_length`). Positions outside the sequence
                are not taken into account for computing the loss.
            end_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for position (index) of the end of the labelled span for computing the token classification loss.
                Positions are clamped to the length of the sequence (`sequence_length`). Positions outside the sequence
                are not taken into account for computing the loss.
        """
        outputs = self.ernie_m(
            input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
        )

        sequence_output = outputs[0]

        logits = self.qa_outputs(sequence_output)
        start_logits, end_logits = logits.split(1, axis=-1)
        start_logits = start_logits.squeeze(-1)
        end_logits = end_logits.squeeze(-1)

        total_loss = None
        if start_positions is not None and end_positions is not None:
            # On multi-GPU, splitting the batch can add a trailing dimension; squeeze it away
            if len(start_positions.shape) > 1:
                start_positions = start_positions.squeeze(-1)
            if len(end_positions.shape) > 1:
                end_positions = end_positions.squeeze(-1)
            # sometimes the start/end positions are outside our model inputs, we ignore these terms
            ignored_index = start_logits.shape[1]
            start_positions = start_positions.clamp(0, ignored_index)
            end_positions = end_positions.clamp(0, ignored_index)

            start_loss = F.cross_entropy(start_logits, start_positions, ignore_index=ignored_index)
            end_loss = F.cross_entropy(end_logits, end_positions, ignore_index=ignored_index)
            total_loss = (start_loss + end_loss) / 2

        output = (start_logits, end_logits) + outputs[2:]
        return ((total_loss,) + output) if total_loss is not None else output

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMForQuestionAnswering.__init__(config)

Initializes an instance of the MSErnieMForQuestionAnswering class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

An instance of the configuration class containing the model configuration.

  • Type: object
  • Purpose: To provide the configuration settings for the model initialization.
  • Restrictions: Must be a valid configuration object.

RETURNS DESCRIPTION

None

RAISES DESCRIPTION
TypeError

If the provided config parameter is not of the expected type.

ValueError

If the config parameter is missing essential attributes.

RuntimeError

If an error occurs during initialization or post-initialization steps.

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def __init__(self, config):
    """
    Initializes an instance of the MSErnieMForQuestionAnswering class.

    Args:
        self: The instance of the class.
        config:
            An instance of the configuration class containing the model configuration.

            - Type: object
            - Purpose: To provide the configuration settings for the model initialization.
            - Restrictions: Must be a valid configuration object.

    Returns:
        None

    Raises:
        TypeError: If the provided config parameter is not of the expected type.
        ValueError: If the config parameter is missing essential attributes.
        RuntimeError: If an error occurs during initialization or post-initialization steps.
    """
    super().__init__(config)
    self.num_labels = config.num_labels

    self.ernie_m = MSErnieMModel(config, add_pooling_layer=False)
    self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMForQuestionAnswering.forward(input_ids=None, attention_mask=None, position_ids=None, head_mask=None, inputs_embeds=None, start_positions=None, end_positions=None, output_attentions=None, output_hidden_states=None)

PARAMETER DESCRIPTION
start_positions

Labels for position (index) of the start of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside the sequence are not taken into account for computing the loss.

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None

end_positions

Labels for position (index) of the end of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    start_positions: Optional[mindspore.Tensor] = None,
    end_positions: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
) -> Tuple[mindspore.Tensor]:
    r"""
    Args:
        start_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for position (index) of the start of the labelled span for computing the token classification loss.
            Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
            are not taken into account for computing the loss.
        end_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for position (index) of the end of the labelled span for computing the token classification loss.
            Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
            are not taken into account for computing the loss.
    """
    outputs = self.ernie_m(
        input_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
    )

    sequence_output = outputs[0]

    logits = self.qa_outputs(sequence_output)
    start_logits, end_logits = logits.split(1, axis=-1)
    start_logits = start_logits.squeeze(-1)
    end_logits = end_logits.squeeze(-1)

    total_loss = None
    if start_positions is not None and end_positions is not None:
        # On multi-GPU, splitting the batch can add a dimension; squeeze it away
        if len(start_positions.shape) > 1:
            start_positions = start_positions.squeeze(-1)
        if len(end_positions.shape) > 1:
            end_positions = end_positions.squeeze(-1)
        # sometimes the start/end positions are outside our model inputs, we ignore these terms
        ignored_index = start_logits.shape[1]
        start_positions = start_positions.clamp(0, ignored_index)
        end_positions = end_positions.clamp(0, ignored_index)

        start_loss = F.cross_entropy(start_logits, start_positions, ignore_index=ignored_index)
        end_loss = F.cross_entropy(end_logits, end_positions, ignore_index=ignored_index)
        total_loss = (start_loss + end_loss) / 2

    output = (start_logits, end_logits) + outputs[2:]
    return ((total_loss,) + output) if total_loss is not None else output
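
The loss branch of forward can be illustrated without MindSpore. The sketch below is a plain-Python approximation, not the library's implementation: the helper names (`clamp`, `cross_entropy`, `qa_loss`) are hypothetical, and returning 0.0 when every position is ignored is an assumption made only for this illustration. It shows how a position outside the sequence is clamped to `sequence_length`, which then doubles as the ignored index.

```python
import math


def clamp(values, low, high):
    """Clamp every value into the closed range [low, high]."""
    return [min(max(v, low), high) for v in values]


def cross_entropy(logits, targets, ignore_index):
    """Mean negative log-likelihood over rows whose target != ignore_index."""
    losses = []
    for row, target in zip(logits, targets):
        if target == ignore_index:
            continue  # out-of-range position: contributes nothing to the loss
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        losses.append(log_norm - row[target])
    # Assumption for this sketch only: report 0.0 if every row was ignored.
    return sum(losses) / len(losses) if losses else 0.0


def qa_loss(start_logits, end_logits, start_positions, end_positions):
    """Average of the start and end losses, mirroring the listing above."""
    ignored_index = len(start_logits[0])  # equals sequence_length
    start_positions = clamp(start_positions, 0, ignored_index)
    end_positions = clamp(end_positions, 0, ignored_index)
    start_loss = cross_entropy(start_logits, start_positions, ignored_index)
    end_loss = cross_entropy(end_logits, end_positions, ignored_index)
    return (start_loss + end_loss) / 2
```

With a batch of two sequences of length 3, a start position of 7 is clamped to 3 and therefore skipped, so only the first example contributes to the start loss.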

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMForSequenceClassification

Bases: MSErnieMPreTrainedModel

Ernie-M model with a sequence classification or regression head (a linear layer on top of the pooled output). It inherits from the MSErnieMPreTrainedModel class.

ATTRIBUTE DESCRIPTION
num_labels

The number of labels for the sequence classification task.

TYPE: int

config

The configuration object for the model.

TYPE: ErnieMConfig

ernie_m

The MSErnieM model.

TYPE: MSErnieMModel

dropout

The dropout layer for regularization.

TYPE: Dropout

classifier

The dense layer for classification.

TYPE: Linear

METHOD DESCRIPTION
__init__

Initializes the MSErnieMForSequenceClassification instance.

forward

Runs the forward pass and returns the logits, plus the loss when labels are given.

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
class MSErnieMForSequenceClassification(MSErnieMPreTrainedModel):

    """
    Ernie-M model with a sequence classification or regression head (a linear layer on top of the pooled output).
    It inherits from the MSErnieMPreTrainedModel class.

    Attributes:
        num_labels (int): The number of labels for the sequence classification task.
        config (ErnieMConfig): The configuration object for the model.
        ernie_m (MSErnieMModel): The underlying Ernie-M encoder.
        dropout (nn.Dropout): The dropout layer for regularization.
        classifier (nn.Linear): The dense layer for classification.

    Methods:
        __init__: Initializes the MSErnieMForSequenceClassification instance.
        forward: Runs the forward pass and returns the logits, plus the loss when labels are given.
    """
    # Copied from transformers.models.bert.modeling_bert.BertForSequenceClassification.__init__ with Bert->ErnieM,bert->ernie_m
    def __init__(self, config):
        """
        Initializes an instance of the 'MSErnieMForSequenceClassification' class.

        Args:
            self: The instance of the class.
            config:
                The model configuration, typically an `ErnieMConfig` instance.

                - Type: ErnieMConfig
                - Purpose: Specifies the configuration of the model.
                - Restrictions: None

        Returns:
            None

        Raises:
            None
        """
        super().__init__(config)
        self.num_labels = config.num_labels
        self.config = config

        self.ernie_m = MSErnieMModel(config)
        classifier_dropout = (
            config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
        )
        self.dropout = nn.Dropout(p=classifier_dropout)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[List[mindspore.Tensor]] = None,
        use_cache: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        labels: Optional[mindspore.Tensor] = None,
    ) -> Tuple[mindspore.Tensor]:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for computing the sequence classification/regression loss. Indices should be in
                `[0, ..., config.num_labels - 1]`. If `config.num_labels == 1`, a regression loss is computed
                (mean-squared error); if `config.num_labels > 1`, a classification loss is computed (cross-entropy).
        """
        outputs = self.ernie_m(
            input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            past_key_values=past_key_values,
            output_hidden_states=output_hidden_states,
            output_attentions=output_attentions,
        )

        pooled_output = outputs[1]

        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)

        loss = None
        if labels is not None:
            if self.config.problem_type is None:
                if self.num_labels == 1:
                    self.config.problem_type = "regression"
                elif self.num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
                    self.config.problem_type = "single_label_classification"
                else:
                    self.config.problem_type = "multi_label_classification"

            if self.config.problem_type == "regression":
                if self.num_labels == 1:
                    loss = F.mse_loss(logits.squeeze(), labels.squeeze())
                else:
                    loss = F.mse_loss(logits, labels)
            elif self.config.problem_type == "single_label_classification":
                loss = F.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))
            elif self.config.problem_type == "multi_label_classification":
                loss = F.binary_cross_entropy_with_logits(logits, labels)

        output = (logits,) + outputs[2:]
        return ((loss,) + output) if loss is not None else output

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMForSequenceClassification.__init__(config)

Initializes an instance of the 'MSErnieMForSequenceClassification' class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

The model configuration, typically an ErnieMConfig instance.

  • Type: ErnieMConfig
  • Purpose: Specifies the configuration of the model.
  • Restrictions: None

RETURNS DESCRIPTION

None

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def __init__(self, config):
    """
    Initializes an instance of the 'MSErnieMForSequenceClassification' class.

    Args:
        self: The instance of the class.
        config:
            The model configuration, typically an `ErnieMConfig` instance.

            - Type: ErnieMConfig
            - Purpose: Specifies the configuration of the model.
            - Restrictions: None

    Returns:
        None

    Raises:
        None
    """
    super().__init__(config)
    self.num_labels = config.num_labels
    self.config = config

    self.ernie_m = MSErnieMModel(config)
    classifier_dropout = (
        config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
    )
    self.dropout = nn.Dropout(p=classifier_dropout)
    self.classifier = nn.Linear(config.hidden_size, config.num_labels)

    # Initialize weights and apply final processing
    self.post_init()
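
The dropout fallback in the constructor can be shown in isolation. This is a minimal sketch under stated assumptions: `resolve_classifier_dropout` is a hypothetical helper, and `SimpleNamespace` stands in for a real configuration object. It illustrates why the check is `is not None` rather than a truthiness test: an explicit `classifier_dropout` of 0.0 must be honoured instead of falling back to `hidden_dropout_prob`.

```python
from types import SimpleNamespace


def resolve_classifier_dropout(config):
    """Return classifier_dropout when set, else fall back to hidden_dropout_prob."""
    if config.classifier_dropout is not None:
        return config.classifier_dropout
    return config.hidden_dropout_prob


# Stand-in configuration objects (hypothetical values, for illustration only).
default_cfg = SimpleNamespace(classifier_dropout=None, hidden_dropout_prob=0.1)
custom_cfg = SimpleNamespace(classifier_dropout=0.3, hidden_dropout_prob=0.1)
zero_cfg = SimpleNamespace(classifier_dropout=0.0, hidden_dropout_prob=0.1)
```

Here `default_cfg` resolves to the hidden dropout probability, while `custom_cfg` and `zero_cfg` keep their explicit values.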

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMForSequenceClassification.forward(input_ids=None, attention_mask=None, position_ids=None, head_mask=None, inputs_embeds=None, past_key_values=None, use_cache=None, output_hidden_states=None, output_attentions=None, labels=None)

PARAMETER DESCRIPTION
labels

Labels for computing the sequence classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels == 1, a regression loss is computed (mean-squared error); if config.num_labels > 1, a classification loss is computed (cross-entropy).

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    past_key_values: Optional[List[mindspore.Tensor]] = None,
    use_cache: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    output_attentions: Optional[bool] = None,
    labels: Optional[mindspore.Tensor] = None,
) -> Tuple[mindspore.Tensor]:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the sequence classification/regression loss. Indices should be in
            `[0, ..., config.num_labels - 1]`. If `config.num_labels == 1`, a regression loss is computed
            (mean-squared error); if `config.num_labels > 1`, a classification loss is computed (cross-entropy).
    """
    outputs = self.ernie_m(
        input_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        past_key_values=past_key_values,
        output_hidden_states=output_hidden_states,
        output_attentions=output_attentions,
    )

    pooled_output = outputs[1]

    pooled_output = self.dropout(pooled_output)
    logits = self.classifier(pooled_output)

    loss = None
    if labels is not None:
        if self.config.problem_type is None:
            if self.num_labels == 1:
                self.config.problem_type = "regression"
            elif self.num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
                self.config.problem_type = "single_label_classification"
            else:
                self.config.problem_type = "multi_label_classification"

        if self.config.problem_type == "regression":
            if self.num_labels == 1:
                loss = F.mse_loss(logits.squeeze(), labels.squeeze())
            else:
                loss = F.mse_loss(logits, labels)
        elif self.config.problem_type == "single_label_classification":
            loss = F.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))
        elif self.config.problem_type == "multi_label_classification":
            loss = F.binary_cross_entropy_with_logits(logits, labels)

    output = (logits,) + outputs[2:]
    return ((loss,) + output) if loss is not None else output
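
The way forward picks a loss when `config.problem_type` is unset can be summarised in a few lines. The sketch below is illustrative only: `infer_problem_type` and `LOSS_BY_PROBLEM_TYPE` are hypothetical names, and the boolean `labels_are_integer` stands in for the dtype check against `mindspore.int64` and `mindspore.int32` in the listing above.

```python
def infer_problem_type(num_labels, labels_are_integer, problem_type=None):
    """Mirror the branch order used when config.problem_type is None."""
    if problem_type is not None:
        return problem_type  # an explicit setting always wins
    if num_labels == 1:
        return "regression"
    if num_labels > 1 and labels_are_integer:
        return "single_label_classification"
    return "multi_label_classification"


# Loss function applied for each problem type in the listing above.
LOSS_BY_PROBLEM_TYPE = {
    "regression": "mse_loss",
    "single_label_classification": "cross_entropy",
    "multi_label_classification": "binary_cross_entropy_with_logits",
}
```

Note that the real method writes the inferred value back to `self.config.problem_type`, so the inference happens only on the first call that supplies labels.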

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMForTokenClassification

Bases: MSErnieMPreTrainedModel

This class represents a token classification model based on MSErnieM architecture. It is designed for tasks that involve assigning labels to individual tokens within a sequence.

The MSErnieMForTokenClassification class inherits from MSErnieMPreTrainedModel and extends its functionality by adding a token classification layer on top of the base model.

The class's constructor initializes the model and sets up the necessary components. It takes a config object as input and initializes the base model with the provided configuration. The number of labels for token classification is also stored for later use. The dropout layer and the token classification layer are defined. Lastly, the post_init method is called to perform any additional initialization steps.

The forward method is the main entry point for using the model for token classification. It accepts input tensors such as input_ids, attention_mask, position_ids, head_mask, inputs_embeds, past_key_values, and labels, plus the boolean flags output_hidden_states and output_attentions.

The method first passes the input tensors through the base model (self.ernie_m) to obtain the sequence output. The sequence output is then passed through a dropout layer to reduce overfitting. Finally, the token classification layer (self.classifier) is applied to generate logits for each token in the sequence.

If labels are provided, the method calculates the token classification loss using the cross-entropy function. The logits are reshaped to (batch_size * sequence_length, num_labels) and the labels to (batch_size * sequence_length,) before the cross-entropy function is applied.

The method returns a tuple containing the logits for each token, as well as any additional outputs from the base model. If a loss is calculated, it is included in the output tuple.
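
The flattening step before the loss can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: nested lists stand in for tensors, and `flatten_tokens` and `token_classification_loss` are hypothetical helpers. It shows that every token in the batch becomes one row of the cross-entropy computation.

```python
import math


def flatten_tokens(logits, labels):
    """Collapse the batch and sequence axes, like view(-1, num_labels) and view(-1)."""
    flat_logits = [token for sequence in logits for token in sequence]
    flat_labels = [label for sequence in labels for label in sequence]
    return flat_logits, flat_labels


def token_classification_loss(logits, labels):
    """Mean cross-entropy over every token in the batch."""
    flat_logits, flat_labels = flatten_tokens(logits, labels)
    total = 0.0
    for row, target in zip(flat_logits, flat_labels):
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_norm - row[target]
    return total / len(flat_labels)
```

For a batch of 2 sequences with 2 tokens each, the flattened inputs contain 4 rows, one per token.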

Note

The MSErnieMForTokenClassification class assumes that the input tensors are of type mindspore.Tensor, and the labels tensor should have a shape of (batch_size, sequence_length) with indices in the range [0, ..., config.num_labels - 1].

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
class MSErnieMForTokenClassification(MSErnieMPreTrainedModel):

    """
    This class represents a token classification model based on MSErnieM architecture.
    It is designed for tasks that involve assigning labels to individual tokens within a sequence.

    The `MSErnieMForTokenClassification` class inherits from `MSErnieMPreTrainedModel` and extends its functionality
    by adding a token classification layer on top of the base model.

    The class's constructor initializes the model and sets up the necessary components.
    It takes a `config` object as input and initializes the base model with the provided configuration.
    The number of labels for token classification is also stored for later use.
    The dropout layer and the token classification layer are defined. Lastly, the `post_init` method is called to
    perform any additional initialization steps.

    The `forward` method is the main entry point for using the model for token classification.
    It accepts input tensors such as `input_ids`, `attention_mask`, `position_ids`, `head_mask`,
    `inputs_embeds`, `past_key_values`, and `labels`, plus the boolean flags `output_hidden_states` and
    `output_attentions`.

    The method first passes the input tensors through the base model (`self.ernie_m`) to obtain the sequence output.
    The sequence output is then passed through a dropout layer to reduce overfitting.
    Finally, the token classification layer (`self.classifier`) is applied to generate logits for each token in the
    sequence.

    If `labels` are provided, the method calculates the token classification loss using the cross-entropy function.
    The logits are reshaped to `(batch_size * sequence_length, num_labels)` and the labels to
    `(batch_size * sequence_length,)` before the cross-entropy function is applied.

    The method returns a tuple containing the logits for each token, as well as any additional outputs from the base model.
    If a loss is calculated, it is included in the output tuple.

    Note:
        The `MSErnieMForTokenClassification` class assumes that the input tensors are of type `mindspore.Tensor`,
        and the labels tensor should have a shape of `(batch_size, sequence_length)` with indices in the range
        `[0, ..., config.num_labels - 1]`.

    """
    # Copied from transformers.models.bert.modeling_bert.BertForTokenClassification.__init__ with Bert->ErnieM,bert->ernie_m
    def __init__(self, config):
        """
        Initializes a new instance of the MSErnieMForTokenClassification class.

        Args:
            self: The instance of the class.
            config:
                The model configuration, typically an `ErnieMConfig` instance.

                - Type: ErnieMConfig
                - Purpose: Configuration settings for the model.
                - Restrictions: Must expose the attributes read by the constructor, such as `num_labels`.

        Returns:
            None.

        Raises:
            AttributeError: If 'config' lacks an attribute that the constructor reads, such as 'num_labels'.
        """
        super().__init__(config)
        self.num_labels = config.num_labels

        self.ernie_m = MSErnieMModel(config, add_pooling_layer=False)
        classifier_dropout = (
            config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
        )
        self.dropout = nn.Dropout(p=classifier_dropout)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[List[mindspore.Tensor]] = None,
        output_hidden_states: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        labels: Optional[mindspore.Tensor] = None,
    ) -> Tuple[mindspore.Tensor]:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`.
        """
        outputs = self.ernie_m(
            input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            past_key_values=past_key_values,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
        )

        sequence_output = outputs[0]

        sequence_output = self.dropout(sequence_output)
        logits = self.classifier(sequence_output)

        loss = None
        if labels is not None:
            loss = F.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))

        output = (logits,) + outputs[2:]
        return ((loss,) + output) if loss is not None else output

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMForTokenClassification.__init__(config)

Initializes a new instance of the MSErnieMForTokenClassification class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

The model configuration, typically an ErnieMConfig instance.

  • Type: ErnieMConfig
  • Purpose: Configuration settings for the model.
  • Restrictions: Must expose the attributes read by the constructor, such as num_labels.

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
AttributeError

If 'config' lacks an attribute that the constructor reads, such as 'num_labels'.

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def __init__(self, config):
    """
    Initializes a new instance of the MSErnieMForTokenClassification class.

    Args:
        self: The instance of the class.
        config:
            The model configuration, typically an `ErnieMConfig` instance.

            - Type: ErnieMConfig
            - Purpose: Configuration settings for the model.
            - Restrictions: Must expose the attributes read by the constructor, such as `num_labels`.

    Returns:
        None.

    Raises:
        AttributeError: If 'config' lacks an attribute that the constructor reads, such as 'num_labels'.
    """
    super().__init__(config)
    self.num_labels = config.num_labels

    self.ernie_m = MSErnieMModel(config, add_pooling_layer=False)
    classifier_dropout = (
        config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
    )
    self.dropout = nn.Dropout(p=classifier_dropout)
    self.classifier = nn.Linear(config.hidden_size, config.num_labels)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMForTokenClassification.forward(input_ids=None, attention_mask=None, position_ids=None, head_mask=None, inputs_embeds=None, past_key_values=None, output_hidden_states=None, output_attentions=None, labels=None)

PARAMETER DESCRIPTION
labels

Labels for computing the token classification loss. Indices should be in [0, ..., config.num_labels - 1].

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    past_key_values: Optional[List[mindspore.Tensor]] = None,
    output_hidden_states: Optional[bool] = None,
    output_attentions: Optional[bool] = None,
    labels: Optional[mindspore.Tensor] = None,
) -> Tuple[mindspore.Tensor]:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`.
    """
    outputs = self.ernie_m(
        input_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        past_key_values=past_key_values,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
    )

    sequence_output = outputs[0]

    sequence_output = self.dropout(sequence_output)
    logits = self.classifier(sequence_output)

    loss = None
    if labels is not None:
        loss = F.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))

    output = (logits,) + outputs[2:]
    return ((loss,) + output) if loss is not None else output

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMModel

Bases: MSErnieMPreTrainedModel

MSErnieMModel is the bare Ernie-M encoder. It inherits from MSErnieMPreTrainedModel and combines an embedding layer, a Transformer encoder, and an optional pooling layer, without any task-specific head.

The MSErnieMModel class includes methods for initializing the model, getting and setting input embeddings, pruning attention heads, and running the forward pass.

METHOD DESCRIPTION
__init__

Initializes the MSErnieMModel with the given configuration. By default, it adds a pooling layer to the model.

get_input_embeddings

Returns the word embeddings used as input to the model.

set_input_embeddings

Sets the word embeddings used as input to the model.

_prune_heads

Prunes the specified heads in the model.

forward

Runs the forward pass on the given inputs.

Note

The MSErnieMModel class inherits from the MSErnieMPreTrainedModel, which provides additional functionality and methods.

Example

>>> config = ErnieMConfig()
>>> model = MSErnieMModel(config)
>>> input_ids = ...
>>> position_ids = ...
>>> attention_mask = ...
>>> output = model.forward(input_ids, position_ids, attention_mask)

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
class MSErnieMModel(MSErnieMPreTrainedModel):

    """
    MSErnieMModel is the bare Ernie-M encoder. It inherits from MSErnieMPreTrainedModel and combines an
    embedding layer, a Transformer encoder, and an optional pooling layer, without any task-specific head.

    The MSErnieMModel class includes methods for initializing the model, getting and setting input embeddings,
    pruning attention heads, and running the forward pass.

    Methods:
        __init__: Initializes the MSErnieMModel with the given configuration.
            By default, it adds a pooling layer to the model.
        get_input_embeddings: Returns the word embeddings used as input to the model.
        set_input_embeddings: Sets the word embeddings used as input to the model.
        _prune_heads: Prunes the specified heads in the model.
        forward: Runs the forward pass on the given inputs.

    Note:
        The MSErnieMModel class inherits from the MSErnieMPreTrainedModel, which provides additional functionality
        and methods.

    Example:
        ```python
        >>> config = ErnieMConfig()
        >>> model = MSErnieMModel(config)
        >>> input_ids = ...
        >>> position_ids = ...
        >>> attention_mask = ...
        >>> output = model.forward(input_ids, position_ids, attention_mask)
        ```
    """
    def __init__(self, config, add_pooling_layer=True):
        """
        Initializes a new MSErnieMModel instance.

        Args:
            self: The instance of the MSErnieMModel class.
            config:
                An object containing configuration settings for the model.

                - Type: object
                - Purpose: Specifies the configuration settings for the model.
            add_pooling_layer:
                A boolean flag indicating whether to add a pooling layer.

                - Type: bool
                - Purpose: Specifies whether to include a pooling layer in the model.
                - Restrictions: Must be a boolean value.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(config)
        self.initializer_range = config.initializer_range
        self.embeddings = MSErnieMEmbeddings(config)
        self.encoder = MSErnieMEncoder(config)
        self.pooler = MSErnieMPooler(config) if add_pooling_layer else None
        self.post_init()

    def get_input_embeddings(self):
        """
        Method: get_input_embeddings

        Description:
            This method returns the input embeddings from the MSErnieMModel class.

        Args:
            self: MSErnieMModel
                The instance of the MSErnieMModel class.

                - Type: MSErnieMModel object
                - Purpose: To access the embeddings from the model.
                - Restrictions: None

        Returns:
            The word embedding layer of the model (self.embeddings.word_embeddings).

        Raises:
            None.
        """
        return self.embeddings.word_embeddings

    def set_input_embeddings(self, value):
        """
        Sets the input embeddings for the MSErnieMModel.

        Args:
            self (MSErnieMModel): The instance of the MSErnieMModel.
            value (object): The input embeddings to be set. It can be of any type.

        Returns:
            None.

        Raises:
            None.
        """
        self.embeddings.word_embeddings = value

    def _prune_heads(self, heads_to_prune):
        """
        Prunes heads of the model. heads_to_prune: dict of {layer_num: list of heads to prune in this layer} See base
        class PreTrainedModel
        """
        for layer, heads in heads_to_prune.items():
            self.encoder.layers[layer].self_attn.prune_heads(heads)

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
        use_cache: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
    ) -> Tuple[mindspore.Tensor]:
        '''
        Constructs the MSErnieMModel.

        Args:
            self: The object itself.
            input_ids (Optional[mindspore.Tensor]):
                The input tensor containing the indices of input sequence tokens in the vocabulary.
            position_ids (Optional[mindspore.Tensor]):
                The input tensor containing the position indices of each input sequence token in the sequence.
            attention_mask (Optional[mindspore.Tensor]):
                The input tensor containing the attention mask to avoid performing attention on padding tokens.
            head_mask (Optional[mindspore.Tensor]):
                The input tensor containing the mask to nullify selected heads of the self-attention modules.
            inputs_embeds (Optional[mindspore.Tensor]):
                The input tensor containing the embedded representation of the input sequence.
            past_key_values (Optional[Tuple[Tuple[mindspore.Tensor]]]):
                The input tensor containing the cached key and value tensors of the self-attention mechanism.
            use_cache (Optional[bool]): Whether to use the cache for the decoding steps of the model.
            output_hidden_states (Optional[bool]): Whether to return the hidden states of all layers.
            output_attentions (Optional[bool]): Whether to return the attention weights.

        Returns:
            Tuple[mindspore.Tensor]: A tuple containing the output sequence tensor, the pooled output tensor,
                and other encoded outputs.

        Raises:
            ValueError: If both input_ids and inputs_embeds are provided.

        '''
        if input_ids is not None and inputs_embeds is not None:
            raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time.")

        # init the default bool value
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )

        head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)

        past_key_values_length = 0
        if past_key_values is not None:
            past_key_values_length = past_key_values[0][0].shape[2]

        # Adapted from paddlenlp.transformers.ernie_m.ErnieMModel
        if attention_mask is None:
            attention_mask = (input_ids == 0).to(self.dtype)
            attention_mask = attention_mask * float(ops.finfo(attention_mask.dtype).min)
            if past_key_values is not None:
                batch_size = past_key_values[0][0].shape[0]
                past_mask = ops.zeros([batch_size, 1, 1, past_key_values_length], dtype=attention_mask.dtype)
                attention_mask = ops.concat([past_mask, attention_mask], dim=-1)
        # For 2D attention_mask from tokenizer
        elif attention_mask.ndim == 2:
            attention_mask = attention_mask.to(self.dtype)
            attention_mask = 1.0 - attention_mask
            attention_mask = attention_mask * float(ops.finfo(attention_mask.dtype).min)

        extended_attention_mask = attention_mask.unsqueeze(1).unsqueeze(1)

        embedding_output = self.embeddings(
            input_ids=input_ids,
            position_ids=position_ids,
            inputs_embeds=inputs_embeds,
            past_key_values_length=past_key_values_length,
        )
        encoder_outputs = self.encoder(
            embedding_output,
            attention_mask=extended_attention_mask,
            head_mask=head_mask,
            past_key_values=past_key_values,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
        )

        sequence_output = encoder_outputs[0]
        pooler_output = self.pooler(sequence_output) if self.pooler is not None else None
        return (sequence_output, pooler_output) + encoder_outputs[1:]

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMModel.__init__(config, add_pooling_layer=True)

Initializes a new MSErnieMModel instance.

PARAMETER DESCRIPTION
self

The instance of the MSErnieMModel class.

config

An object containing configuration settings for the model.

  • Type: ErnieMConfig
  • Purpose: Specifies the configuration settings for the model.

add_pooling_layer

A boolean flag indicating whether to add a pooling layer.

  • Type: bool
  • Purpose: Specifies whether to include a pooling layer in the model.
  • Restrictions: Must be a boolean value.

DEFAULT: True

RETURNS DESCRIPTION

None.

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def __init__(self, config, add_pooling_layer=True):
    """
    Initializes a new MSErnieMModel instance.

    Args:
        self: The instance of the MSErnieMModel class.
        config:
            An object containing configuration settings for the model.

            - Type: object
            - Purpose: Specifies the configuration settings for the model.
        add_pooling_layer:
            A boolean flag indicating whether to add a pooling layer.

            - Type: bool
            - Purpose: Specifies whether to include a pooling layer in the model.
            - Restrictions: Must be a boolean value.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(config)
    self.initializer_range = config.initializer_range
    self.embeddings = MSErnieMEmbeddings(config)
    self.encoder = MSErnieMEncoder(config)
    self.pooler = MSErnieMPooler(config) if add_pooling_layer else None
    self.post_init()

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMModel.forward(input_ids=None, position_ids=None, attention_mask=None, head_mask=None, inputs_embeds=None, past_key_values=None, use_cache=None, output_hidden_states=None, output_attentions=None)

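Runs the forward pass of the MSErnieMModel: embeds the inputs, applies the encoder, and pools the first token if a pooling layer is present.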

PARAMETER DESCRIPTION
self

The object itself.

input_ids

The input tensor containing the indices of input sequence tokens in the vocabulary.

TYPE: Optional[Tensor] DEFAULT: None

position_ids

The input tensor containing the position indices of each input sequence token in the sequence.

TYPE: Optional[Tensor] DEFAULT: None

attention_mask

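The attention mask used to avoid attending to padding tokens. A 2D mask with 1 for tokens to keep and 0 for padding is converted to an additive mask. If None, positions where input_ids equals 0 are masked.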

TYPE: Optional[Tensor] DEFAULT: None

head_mask

The input tensor containing the mask to nullify selected heads of the self-attention modules.

TYPE: Optional[Tensor] DEFAULT: None

inputs_embeds

The input tensor containing the embedded representation of the input sequence.

TYPE: Optional[Tensor] DEFAULT: None

past_key_values

The cached key and value tensors of the self-attention layers, given as a tuple of tuples of tensors. The cached length is read from past_key_values[0][0].shape[2].

TYPE: Optional[Tuple[Tuple[Tensor]]] DEFAULT: None

use_cache

Whether to use the cache for the decoding steps of the model. The forward body in the source listing accepts this flag but does not read it.

TYPE: Optional[bool] DEFAULT: None

output_hidden_states

Whether to return the hidden states of all layers.

TYPE: Optional[bool] DEFAULT: None

output_attentions

Whether to return the attention weights.

TYPE: Optional[bool] DEFAULT: None

RETURNS DESCRIPTION
Tuple[Tensor]

A tuple whose first element is the sequence output and whose second element is the pooled output (None if the model was created with add_pooling_layer=False), followed by any additional encoder outputs.

RAISES DESCRIPTION
ValueError

If both input_ids and inputs_embeds are provided.

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    past_key_values: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
    use_cache: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    output_attentions: Optional[bool] = None,
) -> Tuple[mindspore.Tensor]:
    '''
    Constructs the MSErnieMModel.

    Args:
        self: The object itself.
        input_ids (Optional[mindspore.Tensor]):
            The input tensor containing the indices of input sequence tokens in the vocabulary.
        position_ids (Optional[mindspore.Tensor]):
            The input tensor containing the position indices of each input sequence token in the sequence.
        attention_mask (Optional[mindspore.Tensor]):
            The input tensor containing the attention mask to avoid performing attention on padding tokens.
        head_mask (Optional[mindspore.Tensor]):
            The input tensor containing the mask to nullify selected heads of the self-attention modules.
        inputs_embeds (Optional[mindspore.Tensor]):
            The input tensor containing the embedded representation of the input sequence.
        past_key_values (Optional[Tuple[Tuple[mindspore.Tensor]]]):
            The input tensor containing the cached key and value tensors of the self-attention mechanism.
        use_cache (Optional[bool]): Whether to use the cache for the decoding steps of the model.
        output_hidden_states (Optional[bool]): Whether to return the hidden states of all layers.
        output_attentions (Optional[bool]): Whether to return the attention weights.

    Returns:
        Tuple[mindspore.Tensor]: A tuple containing the output sequence tensor, the pooled output tensor,
            and other encoded outputs.

    Raises:
        ValueError: If both input_ids and inputs_embeds are provided.

    '''
    if input_ids is not None and inputs_embeds is not None:
        raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time.")

    # init the default bool value
    output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
    output_hidden_states = (
        output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    )

    head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)

    past_key_values_length = 0
    if past_key_values is not None:
        past_key_values_length = past_key_values[0][0].shape[2]

    # Adapted from paddlenlp.transformers.ernie_m.ErnieMModel
    if attention_mask is None:
        attention_mask = (input_ids == 0).to(self.dtype)
        attention_mask = attention_mask * float(ops.finfo(attention_mask.dtype).min)
        if past_key_values is not None:
            batch_size = past_key_values[0][0].shape[0]
            past_mask = ops.zeros([batch_size, 1, 1, past_key_values_length], dtype=attention_mask.dtype)
            attention_mask = ops.concat([past_mask, attention_mask], dim=-1)
    # For 2D attention_mask from tokenizer
    elif attention_mask.ndim == 2:
        attention_mask = attention_mask.to(self.dtype)
        attention_mask = 1.0 - attention_mask
        attention_mask = attention_mask * float(ops.finfo(attention_mask.dtype).min)

    extended_attention_mask = attention_mask.unsqueeze(1).unsqueeze(1)

    embedding_output = self.embeddings(
        input_ids=input_ids,
        position_ids=position_ids,
        inputs_embeds=inputs_embeds,
        past_key_values_length=past_key_values_length,
    )
    encoder_outputs = self.encoder(
        embedding_output,
        attention_mask=extended_attention_mask,
        head_mask=head_mask,
        past_key_values=past_key_values,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
    )

    sequence_output = encoder_outputs[0]
    pooler_output = self.pooler(sequence_output) if self.pooler is not None else None
    return (sequence_output, pooler_output) + encoder_outputs[1:]
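
The mask handling in forward can be illustrated without MindSpore. The following is a minimal sketch in plain Python, not the library's implementation: nested lists stand in for tensors, the helper names are hypothetical, and FLOAT32_MIN is an assumed stand-in for the value that ops.finfo(dtype).min yields for float32.

```python
# Assumed stand-in for float(ops.finfo(mindspore.float32).min).
FLOAT32_MIN = -3.4028234663852886e38


def additive_mask_from_2d(attention_mask):
    """Turn a 2D mask (1 = attend, 0 = padding) into an additive mask.

    Mirrors `(1.0 - attention_mask) * finfo.min`: kept positions add 0,
    padded positions add a very large negative number.
    """
    return [[(1.0 - float(keep)) * FLOAT32_MIN for keep in row] for row in attention_mask]


def additive_mask_from_input_ids(input_ids):
    """Fallback when no mask is given: token id 0 is treated as padding."""
    return [[float(token == 0) * FLOAT32_MIN for token in row] for row in input_ids]


def extend_mask(mask_2d):
    """Mimic `unsqueeze(1).unsqueeze(1)`: [batch, seq] -> [batch, 1, 1, seq]."""
    return [[[row]] for row in mask_2d]


mask = additive_mask_from_2d([[1, 1, 0]])
extended = extend_mask(mask)
```

The two inserted singleton axes let one mask row broadcast over all heads and all query positions when it is combined with the attention scores.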

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMModel.get_input_embeddings()

Returns the word embedding layer that maps input token ids to embeddings.

PARAMETER DESCRIPTION
self

The instance of the MSErnieMModel class.

TYPE: MSErnieMModel

RETURNS DESCRIPTION

The word embedding layer of the model (self.embeddings.word_embeddings).

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def get_input_embeddings(self):
    """
    Method: get_input_embeddings

    Description:
        This method returns the input embeddings from the MSErnieMModel class.

    Args:
        self: MSErnieMModel
            The instance of the MSErnieMModel class.

            - Type: MSErnieMModel object
            - Purpose: To access the embeddings from the model.
            - Restrictions: None

    Returns:
        The word embedding layer of the model (self.embeddings.word_embeddings).

    Raises:
        None.
    """
    return self.embeddings.word_embeddings

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMModel.set_input_embeddings(value)

Sets the input embeddings for the MSErnieMModel.

PARAMETER DESCRIPTION
self

The instance of the MSErnieMModel.

TYPE: MSErnieMModel

value

The embedding layer to use as the word embeddings. The type is not checked on assignment.

TYPE: object

RETURNS DESCRIPTION

None.

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def set_input_embeddings(self, value):
    """
    Sets the input embeddings for the MSErnieMModel.

    Args:
        self (MSErnieMModel): The instance of the MSErnieMModel.
        value (object): The input embeddings to be set. It can be of any type.

    Returns:
        None.

    Raises:
        None.
    """
    self.embeddings.word_embeddings = value

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMPooler

Bases: Module

The pooling layer of the MSErnieM model, implemented for MindSpore.

The layer takes the hidden states of the model as input, selects the hidden state of the first token, and applies a dense layer followed by a tanh activation. The resulting pooled output is returned.

ATTRIBUTE DESCRIPTION
dense

A dense layer used in the pooling layer.

TYPE: Linear

activation

An activation function used in the pooling layer.

TYPE: Tanh

METHOD DESCRIPTION
__init__

Initializes the MSErnieMPooler instance.

forward

Constructs the pooling layer.
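
The first-token pooling described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library code: nested lists replace tensors, pool_first_token is a hypothetical helper, and the dense layer is assumed to compute weight times input plus bias with the weight laid out as [out_features][in_features].

```python
import math


def pool_first_token(hidden_states, weight, bias):
    """Pool a batch by projecting the first token and applying tanh.

    hidden_states: [batch][sequence][hidden]
    weight:        [hidden_out][hidden_in] (assumed layout)
    bias:          [hidden_out]
    Returns a [batch][hidden_out] nested list.
    """
    pooled = []
    for sequence in hidden_states:
        first_token = sequence[0]  # same role as hidden_states[:, 0]
        projected = [
            sum(w * x for w, x in zip(row, first_token)) + b
            for row, b in zip(weight, bias)
        ]
        pooled.append([math.tanh(value) for value in projected])
    return pooled


identity = [[1.0, 0.0], [0.0, 1.0]]
pooled = pool_first_token([[[0.0, 1.0], [5.0, 5.0]]], identity, [0.0, 0.0])
```

Only the first position of each sequence contributes to the result; the second token in the example is ignored.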

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
class MSErnieMPooler(nn.Module):

    """A class representing a pooling layer for the MSErnieM model in MindSpore.

    This class implements the pooling layer of the MSErnieM model.
    The pooling layer takes the hidden states of the model as input, selects the hidden state of the first token,
    and applies a dense layer followed by a tanh activation. The resulting pooled output is returned.

    Attributes:
        dense (nn.Linear): A dense layer used in the pooling layer.
        activation (nn.Tanh): An activation function used in the pooling layer.

    Methods:
        __init__: Initializes the MSErnieMPooler instance.
        forward: Constructs the pooling layer.

    """
    def __init__(self, config):
        """
        Initializes a new instance of the MSErnieMPooler class.

        Args:
            self: The object itself.
            config:
                An instance of the configuration class for MSErnieMPooler.

                - Type: Any valid configuration class.
                - Purpose: Specifies the configuration settings for the MSErnieMPooler instance.
                - Restrictions: None.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.activation = nn.Tanh()

    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
        """
        Constructs the pooled output tensor from the provided hidden states.

        Args:
            self (MSErnieMPooler): The instance of the MSErnieMPooler class.
            hidden_states (mindspore.Tensor): The input tensor representing the hidden states of the input sequence.
                It should be of shape (batch_size, sequence_length, hidden_size).

        Returns:
            mindspore.Tensor: The pooled output tensor obtained from the hidden states.
                It is a 2D tensor of shape (batch_size, hidden_size) representing the pooled output features.

        Raises:
            ValueError: If the shape of the input hidden_states tensor is not as expected.
            TypeError: If the input hidden_states is not a mindspore.Tensor object.
        """
        # We "pool" the model by simply taking the hidden state corresponding
        # to the first token.
        first_token_tensor = hidden_states[:, 0]
        pooled_output = self.dense(first_token_tensor)
        pooled_output = self.activation(pooled_output)
        return pooled_output

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMPooler.__init__(config)

Initializes a new instance of the MSErnieMPooler class.

PARAMETER DESCRIPTION
self

The object itself.

config

The model configuration. Only config.hidden_size is read, to size the dense layer.

  • Type: a configuration object with a hidden_size attribute, such as ErnieMConfig.
  • Purpose: Specifies the configuration settings for the MSErnieMPooler instance.
  • Restrictions: None.

RETURNS DESCRIPTION

None.

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def __init__(self, config):
    """
    Initializes a new instance of the MSErnieMPooler class.

    Args:
        self: The object itself.
        config:
            An instance of the configuration class for MSErnieMPooler.

            - Type: Any valid configuration class.
            - Purpose: Specifies the configuration settings for the MSErnieMPooler instance.
            - Restrictions: None.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.dense = nn.Linear(config.hidden_size, config.hidden_size)
    self.activation = nn.Tanh()

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMPooler.forward(hidden_states)

Constructs the pooled output tensor from the provided hidden states.

PARAMETER DESCRIPTION
self

The instance of the MSErnieMPooler class.

TYPE: MSErnieMPooler

hidden_states

The input tensor representing the hidden states of the input sequence. It should be of shape (batch_size, sequence_length, hidden_size).

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: The pooled output tensor obtained from the hidden states. It is a 2D tensor of shape (batch_size, hidden_size) representing the pooled output features.

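RAISES DESCRIPTION
ValueError

May be raised by the underlying operations if hidden_states does not have the expected shape. The method itself performs no explicit shape check.

TypeError

May be raised by the underlying operations if hidden_states is not a mindspore.Tensor. The method itself performs no explicit type check.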

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
    """
    Constructs the pooled output tensor from the provided hidden states.

    Args:
        self (MSErnieMPooler): The instance of the MSErnieMPooler class.
        hidden_states (mindspore.Tensor): The input tensor representing the hidden states of the input sequence.
            It should be of shape (batch_size, sequence_length, hidden_size).

    Returns:
        mindspore.Tensor: The pooled output tensor obtained from the hidden states.
            It is a 2D tensor of shape (batch_size, hidden_size) representing the pooled output features.

    Raises:
        ValueError: If the shape of the input hidden_states tensor is not as expected.
        TypeError: If the input hidden_states is not a mindspore.Tensor object.
    """
    # We "pool" the model by simply taking the hidden state corresponding
    # to the first token.
    first_token_tensor = hidden_states[:, 0]
    pooled_output = self.dense(first_token_tensor)
    pooled_output = self.activation(pooled_output)
    return pooled_output

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMPreTrainedModel

Bases: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
class MSErnieMPreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """
    config_class = ErnieMConfig
    base_model_prefix = "ernie_m"

    def _init_weights(self, cell):
        """Initialize the weights"""
        if isinstance(cell, nn.Linear):
            # Slightly different from the TF version which uses truncated_normal for initialization
            # cf https://github.com/pytorch/pytorch/pull/5617
            cell.weight.set_data(initializer(Normal(self.config.initializer_range),
                                                    cell.weight.shape, cell.weight.dtype))
            if cell.bias is not None:
                cell.bias.set_data(initializer('zeros', cell.bias.shape, cell.bias.dtype))
        elif isinstance(cell, nn.Embedding):
            weight = np.random.normal(0.0, self.config.initializer_range, cell.weight.shape)
            if cell.padding_idx:
                weight[cell.padding_idx] = 0

            cell.weight.set_data(Tensor(weight, cell.weight.dtype))
        elif isinstance(cell, nn.LayerNorm):
            cell.weight.set_data(initializer('ones', cell.weight.shape, cell.weight.dtype))
            cell.bias.set_data(initializer('zeros', cell.bias.shape, cell.bias.dtype))
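
The embedding branch of _init_weights draws weights from a normal distribution with standard deviation initializer_range and then zeroes the padding row. In the listing the padding row is zeroed only when cell.padding_idx is truthy, so a padding index of 0 would be skipped. The sketch below shows the same idea in plain Python with an explicit None check; init_embedding_rows and its parameters are hypothetical and not part of the library.

```python
import random


def init_embedding_rows(num_rows, dim, initializer_range, padding_idx=None, seed=0):
    """Draw rows from N(0, initializer_range) and zero the padding row.

    Hypothetical helper. `padding_idx is not None` is used so that
    index 0 is zeroed too, unlike a plain truthiness test.
    """
    rng = random.Random(seed)
    weight = [
        [rng.gauss(0.0, initializer_range) for _ in range(dim)]
        for _ in range(num_rows)
    ]
    if padding_idx is not None:
        weight[padding_idx] = [0.0] * dim
    return weight


weight = init_embedding_rows(num_rows=4, dim=3, initializer_range=0.02, padding_idx=0)
```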

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMSelfAttention

Bases: Module

MSErnieMSelfAttention implements the multi-head self-attention layer of the ERNIE-M model. This class inherits from nn.Module.

Self-attention lets the model weigh every token in a sequence when computing the representation of each token, which helps it capture long-range dependencies.

The class provides methods to initialize the query, key and value projections, to reshape tensors into per-head form for computing attention scores (transpose_for_scores), and to run the attention computation (forward). It also supports relative position embeddings ('relative_key' and 'relative_key_query'), cross-attention over encoder hidden states, cached keys and values, and optional output of the attention probabilities.
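
The per-head reshaping done by transpose_for_scores, together with the divisibility check in the constructor, can be sketched in plain Python. This is a simplified illustration, not the library code: nested lists replace tensors, split_heads is a hypothetical name, and the check omits the embedding_size exception that the real constructor allows.

```python
def split_heads(x, num_heads):
    """Reshape [batch][seq][hidden] into [batch][heads][seq][head_size].

    Equivalent in effect to viewing the last axis as (heads, head_size)
    and then permuting axes to (0, 2, 1, 3).
    """
    hidden_size = len(x[0][0])
    if hidden_size % num_heads != 0:
        raise ValueError(
            f"The hidden size ({hidden_size}) is not a multiple of the number of "
            f"attention heads ({num_heads})"
        )
    head_size = hidden_size // num_heads
    return [
        [
            [token[h * head_size:(h + 1) * head_size] for token in sequence]
            for h in range(num_heads)
        ]
        for sequence in x
    ]


heads = split_heads([[[1, 2, 3, 4], [5, 6, 7, 8]]], num_heads=2)
```

Each head receives a contiguous slice of the hidden dimension for every token, so attention scores can be computed per head independently.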

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
class MSErnieMSelfAttention(nn.Module):

    """
    `MSErnieMSelfAttention` implements the multi-head self-attention layer of the ERNIE-M model.
    This class inherits from `nn.Module`.

    Self-attention lets the model weigh every token in a sequence when computing the representation of each token,
    which helps it capture long-range dependencies.

    The class provides methods to initialize the query, key and value projections, to reshape tensors into per-head
    form for computing attention scores (`transpose_for_scores`), and to run the attention computation (`forward`).
    It also supports relative position embeddings ('relative_key' and 'relative_key_query'), cross-attention over
    encoder hidden states, cached keys and values, and optional output of the attention probabilities.
    """
    def __init__(self, config, position_embedding_type=None):
        """
        Initializes the MSErnieMSelfAttention instance.

        Args:
            self (MSErnieMSelfAttention): The MSErnieMSelfAttention instance.
            config (object): An object containing configuration settings for the self-attention mechanism.
            position_embedding_type (str, optional): The type of position embedding to be used, defaults to None.
                Possible values are 'absolute', 'relative_key', or 'relative_key_query'.

        Returns:
            None.

        Raises:
            ValueError: If the hidden size in the configuration is not a multiple of the number of attention heads
                and the configuration does not have an 'embedding_size' attribute.
        """
        super().__init__()
        if config.hidden_size % config.num_attention_heads != 0 and not hasattr(config, "embedding_size"):
            raise ValueError(
                f"The hidden size ({config.hidden_size}) is not a multiple of the number of attention "
                f"heads ({config.num_attention_heads})"
            )

        self.num_attention_heads = config.num_attention_heads
        self.attention_head_size = int(config.hidden_size / config.num_attention_heads)
        self.all_head_size = self.num_attention_heads * self.attention_head_size

        self.q_proj = nn.Linear(config.hidden_size, self.all_head_size)
        self.k_proj = nn.Linear(config.hidden_size, self.all_head_size)
        self.v_proj = nn.Linear(config.hidden_size, self.all_head_size)

        self.dropout = nn.Dropout(p=config.attention_probs_dropout_prob)
        self.position_embedding_type = position_embedding_type or getattr(
            config, "position_embedding_type", "absolute"
        )
        if self.position_embedding_type in ('relative_key', 'relative_key_query'):
            self.max_position_embeddings = config.max_position_embeddings
            self.distance_embedding = nn.Embedding(2 * config.max_position_embeddings - 1, self.attention_head_size)

        self.is_decoder = config.is_decoder

    def transpose_for_scores(self, x: mindspore.Tensor) -> mindspore.Tensor:
        """
        Reshapes a projected tensor into per-head form for computing attention scores.

        Args:
            self (MSErnieMSelfAttention): An instance of the MSErnieMSelfAttention class.
            x (mindspore.Tensor): The projected query, key or value tensor of shape
                (batch_size, sequence_length, all_head_size).

        Returns:
            mindspore.Tensor: A tensor of shape
                (batch_size, num_attention_heads, sequence_length, attention_head_size).

        Raises:
            None
        """
        new_x_shape = x.shape[:-1] + (self.num_attention_heads, self.attention_head_size)
        x = x.view(new_x_shape)
        return x.permute(0, 2, 1, 3)

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        encoder_hidden_states: Optional[mindspore.Tensor] = None,
        encoder_attention_mask: Optional[mindspore.Tensor] = None,
        past_key_value: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
        output_attentions: Optional[bool] = False,
    ) -> Tuple[mindspore.Tensor]:
        """
        Computes multi-head attention over the hidden states, or cross-attention when encoder_hidden_states is given.

        Args:
            self: The instance of the class.
            hidden_states (mindspore.Tensor): The input hidden states to the self-attention mechanism.
            attention_mask (Optional[mindspore.Tensor], optional):
                Mask tensor indicating which positions should be attended to and which should not. Defaults to None.
            head_mask (Optional[mindspore.Tensor], optional):
                Mask tensor indicating which heads to mask. Defaults to None.
            encoder_hidden_states (Optional[mindspore.Tensor], optional):
                Hidden states from an encoder in case of cross-attention. Defaults to None.
            encoder_attention_mask (Optional[mindspore.Tensor], optional): Mask tensor for encoder_hidden_states.
                Defaults to None.
            past_key_value (Optional[Tuple[Tuple[mindspore.Tensor]]], optional):
                Tuple containing the past key and value tensors. Defaults to None.
            output_attentions (Optional[bool], optional): Flag to output attentions. Defaults to False.

        Returns:
            Tuple[mindspore.Tensor]:
                A tuple containing the context layer, followed by the attention probabilities if
                output_attentions is True. When the module is configured as a decoder, the updated
                (key, value) cache tuple is appended as the last element.

        Raises:
            TypeError: If there are issues with the input types or dimensions during the computations.
            RuntimeError: If there are runtime issues during the self-attention mechanism.
        """
        mixed_query_layer = self.q_proj(hidden_states)

        # If this is instantiated as a cross-attention module, the keys
        # and values come from an encoder; the attention mask needs to be
        # such that the encoder's padding tokens are not attended to.
        is_cross_attention = encoder_hidden_states is not None

        if is_cross_attention and past_key_value is not None:
            # reuse k,v, cross_attentions
            key_layer = past_key_value[0]
            value_layer = past_key_value[1]
            attention_mask = encoder_attention_mask
        elif is_cross_attention:
            key_layer = self.transpose_for_scores(self.k_proj(encoder_hidden_states))
            value_layer = self.transpose_for_scores(self.v_proj(encoder_hidden_states))
            attention_mask = encoder_attention_mask
        elif past_key_value is not None:
            key_layer = self.transpose_for_scores(self.k_proj(hidden_states))
            value_layer = self.transpose_for_scores(self.v_proj(hidden_states))
            key_layer = ops.cat([past_key_value[0], key_layer], dim=2)
            value_layer = ops.cat([past_key_value[1], value_layer], dim=2)
        else:
            key_layer = self.transpose_for_scores(self.k_proj(hidden_states))
            value_layer = self.transpose_for_scores(self.v_proj(hidden_states))

        query_layer = self.transpose_for_scores(mixed_query_layer)

        use_cache = past_key_value is not None
        if self.is_decoder:
            # if cross_attention save Tuple(mindspore.Tensor, mindspore.Tensor) of all cross attention key/value_states.
            # Further calls to cross_attention layer can then reuse all cross-attention
            # key/value_states (first "if" case)
            # if uni-directional self-attention (decoder) save Tuple(mindspore.Tensor, mindspore.Tensor) of
            # all previous decoder key/value_states. Further calls to uni-directional self-attention
            # can concat previous decoder key/value_states to current projected key/value_states (third "elif" case)
            # if encoder bi-directional self-attention `past_key_value` is always `None`
            past_key_value = (key_layer, value_layer)

        # Take the dot product between "query" and "key" to get the raw attention scores.
        attention_scores = ops.matmul(query_layer, key_layer.swapaxes(-1, -2))

        if self.position_embedding_type in ('relative_key', 'relative_key_query'):
            query_length, key_length = query_layer.shape[2], key_layer.shape[2]
            if use_cache:
                position_ids_l = mindspore.tensor(key_length - 1, dtype=mindspore.int64).view(
                    -1, 1
                )
            else:
                position_ids_l = ops.arange(query_length, dtype=mindspore.int64).view(-1, 1)
            position_ids_r = ops.arange(key_length, dtype=mindspore.int64).view(1, -1)
            distance = position_ids_l - position_ids_r

            positional_embedding = self.distance_embedding(distance + self.max_position_embeddings - 1)
            positional_embedding = positional_embedding.to(dtype=query_layer.dtype)  # fp16 compatibility

            if self.position_embedding_type == "relative_key":
                relative_position_scores = ops.einsum("bhld,lrd->bhlr", query_layer, positional_embedding)
                attention_scores = attention_scores + relative_position_scores
            elif self.position_embedding_type == "relative_key_query":
                relative_position_scores_query = ops.einsum("bhld,lrd->bhlr", query_layer, positional_embedding)
                relative_position_scores_key = ops.einsum("bhrd,lrd->bhlr", key_layer, positional_embedding)
                attention_scores = attention_scores + relative_position_scores_query + relative_position_scores_key

        attention_scores = attention_scores / ops.sqrt(ops.scalar_to_tensor(self.attention_head_size, attention_scores.dtype))
        if attention_mask is not None:
            # Apply the attention mask (precomputed for all layers in the ErnieMModel forward() function)
            attention_scores = attention_scores + attention_mask

        # Normalize the attention scores to probabilities.
        attention_probs = ops.softmax(attention_scores, dim=-1)

        # This is actually dropping out entire tokens to attend to, which might
        # seem a bit unusual, but is taken from the original Transformer paper.
        attention_probs = self.dropout(attention_probs)

        # Mask heads if we want to
        if head_mask is not None:
            attention_probs = attention_probs * head_mask

        context_layer = ops.matmul(attention_probs, value_layer)

        context_layer = context_layer.permute(0, 2, 1, 3)
        new_context_layer_shape = context_layer.shape[:-2] + (self.all_head_size,)
        context_layer = context_layer.view(new_context_layer_shape)

        outputs = (context_layer, attention_probs) if output_attentions else (context_layer,)

        if self.is_decoder:
            outputs = outputs + (past_key_value,)
        return outputs

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMSelfAttention.__init__(config, position_embedding_type=None)

Initializes the MSErnieMSelfAttention instance.

PARAMETER DESCRIPTION
self

The MSErnieMSelfAttention instance.

TYPE: MSErnieMSelfAttention

config

An object containing configuration settings for the self-attention mechanism.

TYPE: object

position_embedding_type

The type of position embedding to be used, defaults to None. Possible values are 'absolute', 'relative_key', or 'relative_key_query'.

TYPE: str DEFAULT: None

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
ValueError

If the hidden size in the configuration is not a multiple of the number of attention heads and the configuration does not have an 'embedding_size' attribute.
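
The size bookkeeping done by the constructor can be sketched in plain Python. This is an illustration only, not the library code: `derive_head_sizes` is a hypothetical helper, and it simplifies the check by ignoring the `embedding_size` attribute that the real constructor also consults.

```python
from typing import Tuple


def derive_head_sizes(hidden_size: int, num_attention_heads: int) -> Tuple[int, int]:
    """Return (attention_head_size, all_head_size) for a configuration.

    Hypothetical helper mirroring the arithmetic in the constructor; unlike
    the constructor, it never looks for an `embedding_size` attribute.
    """
    if hidden_size % num_attention_heads != 0:
        raise ValueError(
            f"The hidden size ({hidden_size}) is not a multiple of the number of attention "
            f"heads ({num_attention_heads})"
        )
    attention_head_size = hidden_size // num_attention_heads
    all_head_size = num_attention_heads * attention_head_size
    return attention_head_size, all_head_size


# ErnieMConfig defaults: hidden_size=768, num_attention_heads=12.
head_size, all_size = derive_head_sizes(768, 12)
print(head_size, all_size)  # 64 768
```

With the default configuration each of the 12 heads therefore works on a 64-dimensional slice of the hidden state.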

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def __init__(self, config, position_embedding_type=None):
    """
    Initializes the MSErnieMSelfAttention instance.

    Args:
        self (MSErnieMSelfAttention): The MSErnieMSelfAttention instance.
        config (object): An object containing configuration settings for the self-attention mechanism.
        position_embedding_type (str, optional): The type of position embedding to be used, defaults to None.
            Possible values are 'absolute', 'relative_key', or 'relative_key_query'.

    Returns:
        None.

    Raises:
        ValueError: If the hidden size in the configuration is not a multiple of the number of attention heads
            and the configuration does not have an 'embedding_size' attribute.
    """
    super().__init__()
    if config.hidden_size % config.num_attention_heads != 0 and not hasattr(config, "embedding_size"):
        raise ValueError(
            f"The hidden size ({config.hidden_size}) is not a multiple of the number of attention "
            f"heads ({config.num_attention_heads})"
        )

    self.num_attention_heads = config.num_attention_heads
    self.attention_head_size = int(config.hidden_size / config.num_attention_heads)
    self.all_head_size = self.num_attention_heads * self.attention_head_size

    self.q_proj = nn.Linear(config.hidden_size, self.all_head_size)
    self.k_proj = nn.Linear(config.hidden_size, self.all_head_size)
    self.v_proj = nn.Linear(config.hidden_size, self.all_head_size)

    self.dropout = nn.Dropout(p=config.attention_probs_dropout_prob)
    self.position_embedding_type = position_embedding_type or getattr(
        config, "position_embedding_type", "absolute"
    )
    if self.position_embedding_type in ('relative_key', 'relative_key_query'):
        self.max_position_embeddings = config.max_position_embeddings
        self.distance_embedding = nn.Embedding(2 * config.max_position_embeddings - 1, self.attention_head_size)

    self.is_decoder = config.is_decoder

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMSelfAttention.forward(hidden_states, attention_mask=None, head_mask=None, encoder_hidden_states=None, encoder_attention_mask=None, past_key_value=None, output_attentions=False)

Runs the attention forward pass: self-attention by default, cross-attention when encoder_hidden_states is given.

PARAMETER DESCRIPTION
self

The instance of the class.

hidden_states

The input hidden states to the self-attention mechanism.

TYPE: Tensor

attention_mask

Mask tensor indicating which positions should be attended to and which should not. Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

head_mask

Mask tensor indicating which heads to mask. Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

encoder_hidden_states

Hidden states from an encoder in case of cross-attention. Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

encoder_attention_mask

Mask tensor for encoder_hidden_states. Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

past_key_value

Tuple containing the past key and value tensors. Defaults to None.

TYPE: Optional[Tuple[Tuple[Tensor]]] DEFAULT: None

output_attentions

Flag to output attentions. Defaults to False.

TYPE: Optional[bool] DEFAULT: False

RETURNS DESCRIPTION
Tuple[Tensor]

Tuple[mindspore.Tensor]: A tuple containing the context layer, followed by the attention probabilities if output_attentions is True. When the module is configured as a decoder, the updated (key, value) cache tuple is appended as the last element.
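
The core computation behind the context layer and the attention probabilities can be sketched for a single head in plain Python. This is a simplified illustration, not the MindSpore implementation: it omits dropout, head masking, relative position scores, and the key/value cache, and the names `softmax` and `single_head_attention` are hypothetical. The mask convention (0.0 to keep a key, a large negative value to hide it) is an assumption of the sketch.

```python
import math
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def single_head_attention(
    query: Matrix,
    key: Matrix,
    value: Matrix,
    additive_mask: Optional[List[float]] = None,
) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product attention for one head.

    query has shape (query_len, head_size); key and value have shape
    (key_len, head_size). additive_mask holds one value per key position.
    """
    head_size = len(query[0])
    probs = []
    for q in query:
        # Raw scores: dot product with every key, scaled by sqrt(head_size).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(head_size) for k in key]
        if additive_mask is not None:
            scores = [s + m for s, m in zip(scores, additive_mask)]
        probs.append(softmax(scores))
    # Context: probability-weighted sum of the value vectors.
    context = [
        [sum(p * v[d] for p, v in zip(row, value)) for d in range(len(value[0]))]
        for row in probs
    ]
    return context, probs


q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[10.0, 0.0], [0.0, 10.0]]
context, probs = single_head_attention(q, k, v, additive_mask=[0.0, -1e9])
print(probs)    # the masked second key receives zero probability
print(context)
```

Returning `(context, probs)` corresponds to the `output_attentions=True` case; with the flag off, only the context would be returned.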

RAISES DESCRIPTION
TypeError

If there are issues with the input types or dimensions during the computations.

RuntimeError

If there are runtime issues during the self-attention mechanism.
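
When position_embedding_type is 'relative_key' or 'relative_key_query', the method looks up one embedding per (query position, key position) pair. The index arithmetic can be sketched without MindSpore as follows; `relative_position_indices` is a hypothetical helper that reproduces only the `distance + max_position_embeddings - 1` step from the source listing.

```python
from typing import List


def relative_position_indices(
    query_length: int,
    key_length: int,
    max_position_embeddings: int,
    use_cache: bool = False,
) -> List[List[int]]:
    """Row l, column r holds the embedding index for the offset between query l and key r."""
    if use_cache:
        # With a key/value cache, the single query sits at the last key position.
        left = [key_length - 1]
    else:
        left = list(range(query_length))
    right = range(key_length)
    # Shift signed distances (l - r) so the smallest possible one maps to index 0.
    return [[l - r + max_position_embeddings - 1 for r in right] for l in left]


print(relative_position_indices(3, 3, max_position_embeddings=4))
# [[3, 2, 1], [4, 3, 2], [5, 4, 3]]
```

The shift is why the constructor sizes the distance embedding table at `2 * max_position_embeddings - 1` rows: distances range from `-(max_position_embeddings - 1)` to `max_position_embeddings - 1`.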

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def forward(
    self,
    hidden_states: mindspore.Tensor,
    attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    encoder_hidden_states: Optional[mindspore.Tensor] = None,
    encoder_attention_mask: Optional[mindspore.Tensor] = None,
    past_key_value: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
    output_attentions: Optional[bool] = False,
) -> Tuple[mindspore.Tensor]:
    """
    Runs the attention forward pass: self-attention by default, cross-attention when encoder_hidden_states is given.

    Args:
        self: The instance of the class.
        hidden_states (mindspore.Tensor): The input hidden states to the self-attention mechanism.
        attention_mask (Optional[mindspore.Tensor], optional):
            Mask tensor indicating which positions should be attended to and which should not. Defaults to None.
        head_mask (Optional[mindspore.Tensor], optional):
            Mask tensor indicating which heads to mask. Defaults to None.
        encoder_hidden_states (Optional[mindspore.Tensor], optional):
            Hidden states from an encoder in case of cross-attention. Defaults to None.
        encoder_attention_mask (Optional[mindspore.Tensor], optional): Mask tensor for encoder_hidden_states.
            Defaults to None.
        past_key_value (Optional[Tuple[Tuple[mindspore.Tensor]]], optional):
            Tuple containing the past key and value tensors. Defaults to None.
        output_attentions (Optional[bool], optional): Flag to output attentions. Defaults to False.

    Returns:
        Tuple[mindspore.Tensor]:
            A tuple containing the context layer, followed by the attention probabilities if
            output_attentions is True. When the module is configured as a decoder, the updated
            (key, value) cache tuple is appended as the last element.

    Raises:
        TypeError: If there are issues with the input types or dimensions during the computations.
        RuntimeError: If there are runtime issues during the self-attention mechanism.
    """
    mixed_query_layer = self.q_proj(hidden_states)

    # If this is instantiated as a cross-attention module, the keys
    # and values come from an encoder; the attention mask needs to be
    # such that the encoder's padding tokens are not attended to.
    is_cross_attention = encoder_hidden_states is not None

    if is_cross_attention and past_key_value is not None:
        # reuse k,v, cross_attentions
        key_layer = past_key_value[0]
        value_layer = past_key_value[1]
        attention_mask = encoder_attention_mask
    elif is_cross_attention:
        key_layer = self.transpose_for_scores(self.k_proj(encoder_hidden_states))
        value_layer = self.transpose_for_scores(self.v_proj(encoder_hidden_states))
        attention_mask = encoder_attention_mask
    elif past_key_value is not None:
        key_layer = self.transpose_for_scores(self.k_proj(hidden_states))
        value_layer = self.transpose_for_scores(self.v_proj(hidden_states))
        key_layer = ops.cat([past_key_value[0], key_layer], dim=2)
        value_layer = ops.cat([past_key_value[1], value_layer], dim=2)
    else:
        key_layer = self.transpose_for_scores(self.k_proj(hidden_states))
        value_layer = self.transpose_for_scores(self.v_proj(hidden_states))

    query_layer = self.transpose_for_scores(mixed_query_layer)

    use_cache = past_key_value is not None
    if self.is_decoder:
        # if cross_attention save Tuple(mindspore.Tensor, mindspore.Tensor) of all cross attention key/value_states.
        # Further calls to cross_attention layer can then reuse all cross-attention
        # key/value_states (first "if" case)
        # if uni-directional self-attention (decoder) save Tuple(mindspore.Tensor, mindspore.Tensor) of
        # all previous decoder key/value_states. Further calls to uni-directional self-attention
        # can concat previous decoder key/value_states to current projected key/value_states (third "elif" case)
        # if encoder bi-directional self-attention `past_key_value` is always `None`
        past_key_value = (key_layer, value_layer)

    # Take the dot product between "query" and "key" to get the raw attention scores.
    attention_scores = ops.matmul(query_layer, key_layer.swapaxes(-1, -2))

    if self.position_embedding_type in ('relative_key', 'relative_key_query'):
        query_length, key_length = query_layer.shape[2], key_layer.shape[2]
        if use_cache:
            position_ids_l = mindspore.tensor(key_length - 1, dtype=mindspore.int64).view(
                -1, 1
            )
        else:
            position_ids_l = ops.arange(query_length, dtype=mindspore.int64).view(-1, 1)
        position_ids_r = ops.arange(key_length, dtype=mindspore.int64).view(1, -1)
        distance = position_ids_l - position_ids_r

        positional_embedding = self.distance_embedding(distance + self.max_position_embeddings - 1)
        positional_embedding = positional_embedding.to(dtype=query_layer.dtype)  # fp16 compatibility

        if self.position_embedding_type == "relative_key":
            relative_position_scores = ops.einsum("bhld,lrd->bhlr", query_layer, positional_embedding)
            attention_scores = attention_scores + relative_position_scores
        elif self.position_embedding_type == "relative_key_query":
            relative_position_scores_query = ops.einsum("bhld,lrd->bhlr", query_layer, positional_embedding)
            relative_position_scores_key = ops.einsum("bhrd,lrd->bhlr", key_layer, positional_embedding)
            attention_scores = attention_scores + relative_position_scores_query + relative_position_scores_key

    attention_scores = attention_scores / ops.sqrt(ops.scalar_to_tensor(self.attention_head_size, attention_scores.dtype))
    if attention_mask is not None:
        # Apply the attention mask (precomputed for all layers in the ErnieMModel forward() function)
        attention_scores = attention_scores + attention_mask

    # Normalize the attention scores to probabilities.
    attention_probs = ops.softmax(attention_scores, dim=-1)

    # This is actually dropping out entire tokens to attend to, which might
    # seem a bit unusual, but is taken from the original Transformer paper.
    attention_probs = self.dropout(attention_probs)

    # Mask heads if we want to
    if head_mask is not None:
        attention_probs = attention_probs * head_mask

    context_layer = ops.matmul(attention_probs, value_layer)

    context_layer = context_layer.permute(0, 2, 1, 3)
    new_context_layer_shape = context_layer.shape[:-2] + (self.all_head_size,)
    context_layer = context_layer.view(new_context_layer_shape)

    outputs = (context_layer, attention_probs) if output_attentions else (context_layer,)

    if self.is_decoder:
        outputs = outputs + (past_key_value,)
    return outputs

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMSelfAttention.transpose_for_scores(x)

Splits the last dimension of the input tensor into attention heads and moves the head axis ahead of the sequence axis.

PARAMETER DESCRIPTION
self

An instance of the MSErnieMSelfAttention class.

TYPE: MSErnieMSelfAttention

x

The projected query, key, or value tensor to be split into heads, with shape (batch_size, sequence_length, all_head_size).

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: The input reshaped and permuted to shape (batch_size, num_attention_heads, sequence_length, attention_head_size).
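
The reshape-then-permute step can be illustrated on nested Python lists. This is a sketch under the assumption that the input is laid out as (batch, sequence, hidden) with the hidden axis holding the heads back to back; `split_heads` is a hypothetical stand-in for the tensor `view` and `permute(0, 2, 1, 3)` calls in the source listing.

```python
from typing import List


def split_heads(x: List[List[List[float]]], num_heads: int, head_size: int):
    """Turn a (batch, seq, num_heads * head_size) nested list into (batch, num_heads, seq, head_size)."""
    return [
        [
            # For head h, take that head's slice of every token in the sequence.
            [token[h * head_size:(h + 1) * head_size] for token in sequence]
            for h in range(num_heads)
        ]
        for sequence in x
    ]


# One sequence of two tokens, hidden size 4, split into 2 heads of size 2.
x = [[[0, 1, 2, 3], [4, 5, 6, 7]]]
print(split_heads(x, num_heads=2, head_size=2))
# [[[[0, 1], [4, 5]], [[2, 3], [6, 7]]]]
```

After the split, each head sees a contiguous (sequence, head_size) matrix, which is the layout the batched matrix multiplications in the forward pass operate on.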

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def transpose_for_scores(self, x: mindspore.Tensor) -> mindspore.Tensor:
    """
    Splits the last dimension of the input tensor into attention heads and moves the head axis
    ahead of the sequence axis.

    Args:
        self (MSErnieMSelfAttention): An instance of the MSErnieMSelfAttention class.
        x (mindspore.Tensor): The projected query, key, or value tensor to be split into heads,
            with shape (batch_size, sequence_length, all_head_size).

    Returns:
        mindspore.Tensor: The input reshaped and permuted to shape
            (batch_size, num_attention_heads, sequence_length, attention_head_size).

    Raises:
        None
    """
    new_x_shape = x.shape[:-1] + (self.num_attention_heads, self.attention_head_size)
    x = x.view(new_x_shape)
    return x.permute(0, 2, 1, 3)

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSUIEM

Bases: MSErnieMForInformationExtraction

UIEM model

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
class MSUIEM(MSErnieMForInformationExtraction):
    """UIEM model"""
    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        start_positions: Optional[mindspore.Tensor] = None,
        end_positions: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
    ) -> Tuple[mindspore.Tensor]:
        r"""
        Args:
            start_positions (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for position (index) for computing the start_positions loss. Positions outside of the sequence
                are not taken into account for computing the loss.
            end_positions (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for position (index) for computing the end_positions loss. Positions outside of the sequence
                are not taken into account for computing the loss.
        """
        result = self.ernie_m(
            input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
        )
        sequence_output = result[0]

        start_logits = self.linear_start(sequence_output)
        start_logits = start_logits.squeeze(-1)
        start_prob = self.sigmoid(start_logits)
        end_logits = self.linear_end(sequence_output)
        end_logits = end_logits.squeeze(-1)
        end_prob = self.sigmoid(end_logits)

        total_loss = None
        if start_positions is not None and end_positions is not None:
            # If we are on multi-GPU, the split adds a dimension
            if len(start_positions.shape) > 1 and start_positions.shape[-1] == 1:
                start_positions = start_positions.squeeze(-1)
            if len(end_positions.shape) > 1 and end_positions.shape[-1] == 1:
                end_positions = end_positions.squeeze(-1)
            # sometimes the start/end positions are outside our model inputs, we ignore these terms
            ignored_index = start_logits.shape[1]
            start_positions = start_positions.clamp(0, ignored_index)
            end_positions = end_positions.clamp(0, ignored_index)

            start_loss = ops.binary_cross_entropy(start_prob, start_positions)
            end_loss = ops.binary_cross_entropy(end_prob, end_positions)
            total_loss = (start_loss + end_loss) / 2

        output = (start_prob, end_prob) + result[1:]
        return ((total_loss,) + output) if total_loss is not None else output

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSUIEM.forward(input_ids=None, attention_mask=None, position_ids=None, head_mask=None, inputs_embeds=None, start_positions=None, end_positions=None, output_attentions=None, output_hidden_states=None)

PARAMETER DESCRIPTION
start_positions

Labels for position (index) for computing the start_positions loss. Positions outside of the sequence are not taken into account for computing the loss.

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

end_positions

Labels for position (index) for computing the end_positions loss. Positions outside of the sequence are not taken into account for computing the loss.

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None
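
The loss computed when both label tensors are supplied is the average of two binary cross-entropy terms over the per-token sigmoid probabilities. The sketch below works on flat Python lists for a single sequence; the function names, the `eps` clamp, and the assumption that the loss is averaged over tokens are illustrative choices rather than details confirmed by the source.

```python
import math
from typing import List


def binary_cross_entropy(probs: List[float], labels: List[float], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy over per-token probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0); eps is an assumption
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(probs)


def span_loss(
    start_prob: List[float],
    end_prob: List[float],
    start_labels: List[float],
    end_labels: List[float],
) -> float:
    """Average of the start and end losses, mirroring `(start_loss + end_loss) / 2`."""
    start_loss = binary_cross_entropy(start_prob, start_labels)
    end_loss = binary_cross_entropy(end_prob, end_labels)
    return (start_loss + end_loss) / 2


# Three tokens; the span starts at token 0 and ends at token 2.
loss = span_loss(
    start_prob=[0.9, 0.1, 0.1],
    end_prob=[0.1, 0.1, 0.9],
    start_labels=[1.0, 0.0, 0.0],
    end_labels=[0.0, 0.0, 1.0],
)
print(round(loss, 4))
```

Because the probabilities and the labels are compared element by element, both label tensors need the same shape as the per-token probabilities.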

Source code in mindnlp\transformers\models\ernie_m\modeling_graph_ernie_m.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    start_positions: Optional[mindspore.Tensor] = None,
    end_positions: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
) -> Tuple[mindspore.Tensor]:
    r"""
    Args:
        start_positions (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for position (index) for computing the start_positions loss. Positions outside of the sequence
            are not taken into account for computing the loss.
        end_positions (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for position (index) for computing the end_positions loss. Positions outside of the sequence
            are not taken into account for computing the loss.
    """
    result = self.ernie_m(
        input_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
    )
    sequence_output = result[0]

    start_logits = self.linear_start(sequence_output)
    start_logits = start_logits.squeeze(-1)
    start_prob = self.sigmoid(start_logits)
    end_logits = self.linear_end(sequence_output)
    end_logits = end_logits.squeeze(-1)
    end_prob = self.sigmoid(end_logits)

    total_loss = None
    if start_positions is not None and end_positions is not None:
        # If we are on multi-GPU, the split adds a dimension
        if len(start_positions.shape) > 1 and start_positions.shape[-1] == 1:
            start_positions = start_positions.squeeze(-1)
        if len(end_positions.shape) > 1 and end_positions.shape[-1] == 1:
            end_positions = end_positions.squeeze(-1)
        # sometimes the start/end positions are outside our model inputs, we ignore these terms
        ignored_index = start_logits.shape[1]
        start_positions = start_positions.clamp(0, ignored_index)
        end_positions = end_positions.clamp(0, ignored_index)

        start_loss = ops.binary_cross_entropy(start_prob, start_positions)
        end_loss = ops.binary_cross_entropy(end_prob, end_positions)
        total_loss = (start_loss + end_loss) / 2

    output = (start_prob, end_prob) + result[1:]
    return ((total_loss,) + output) if total_loss is not None else output

mindnlp.transformers.models.ernie_m.tokenization_ernie_m

Tokenization classes for Ernie-M.

mindnlp.transformers.models.ernie_m.tokenization_ernie_m.ErnieMTokenizer

Bases: PreTrainedTokenizer

Constructs an Ernie-M tokenizer. It uses the sentencepiece tools to split words into sub-words.

PARAMETER DESCRIPTION
sentencepiece_model_ckpt

The file path of sentencepiece model.

TYPE: `str`

vocab_file

The file path of the vocabulary.

TYPE: `str`, *optional* DEFAULT: None

do_lower_case

Whether or not to lowercase the input when tokenizing.

TYPE: `bool`, *optional*, defaults to `False` DEFAULT: False

unk_token

A special token representing the unknown (out-of-vocabulary) token. An unknown token is set to be unk_token in order to be converted to an ID.

TYPE: `str`, *optional*, defaults to `"[UNK]"` DEFAULT: '[UNK]'

sep_token

A special token separating two different sentences in the same input.

TYPE: `str`, *optional*, defaults to `"[SEP]"` DEFAULT: '[SEP]'

pad_token

A special token used to make arrays of tokens the same size for batching purposes.

TYPE: `str`, *optional*, defaults to `"[PAD]"` DEFAULT: '[PAD]'

cls_token

A special token used for sequence classification. It is the first token of the sequence when built with special tokens.

TYPE: `str`, *optional*, defaults to `"[CLS]"` DEFAULT: '[CLS]'

mask_token

A special token representing a masked token. This is the token that the model tries to replace with the original one in the masked language modeling task.

TYPE: `str`, *optional*, defaults to `"[MASK]"` DEFAULT: '[MASK]'

Source code in mindnlp\transformers\models\ernie_m\tokenization_ernie_m.py
class ErnieMTokenizer(PreTrainedTokenizer):
    r"""
    Constructs an Ernie-M tokenizer. It uses the `sentencepiece` tools to split words into sub-words.

    Args:
        sentencepiece_model_ckpt (`str`):
            The file path of the sentencepiece model.
        vocab_file (`str`, *optional*):
            The file path of the vocabulary.
        do_lower_case (`bool`, *optional*, defaults to `False`):
            Whether or not to lowercase the input when tokenizing.
        unk_token (`str`, *optional*, defaults to `"[UNK]"`):
            A special token representing the `unknown (out-of-vocabulary)` token. An unknown token is set to be
            `unk_token` in order to be converted to an ID.
        sep_token (`str`, *optional*, defaults to `"[SEP]"`):
            A special token separating two different sentences in the same input.
        pad_token (`str`, *optional*, defaults to `"[PAD]"`):
            A special token used to make arrays of tokens the same size for batching purposes.
        cls_token (`str`, *optional*, defaults to `"[CLS]"`):
            A special token used for sequence classification. It is the first token of the sequence when built with
            special tokens.
        mask_token (`str`, *optional*, defaults to `"[MASK]"`):
            A special token representing a masked token. This is the token that the model tries to replace with the
            original one in the masked language modeling task.
    """
    # Ernie-M model doesn't have token_type embedding.
    model_input_names: List[str] = ["input_ids"]

    vocab_files_names = VOCAB_FILES_NAMES
    pretrained_init_configuration = PRETRAINED_INIT_CONFIGURATION
    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
    resource_files_names = RESOURCE_FILES_NAMES

    def __init__(
        self,
        sentencepiece_model_ckpt,
        vocab_file=None,
        do_lower_case=False,
        encoding="utf8",
        unk_token="[UNK]",
        sep_token="[SEP]",
        pad_token="[PAD]",
        cls_token="[CLS]",
        mask_token="[MASK]",
        sp_model_kwargs: Optional[Dict[str, Any]] = None,
        **kwargs,
    ) -> None:
        """
        Initialize the ErnieMTokenizer class.

        Args:
            sentencepiece_model_ckpt (str): The path to the sentencepiece model checkpoint file.
            vocab_file (str): The path to the vocabulary file. Defaults to None.
            do_lower_case (bool): A flag indicating whether to convert tokens to lowercase. Defaults to False.
            encoding (str): The character encoding to be used. Defaults to 'utf8'.
            unk_token (str): The token representing unknown words. Defaults to '[UNK]'.
            sep_token (str): The token representing sentence separation. Defaults to '[SEP]'.
            pad_token (str): The token representing padding. Defaults to '[PAD]'.
            cls_token (str): The token representing classification. Defaults to '[CLS]'.
            mask_token (str): The token representing masking. Defaults to '[MASK]'.
            sp_model_kwargs (Optional[Dict[str, Any]]): Additional keyword arguments for the SentencePiece model. Defaults to None.

        Returns:
            None.

        Raises:
            None.
        """
        # Mask token behave like a normal word, i.e. include the space before it and
        # is included in the raw text, there should be a match in a non-normalized sentence.

        self.sp_model_kwargs = {} if sp_model_kwargs is None else sp_model_kwargs

        self.do_lower_case = do_lower_case
        self.sentencepiece_model_ckpt = sentencepiece_model_ckpt
        self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
        self.sp_model.Load(sentencepiece_model_ckpt)

        # to mimic paddlenlp.transformers.ernie_m.tokenizer.ErnieMTokenizer functioning
        if vocab_file is not None:
            self.vocab = self.load_vocab(filepath=vocab_file)
        else:
            self.vocab = {self.sp_model.id_to_piece(id): id for id in range(self.sp_model.get_piece_size())}
        self.reverse_vocab = {v: k for k, v in self.vocab.items()}

        super().__init__(
            do_lower_case=do_lower_case,
            unk_token=unk_token,
            sep_token=sep_token,
            pad_token=pad_token,
            cls_token=cls_token,
            mask_token=mask_token,
            vocab_file=vocab_file,
            encoding=encoding,
            sp_model_kwargs=self.sp_model_kwargs,
            **kwargs,
        )

        self.SP_CHAR_MAPPING = {}

        for ch in range(65281, 65375):
            if ch in [ord("～")]:  # keep the fullwidth tilde (U+FF5E) unchanged
                self.SP_CHAR_MAPPING[chr(ch)] = chr(ch)
                continue
            self.SP_CHAR_MAPPING[chr(ch)] = chr(ch - 65248)

    def get_offset_mapping(self, text):
        """
        This method is part of the ErnieMTokenizer class and is used to obtain the offset mapping for the given text.

        Args:
            self: The instance of the ErnieMTokenizer class.
            text (str): The input text for which the offset mapping is to be generated.

        Returns:
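            list of tuple or None: One `(start, end)` character span in the original text for each token, or
                None if `text` is None.

        Raises:
            ValueError: If a token cannot be located in the normalized text.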
        """
        if text is None:
            return None

        split_tokens = self.tokenize(text)
        normalized_text, char_mapping = "", []

        for i, ch in enumerate(text):
            if ch in self.SP_CHAR_MAPPING:
                ch = self.SP_CHAR_MAPPING.get(ch)
            else:
                ch = unicodedata.normalize("NFKC", ch)
            if self.is_whitespace(ch):
                continue
            normalized_text += ch
            char_mapping.extend([i] * len(ch))

        text, token_mapping, offset = normalized_text, [], 0

        if self.do_lower_case:
            text = text.lower()

        for token in split_tokens:
            if token[:1] == "▁":
                token = token[1:]
            start = text[offset:].index(token) + offset
            end = start + len(token)

            token_mapping.append((char_mapping[start], char_mapping[end - 1] + 1))
            offset = end
        return token_mapping

    @property
    def vocab_size(self):
        """
        Method to retrieve the size of the vocabulary stored in the ErnieMTokenizer instance.

        Args:
            self (ErnieMTokenizer): The instance of the ErnieMTokenizer class.
                It represents the tokenizer object containing the vocabulary.

        Returns:
            int: The number of unique tokens in the vocabulary.
                Returns the length of the vocabulary stored in the tokenizer.

        Raises:
            None.
        """
        return len(self.vocab)

    def get_vocab(self):
        """
        Get the vocabulary of the tokenizer.

        Args:
            self: The instance of the ErnieMTokenizer class.

        Returns:
            dict: A dictionary representing the vocabulary of the tokenizer. It contains the original vocabulary 
                along with any added tokens.

        Raises:
            None.
        """
        return dict(self.vocab, **self.added_tokens_encoder)

    def __getstate__(self):
        """
        Method: __getstate__

        Description:
            This method is used to retrieve the state of an instance of the ErnieMTokenizer class.
            It returns a dictionary representing the current state of the instance, with the 'sp_model' attribute set
            to None.

        Args:
            self: An instance of the ErnieMTokenizer class.

        Returns:
            dict: A copy of the instance's `__dict__` with the 'sp_model' attribute set to None.

        Raises:
            None.

        """
        state = self.__dict__.copy()
        state["sp_model"] = None
        return state

    def __setstate__(self, d):
        """
        Sets the state of the ErnieMTokenizer object from a serialized state dictionary.

        Args:
            self (ErnieMTokenizer): The instance of the ErnieMTokenizer class.
            d (dict): The serialized state dictionary containing the attributes to be set.

        Returns:
            None.

        Raises:
            None.

        Note:
            This method is automatically called when an ErnieMTokenizer object is loaded from a serialized state.
            It sets the attributes of the object using the values from the serialized state dictionary.

            The 'self.__dict__' attribute is updated with the values from the 'd' dictionary.

            If the 'sp_model_kwargs' attribute is not present in the serialized state, it is initialized as an empty dictionary.

            The SentencePieceProcessor object 'self.sp_model' is initialized using the 'spm.SentencePieceProcessor' class.
            The 'self.sp_model_kwargs' dictionary is passed as keyword arguments to the SentencePieceProcessor constructor.

            Finally, the sentencepiece model is loaded into the SentencePieceProcessor object using 'self.sentencepiece_model_ckpt'.

            Note that this method assumes the 'spm' module has been imported and is available in the current namespace.
        """
        self.__dict__ = d

        # for backward compatibility
        if not hasattr(self, "sp_model_kwargs"):
            self.sp_model_kwargs = {}

        self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
        self.sp_model.Load(self.sentencepiece_model_ckpt)

    def clean_text(self, text):
        """Replaces the fullwidth characters listed in `SP_CHAR_MAPPING` with their halfwidth equivalents."""
        return "".join((self.SP_CHAR_MAPPING.get(c, c) for c in text))

    def _tokenize(self, text, enable_sampling=False, nbest_size=64, alpha=0.1):
        """Tokenize a string."""
        if self.sp_model_kwargs.get("enable_sampling") is True:
            enable_sampling = True
        if self.sp_model_kwargs.get("alpha") is not None:
            alpha = self.sp_model_kwargs.get("alpha")
        if self.sp_model_kwargs.get("nbest_size") is not None:
            nbest_size = self.sp_model_kwargs.get("nbest_size")

        if not enable_sampling:
            pieces = self.sp_model.EncodeAsPieces(text)
        else:
            pieces = self.sp_model.SampleEncodeAsPieces(text, nbest_size, alpha)
        new_pieces = []
        for pi, piece in enumerate(pieces):
            if piece == SPIECE_UNDERLINE:
                if not pieces[pi + 1].startswith(SPIECE_UNDERLINE) and pi != 0:
                    new_pieces.append(SPIECE_UNDERLINE)
                    continue
                continue
            lst_i = 0
            for i, chunk in enumerate(piece):
                if chunk == SPIECE_UNDERLINE:
                    continue
                if self.is_ch_char(chunk) or self.is_punct(chunk):
                    if i > lst_i and piece[lst_i:i] != SPIECE_UNDERLINE:
                        new_pieces.append(piece[lst_i:i])
                    new_pieces.append(chunk)
                    lst_i = i + 1
                elif chunk.isdigit() and i > 0 and not piece[i - 1].isdigit():
                    if i > lst_i and piece[lst_i:i] != SPIECE_UNDERLINE:
                        new_pieces.append(piece[lst_i:i])
                    lst_i = i
                elif not chunk.isdigit() and i > 0 and piece[i - 1].isdigit():
                    if i > lst_i and piece[lst_i:i] != SPIECE_UNDERLINE:
                        new_pieces.append(piece[lst_i:i])
                    lst_i = i
            if len(piece) > lst_i:
                new_pieces.append(piece[lst_i:])
        return new_pieces

    def convert_tokens_to_string(self, tokens):
        """Converts a sequence of tokens (strings for sub-words) into a single string."""
        out_string = "".join(tokens).replace(SPIECE_UNDERLINE, " ").strip()
        return out_string

    def convert_ids_to_string(self, ids):
        """
        Converts a sequence of token ids into a single string.
        """
        tokens = self.convert_ids_to_tokens(ids)
        out_string = "".join(tokens).replace(SPIECE_UNDERLINE, " ").strip()
        return out_string

    # to mimic paddlenlp.transformers.ernie_m.tokenizer.ErnieMTokenizer functioning
    def _convert_token_to_id(self, token):
        """
        Converts a token to its corresponding ID using the provided vocabulary in the ErnieMTokenizer class.

        Args:
            self (ErnieMTokenizer): The instance of the ErnieMTokenizer class.
            token (str): The token to be converted to an ID.

        Returns:
            int: The ID of the token, or the ID of the unknown token (self.unk_token) if the token is not in
                the vocabulary. None is returned if neither is in the vocabulary.

        Raises:
            None.
        """
        return self.vocab.get(token, self.vocab.get(self.unk_token))

    # to mimic paddlenlp.transformers.ernie_m.tokenizer.ErnieMTokenizer functioning
    def _convert_id_to_token(self, index):
        """Converts an index (integer) into a token (str) using the vocab."""
        return self.reverse_vocab.get(index, self.unk_token)

    def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None):
        r"""
        Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and
        adding special tokens. An ErnieM sequence has the following format:

        - single sequence: `[CLS] X [SEP]`
        - pair of sequences: `[CLS] A [SEP] [SEP] B [SEP]`

        Args:
            token_ids_0 (`List[int]`):
                List of IDs to which the special tokens will be added.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.

        Returns:
            `List[int]`: List of input_id with the appropriate special tokens.
        """
        if token_ids_1 is None:
            return [self.cls_token_id] + token_ids_0 + [self.sep_token_id]
        _cls = [self.cls_token_id]
        _sep = [self.sep_token_id]
        return _cls + token_ids_0 + _sep + _sep + token_ids_1 + _sep

    def build_offset_mapping_with_special_tokens(self, offset_mapping_0, offset_mapping_1=None):
        r"""
        Build an offset mapping from one or two offset mappings by concatenating and adding offsets of special tokens. An Ernie-M
        offset_mapping has the following format:

        - single sequence: `(0,0) X (0,0)`
        - pair of sequences: `(0,0) A (0,0) (0,0) B (0,0)`

        Args:
            offset_mapping_0 (`List[tuple]`):
                List of char offsets to which the special tokens will be added.
            offset_mapping_1 (`List[tuple]`, *optional*):
                Optional second list of wordpiece offsets for offset mapping pairs.

        Returns:
            `List[tuple]`: List of wordpiece offsets with the appropriate offsets of special tokens.
        """
        if offset_mapping_1 is None:
            return [(0, 0)] + offset_mapping_0 + [(0, 0)]

        return [(0, 0)] + offset_mapping_0 + [(0, 0), (0, 0)] + offset_mapping_1 + [(0, 0)]

    def get_special_tokens_mask(self, token_ids_0, token_ids_1=None, already_has_special_tokens=False):
        r"""
        Retrieves sequence ids from a token list that has no special tokens added. This method is called when adding
        special tokens using the tokenizer `encode` method.

        Args:
            token_ids_0 (`List[int]`):
                List of ids of the first sequence.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.
            already_has_special_tokens (`bool`, *optional*, defaults to `False`):
                Whether or not the token list is already formatted with special tokens for the model.

        Returns:
            `List[int]`:
                The list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
        """
        if already_has_special_tokens:
            if token_ids_1 is not None:
                raise ValueError(
                    "You should not supply a second sequence if the provided sequence of "
                    "ids is already formatted with special tokens for the model."
                )
            return [1 if x in [self.sep_token_id, self.cls_token_id] else 0 for x in token_ids_0]

        if token_ids_1 is not None:
            return [1] + ([0] * len(token_ids_0)) + [1, 1] + ([0] * len(token_ids_1)) + [1]
        return [1] + ([0] * len(token_ids_0)) + [1]

    def create_token_type_ids_from_sequences(
        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
    ) -> List[int]:
        """
        Create the token type IDs corresponding to the sequences passed. [What are token type
        IDs?](../glossary#token-type-ids) Should be overridden in a subclass if the model has a special way of
        building those.

        Args:
            token_ids_0 (`List[int]`):
                The first tokenized sequence.
            token_ids_1 (`List[int]`, *optional*):
                The second tokenized sequence.

        Returns:
            `List[int]`: The token type ids.
        """
        # called when `add_special_tokens` is True, so align with `build_inputs_with_special_tokens` method
        if token_ids_1 is None:
            # [CLS] X [SEP]
            return (len(token_ids_0) + 2) * [0]

        # [CLS] A [SEP] [SEP] B [SEP]
        return [0] * (len(token_ids_0) + 1) + [1] * (len(token_ids_1) + 3)

    def is_ch_char(self, char):
        """
        Returns True if `char` is a CJK unified ideograph (U+4E00 to U+9FFF).
        """
        if "\u4e00" <= char <= "\u9fff":
            return True
        return False

    def is_alpha(self, char):
        """
        Returns True if `char` is an ASCII letter.
        """
        if ("a" <= char <= "z") or ("A" <= char <= "Z"):
            return True
        return False

    def is_punct(self, char):
        """
        Returns True if `char` is one of the ASCII or fullwidth punctuation marks handled by the tokenizer.
        """
        if char in ",;:.?!~，；：。？！《》【】":
            return True
        return False

    def is_whitespace(self, char):
        """
        Returns True if `char` is a space, tab, newline, carriage return, or a Unicode space separator (category Zs).
        """
        if char in (' ', '\t', '\n', '\r'):
            return True
        if len(char) == 1:
            cat = unicodedata.category(char)
            if cat == "Zs":
                return True
        return False

    def load_vocab(self, filepath):
        """
        This method loads a vocabulary from a specified file path into a token-to-index mapping within the ErnieMTokenizer class.

        Args:
            self (ErnieMTokenizer): The instance of the ErnieMTokenizer class.
            filepath (str): The path to the file containing the vocabulary. The file should be encoded in UTF-8 format.

        Returns:
            dict: A dictionary mapping tokens to their corresponding indices in the loaded vocabulary.

        Raises:
            IOError: If the specified file path is invalid or inaccessible.
            ValueError: If the index conversion to integer fails during token-to-index mapping.
        """
        token_to_idx = {}
        with io.open(filepath, "r", encoding="utf-8") as f:
            for index, line in enumerate(f):
                token = line.rstrip("\n")
                token_to_idx[token] = int(index)

        return token_to_idx

    def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
        """
        Save the vocabulary and tokenizer model.

        Args:
            self: The instance of the ErnieMTokenizer class.
            save_directory (str): The directory where the vocabulary and tokenizer model will be saved.
            filename_prefix (Optional[str]): The prefix to be added to the filename. Defaults to None.

        Returns:
            Tuple[str]: A tuple containing the file path of the saved vocabulary.

        Raises:
            OSError: If the vocabulary or tokenizer model file cannot be written.

        Note:
            A warning is logged if the vocabulary indices are not consecutive, indicating a potential corruption
            in the vocabulary.
        """
        index = 0
        if os.path.isdir(save_directory):
            vocab_file = os.path.join(
                save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"]
            )
        else:
            vocab_file = (filename_prefix + "-" if filename_prefix else "") + save_directory
        with open(vocab_file, "w", encoding="utf-8") as writer:
            for token, token_index in sorted(self.vocab.items(), key=lambda kv: kv[1]):
                if index != token_index:
                    logger.warning(
                        f"Saving vocabulary to {vocab_file}: vocabulary indices are not consecutive."
                        " Please check that the vocabulary is not corrupted!"
                    )
                    index = token_index
                writer.write(token + "\n")
                index += 1

        tokenizer_model_file = os.path.join(save_directory, "sentencepiece.bpe.model")
        with open(tokenizer_model_file, "wb") as fi:
            content_spiece_model = self.sp_model.serialized_model_proto()
            fi.write(content_spiece_model)

        return (vocab_file,)
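
The special-token layout used by `get_special_tokens_mask` and `create_token_type_ids_from_sequences` can be checked without loading a model. The following is a minimal standalone sketch that mirrors the list arithmetic in the class above; the function names and the sample ids are hypothetical and are not part of mindnlp.

```python
from typing import List, Optional


def special_tokens_mask(ids_0: List[int], ids_1: Optional[List[int]] = None) -> List[int]:
    """Return 1 at special-token positions and 0 at sequence-token positions."""
    if ids_1 is None:
        # [CLS] X [SEP]
        return [1] + [0] * len(ids_0) + [1]
    # [CLS] A [SEP] [SEP] B [SEP]
    return [1] + [0] * len(ids_0) + [1, 1] + [0] * len(ids_1) + [1]


def token_type_ids(ids_0: List[int], ids_1: Optional[List[int]] = None) -> List[int]:
    """Return 0 for [CLS] plus the first sequence, 1 for everything after it."""
    if ids_1 is None:
        return [0] * (len(ids_0) + 2)
    return [0] * (len(ids_0) + 1) + [1] * (len(ids_1) + 3)


# Hypothetical token ids, used only to show the shapes.
mask = special_tokens_mask([5, 6], [7])
types = token_type_ids([5, 6], [7])
print(mask)   # [1, 0, 0, 1, 1, 0, 1]
print(types)  # [0, 0, 0, 1, 1, 1, 1]
```

Both lists have the same length as the sequence built by `build_inputs_with_special_tokens`. Note that the first `[SEP]` after sequence A is assigned type 1 by this arithmetic.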

mindnlp.transformers.models.ernie_m.tokenization_ernie_m.ErnieMTokenizer.vocab_size property

Method to retrieve the size of the vocabulary stored in the ErnieMTokenizer instance.

PARAMETER DESCRIPTION
self

The instance of the ErnieMTokenizer class. It represents the tokenizer object containing the vocabulary.

TYPE: ErnieMTokenizer

RETURNS DESCRIPTION
int

The number of unique tokens in the vocabulary. Returns the length of the vocabulary stored in the tokenizer.

mindnlp.transformers.models.ernie_m.tokenization_ernie_m.ErnieMTokenizer.__getstate__()

Description

This method is used to retrieve the state of an instance of the ErnieMTokenizer class. It returns a dictionary representing the current state of the instance, with the 'sp_model' attribute set to None.

PARAMETER DESCRIPTION
self

An instance of the ErnieMTokenizer class.

RETURNS DESCRIPTION

dict: A copy of the instance's state with the 'sp_model' attribute set to None.

Source code in mindnlp\transformers\models\ernie_m\tokenization_ernie_m.py
def __getstate__(self):
    """
    Method: __getstate__

    Description:
        This method is used to retrieve the state of an instance of the ErnieMTokenizer class.
        It returns a dictionary representing the current state of the instance, with the 'sp_model' attribute set
        to None.

    Args:
        self: An instance of the ErnieMTokenizer class.

    Returns:
        dict: A copy of the instance's `__dict__` with the 'sp_model' attribute set to None.

    Raises:
        None.

    """
    state = self.__dict__.copy()
    state["sp_model"] = None
    return state

mindnlp.transformers.models.ernie_m.tokenization_ernie_m.ErnieMTokenizer.__init__(sentencepiece_model_ckpt, vocab_file=None, do_lower_case=False, encoding='utf8', unk_token='[UNK]', sep_token='[SEP]', pad_token='[PAD]', cls_token='[CLS]', mask_token='[MASK]', sp_model_kwargs=None, **kwargs)

Initialize the ErnieMTokenizer class.

PARAMETER DESCRIPTION
sentencepiece_model_ckpt

The path to the sentencepiece model checkpoint file.

TYPE: str

vocab_file

The path to the vocabulary file. Defaults to None.

TYPE: str DEFAULT: None

do_lower_case

A flag indicating whether to convert tokens to lowercase. Defaults to False.

TYPE: bool DEFAULT: False

encoding

The character encoding to be used. Defaults to 'utf8'.

TYPE: str DEFAULT: 'utf8'

unk_token

The token representing unknown words. Defaults to '[UNK]'.

TYPE: str DEFAULT: '[UNK]'

sep_token

The token representing sentence separation. Defaults to '[SEP]'.

TYPE: str DEFAULT: '[SEP]'

pad_token

The token representing padding. Defaults to '[PAD]'.

TYPE: str DEFAULT: '[PAD]'

cls_token

The token representing classification. Defaults to '[CLS]'.

TYPE: str DEFAULT: '[CLS]'

mask_token

The token representing masking. Defaults to '[MASK]'.

TYPE: str DEFAULT: '[MASK]'

sp_model_kwargs

Additional keyword arguments for the SentencePiece model. Defaults to None.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

RETURNS DESCRIPTION
None

None.

Source code in mindnlp\transformers\models\ernie_m\tokenization_ernie_m.py
def __init__(
    self,
    sentencepiece_model_ckpt,
    vocab_file=None,
    do_lower_case=False,
    encoding="utf8",
    unk_token="[UNK]",
    sep_token="[SEP]",
    pad_token="[PAD]",
    cls_token="[CLS]",
    mask_token="[MASK]",
    sp_model_kwargs: Optional[Dict[str, Any]] = None,
    **kwargs,
) -> None:
    """
    Initialize the ErnieMTokenizer class.

    Args:
        sentencepiece_model_ckpt (str): The path to the sentencepiece model checkpoint file.
        vocab_file (str): The path to the vocabulary file. Defaults to None.
        do_lower_case (bool): A flag indicating whether to convert tokens to lowercase. Defaults to False.
        encoding (str): The character encoding to be used. Defaults to 'utf8'.
        unk_token (str): The token representing unknown words. Defaults to '[UNK]'.
        sep_token (str): The token representing sentence separation. Defaults to '[SEP]'.
        pad_token (str): The token representing padding. Defaults to '[PAD]'.
        cls_token (str): The token representing classification. Defaults to '[CLS]'.
        mask_token (str): The token representing masking. Defaults to '[MASK]'.
        sp_model_kwargs (Optional[Dict[str, Any]]): Additional keyword arguments for the SentencePiece model. Defaults to None.

    Returns:
        None.

    Raises:
        None.
    """
    # Mask token behave like a normal word, i.e. include the space before it and
    # is included in the raw text, there should be a match in a non-normalized sentence.

    self.sp_model_kwargs = {} if sp_model_kwargs is None else sp_model_kwargs

    self.do_lower_case = do_lower_case
    self.sentencepiece_model_ckpt = sentencepiece_model_ckpt
    self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
    self.sp_model.Load(sentencepiece_model_ckpt)

    # to mimic paddlenlp.transformers.ernie_m.tokenizer.ErnieMTokenizer functioning
    if vocab_file is not None:
        self.vocab = self.load_vocab(filepath=vocab_file)
    else:
        self.vocab = {self.sp_model.id_to_piece(id): id for id in range(self.sp_model.get_piece_size())}
    self.reverse_vocab = {v: k for k, v in self.vocab.items()}

    super().__init__(
        do_lower_case=do_lower_case,
        unk_token=unk_token,
        sep_token=sep_token,
        pad_token=pad_token,
        cls_token=cls_token,
        mask_token=mask_token,
        vocab_file=vocab_file,
        encoding=encoding,
        sp_model_kwargs=self.sp_model_kwargs,
        **kwargs,
    )

    self.SP_CHAR_MAPPING = {}

    for ch in range(65281, 65375):
        if ch in [ord("～")]:  # keep the fullwidth tilde (U+FF5E) unchanged
            self.SP_CHAR_MAPPING[chr(ch)] = chr(ch)
            continue
        self.SP_CHAR_MAPPING[chr(ch)] = chr(ch - 65248)

mindnlp.transformers.models.ernie_m.tokenization_ernie_m.ErnieMTokenizer.__setstate__(d)

Sets the state of the ErnieMTokenizer object from a serialized state dictionary.

PARAMETER DESCRIPTION
self

The instance of the ErnieMTokenizer class.

TYPE: ErnieMTokenizer

d

The serialized state dictionary containing the attributes to be set.

TYPE: dict

RETURNS DESCRIPTION

None.

Note

This method is automatically called when an ErnieMTokenizer object is loaded from a serialized state. It sets the attributes of the object using the values from the serialized state dictionary.

The 'self.__dict__' attribute is updated with the values from the 'd' dictionary.

If the 'sp_model_kwargs' attribute is not present in the serialized state, it is initialized as an empty dictionary.

The SentencePieceProcessor object 'self.sp_model' is initialized using the 'spm.SentencePieceProcessor' class. The 'self.sp_model_kwargs' dictionary is passed as keyword arguments to the SentencePieceProcessor constructor.

Finally, the sentencepiece model is loaded into the SentencePieceProcessor object using 'self.sentencepiece_model_ckpt'.

Note that this method assumes the 'spm' module has been imported and is available in the current namespace.

Source code in mindnlp\transformers\models\ernie_m\tokenization_ernie_m.py
def __setstate__(self, d):
    """
    Sets the state of the ErnieMTokenizer object from a serialized state dictionary.

    Args:
        self (ErnieMTokenizer): The instance of the ErnieMTokenizer class.
        d (dict): The serialized state dictionary containing the attributes to be set.

    Returns:
        None.

    Raises:
        None.

    Note:
        This method is automatically called when an ErnieMTokenizer object is loaded from a serialized state.
        It sets the attributes of the object using the values from the serialized state dictionary.

        The 'self.__dict__' attribute is updated with the values from the 'd' dictionary.

        If the 'sp_model_kwargs' attribute is not present in the serialized state, it is initialized as an empty dictionary.

        The SentencePieceProcessor object 'self.sp_model' is initialized using the 'spm.SentencePieceProcessor' class.
        The 'self.sp_model_kwargs' dictionary is passed as keyword arguments to the SentencePieceProcessor constructor.

        Finally, the sentencepiece model is loaded into the SentencePieceProcessor object using 'self.sentencepiece_model_ckpt'.

        Note that this method assumes the 'spm' module has been imported and is available in the current namespace.
    """
    self.__dict__ = d

    # for backward compatibility
    if not hasattr(self, "sp_model_kwargs"):
        self.sp_model_kwargs = {}

    self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
    self.sp_model.Load(self.sentencepiece_model_ckpt)
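
The `__getstate__` / `__setstate__` pair exists because the native SentencePiece processor is dropped from the saved state and rebuilt from the checkpoint path on load. Below is a minimal sketch of that pattern; `FakeProcessor` and `PicklableTokenizer` are hypothetical stand-ins so that the example runs without `sentencepiece` installed, and the hooks are called directly, the way `pickle` would call them.

```python
class FakeProcessor:
    """Hypothetical stand-in for spm.SentencePieceProcessor."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs
        self.loaded_from = None

    def Load(self, path):
        # The real processor would read the model file here.
        self.loaded_from = path


class PicklableTokenizer:
    """Hypothetical class that follows the same state-handling pattern."""

    def __init__(self, ckpt, sp_model_kwargs=None):
        self.sp_model_kwargs = {} if sp_model_kwargs is None else sp_model_kwargs
        self.sentencepiece_model_ckpt = ckpt
        self.sp_model = FakeProcessor(**self.sp_model_kwargs)
        self.sp_model.Load(ckpt)

    def __getstate__(self):
        # Copy the attributes and drop the processor from the saved state.
        state = self.__dict__.copy()
        state["sp_model"] = None
        return state

    def __setstate__(self, d):
        self.__dict__ = d
        # Older saved states may lack sp_model_kwargs.
        if not hasattr(self, "sp_model_kwargs"):
            self.sp_model_kwargs = {}
        # Rebuild the processor from the stored checkpoint path.
        self.sp_model = FakeProcessor(**self.sp_model_kwargs)
        self.sp_model.Load(self.sentencepiece_model_ckpt)


tok = PicklableTokenizer("model.spm")  # hypothetical path
state = tok.__getstate__()
restored = PicklableTokenizer.__new__(PicklableTokenizer)
restored.__setstate__(state)
print(restored.sp_model.loaded_from)
```

One consequence of this design is that a restored tokenizer needs the checkpoint file to still exist at the stored path.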

mindnlp.transformers.models.ernie_m.tokenization_ernie_m.ErnieMTokenizer.build_inputs_with_special_tokens(token_ids_0, token_ids_1=None)

Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and adding special tokens. An ErnieM sequence has the following format:

  • single sequence: [CLS] X [SEP]
  • pair of sequences: [CLS] A [SEP] [SEP] B [SEP]
PARAMETER DESCRIPTION
token_ids_0

List of IDs to which the special tokens will be added.

TYPE: `List[int]`

token_ids_1

Optional second list of IDs for sequence pairs.

TYPE: `List[int]`, *optional* DEFAULT: None

RETURNS DESCRIPTION

List[int]: List of input_id with the appropriate special tokens.

Source code in mindnlp\transformers\models\ernie_m\tokenization_ernie_m.py
def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None):
    r"""
    Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and
    adding special tokens. An ErnieM sequence has the following format:

    - single sequence: `[CLS] X [SEP]`
    - pair of sequences: `[CLS] A [SEP] [SEP] B [SEP]`

    Args:
        token_ids_0 (`List[int]`):
            List of IDs to which the special tokens will be added.
        token_ids_1 (`List[int]`, *optional*):
            Optional second list of IDs for sequence pairs.

    Returns:
        `List[int]`: List of input_id with the appropriate special tokens.
    """
    if token_ids_1 is None:
        return [self.cls_token_id] + token_ids_0 + [self.sep_token_id]
    _cls = [self.cls_token_id]
    _sep = [self.sep_token_id]
    return _cls + token_ids_0 + _sep + _sep + token_ids_1 + _sep
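
To make the two layouts concrete, here is a small standalone sketch of the same concatenation. The ids `CLS_ID` and `SEP_ID` are made-up values for illustration; the real ids come from the loaded vocabulary.

```python
from typing import List, Optional

# Hypothetical special-token ids, chosen only for this example.
CLS_ID = 0
SEP_ID = 2


def build_inputs(ids_0: List[int], ids_1: Optional[List[int]] = None) -> List[int]:
    """Wrap one or two id lists in [CLS] / [SEP] markers, Ernie-M style."""
    if ids_1 is None:
        return [CLS_ID] + ids_0 + [SEP_ID]
    # Pairs are joined by two consecutive [SEP] tokens.
    return [CLS_ID] + ids_0 + [SEP_ID, SEP_ID] + ids_1 + [SEP_ID]


single = build_inputs([10, 11])
pair = build_inputs([10], [20, 21])
print(single)  # [0, 10, 11, 2]
print(pair)    # [0, 10, 2, 2, 20, 21, 2]
```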

mindnlp.transformers.models.ernie_m.tokenization_ernie_m.ErnieMTokenizer.build_offset_mapping_with_special_tokens(offset_mapping_0, offset_mapping_1=None)

Build an offset mapping from one or two offset mappings by concatenating and adding offsets of special tokens. An Ernie-M offset_mapping has the following format:

  • single sequence: (0,0) X (0,0)
  • pair of sequences: (0,0) A (0,0) (0,0) B (0,0)
PARAMETER DESCRIPTION
offset_mapping_0

List of char offsets to which the special tokens will be added.

TYPE: `List[tuple]`

offset_mapping_1

Optional second list of wordpiece offsets for offset mapping pairs.

TYPE: `List[tuple]`, *optional*

RETURNS DESCRIPTION

List[tuple]: List of wordpiece offsets with the appropriate offsets of special tokens.

Source code in mindnlp\transformers\models\ernie_m\tokenization_ernie_m.py
def build_offset_mapping_with_special_tokens(self, offset_mapping_0, offset_mapping_1=None):
    r"""
    Build an offset mapping from one offset mapping or a pair of offset mappings by concatenating them and adding
    the offsets of special tokens. An Ernie-M offset_mapping has the following format:

    - single sequence: `(0,0) X (0,0)`
    - pair of sequences: `(0,0) A (0,0) (0,0) B (0,0)`

    Args:
        offset_mapping_0 (`List[tuple]`):
            List of character offsets to which the offsets of special tokens will be added.
        offset_mapping_1 (`List[tuple]`, *optional*):
            Optional second list of wordpiece offsets for offset mapping pairs.

    Returns:
        `List[tuple]`: List of wordpiece offsets with the appropriate offsets of special tokens.
    """
    if offset_mapping_1 is None:
        return [(0, 0)] + offset_mapping_0 + [(0, 0)]

    return [(0, 0)] + offset_mapping_0 + [(0, 0), (0, 0)] + offset_mapping_1 + [(0, 0)]
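
As a rough illustration of the format above, the following standalone sketch pads two made-up offset lists with `(0, 0)` entries at the positions where special tokens would sit. The name `build_offsets` and the sample offsets are hypothetical.

```python
from typing import List, Optional, Tuple

Offset = Tuple[int, int]
SPECIAL: Offset = (0, 0)  # special tokens have no span in the source text


def build_offsets(off_0: List[Offset], off_1: Optional[List[Offset]] = None) -> List[Offset]:
    """Insert (0, 0) wherever [CLS] or [SEP] would appear in the input IDs."""
    if off_1 is None:
        return [SPECIAL] + off_0 + [SPECIAL]
    return [SPECIAL] + off_0 + [SPECIAL, SPECIAL] + off_1 + [SPECIAL]


offsets = build_offsets([(0, 5), (6, 11)], [(0, 3)])
print(offsets)
# [(0, 0), (0, 5), (6, 11), (0, 0), (0, 0), (0, 3), (0, 0)]
```

The result has the same length as the corresponding list of input IDs, so the two lists can be zipped together position by position.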

mindnlp.transformers.models.ernie_m.tokenization_ernie_m.ErnieMTokenizer.clean_text(text)

Normalizes text by replacing every character that has an entry in SP_CHAR_MAPPING with its mapped value; other characters are left unchanged.

Source code in mindnlp\transformers\models\ernie_m\tokenization_ernie_m.py
def clean_text(self, text):
    """Normalizes text by replacing every character found in SP_CHAR_MAPPING with its mapped value."""
    return "".join((self.SP_CHAR_MAPPING.get(c, c) for c in text))

mindnlp.transformers.models.ernie_m.tokenization_ernie_m.ErnieMTokenizer.convert_ids_to_string(ids)

Converts a sequence of token IDs into a single string.

Source code in mindnlp\transformers\models\ernie_m\tokenization_ernie_m.py
def convert_ids_to_string(self, ids):
    """
    Converts a sequence of token IDs into a single string.
    """
    tokens = self.convert_ids_to_tokens(ids)
    out_string = "".join(tokens).replace(SPIECE_UNDERLINE, " ").strip()
    return out_string

mindnlp.transformers.models.ernie_m.tokenization_ernie_m.ErnieMTokenizer.convert_tokens_to_string(tokens)

Converts a sequence of tokens (strings for sub-words) into a single string.

Source code in mindnlp\transformers\models\ernie_m\tokenization_ernie_m.py
def convert_tokens_to_string(self, tokens):
    """Converts a sequence of tokens (strings for sub-words) into a single string."""
    out_string = "".join(tokens).replace(SPIECE_UNDERLINE, " ").strip()
    return out_string
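
A small standalone sketch of the same join-and-replace step is shown below. It assumes `SPIECE_UNDERLINE` is the SentencePiece word-boundary marker `"▁"` (U+2581), and the sample tokens are invented for illustration.

```python
from typing import List

# Assumption: the SentencePiece word-boundary marker (U+2581).
SPIECE_UNDERLINE = "▁"


def tokens_to_string(tokens: List[str]) -> str:
    """Concatenate sub-word tokens and turn boundary markers back into spaces."""
    return "".join(tokens).replace(SPIECE_UNDERLINE, " ").strip()


text = tokens_to_string(["▁Hello", "▁wor", "ld"])
print(text)  # Hello world
```

Tokens without the marker are glued to the previous token, which is how sub-word pieces are merged back into words.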

mindnlp.transformers.models.ernie_m.tokenization_ernie_m.ErnieMTokenizer.create_token_type_ids_from_sequences(token_ids_0, token_ids_1=None)

Create the token type IDs corresponding to the sequences passed (see the glossary entry "What are token type IDs?"). Should be overridden in a subclass if the model has a special way of building them.

PARAMETER DESCRIPTION
token_ids_0

The first tokenized sequence.

TYPE: `List[int]`

token_ids_1

The second tokenized sequence.

TYPE: `List[int]`, *optional* DEFAULT: None

RETURNS DESCRIPTION
List[int]

List[int]: The token type ids.

Source code in mindnlp\transformers\models\ernie_m\tokenization_ernie_m.py
def create_token_type_ids_from_sequences(
    self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
    """
    Create the token type IDs corresponding to the sequences passed. [What are token type
    IDs?](../glossary#token-type-ids) Should be overridden in a subclass if the model has a special way of
    building them.

    Args:
        token_ids_0 (`List[int]`):
            The first tokenized sequence.
        token_ids_1 (`List[int]`, *optional*):
            The second tokenized sequence.

    Returns:
        `List[int]`: The token type ids.
    """
    # called when `add_special_tokens` is True, so align with `build_inputs_with_special_tokens` method
    if token_ids_1 is None:
        # [CLS] X [SEP]
        return (len(token_ids_0) + 2) * [0]

    # [CLS] A [SEP] [SEP] B [SEP]
    return [0] * (len(token_ids_0) + 1) + [1] * (len(token_ids_1) + 3)
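
The segment layout implied by the code above can be sketched on its own. The function name `token_type_ids` and the sample IDs are hypothetical; only the arithmetic is taken from the listing.

```python
from typing import List, Optional


def token_type_ids(ids_0: List[int], ids_1: Optional[List[int]] = None) -> List[int]:
    """Segment 0 covers `[CLS] A`; segment 1 covers `[SEP] [SEP] B [SEP]`."""
    if ids_1 is None:
        # [CLS] X [SEP]: everything belongs to segment 0
        return [0] * (len(ids_0) + 2)
    return [0] * (len(ids_0) + 1) + [1] * (len(ids_1) + 3)


single_types = token_type_ids([11, 12])
pair_types = token_type_ids([11, 12], [21])
print(single_types)  # [0, 0, 0, 0]
print(pair_types)    # [0, 0, 0, 1, 1, 1, 1]
```

In the pair case the first separator is already counted in segment 1, which differs from BERT-style tokenizers where the first `[SEP]` belongs to segment 0.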

mindnlp.transformers.models.ernie_m.tokenization_ernie_m.ErnieMTokenizer.get_offset_mapping(text)

Computes, for each token produced by tokenizing the given text, the (start, end) character offsets of that token in the original text.

PARAMETER DESCRIPTION
self

The instance of the ErnieMTokenizer class.

text

The input text for which the offset mapping is to be generated.

TYPE: str

RETURNS DESCRIPTION

List[tuple]: One (start, end) character span per token, with end exclusive, or None if text is None.

Source code in mindnlp\transformers\models\ernie_m\tokenization_ernie_m.py
def get_offset_mapping(self, text):
    """
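    Compute the character offsets in `text` of each token produced by `self.tokenize(text)`.

    Args:
        self: The instance of the ErnieMTokenizer class.
        text (str): The input text for which the offset mapping is to be generated.

    Returns:
        `List[tuple]`: One `(start, end)` character span per token, with `end` exclusive, or `None` if
        `text` is `None`.

    Raises:
        ValueError: If a token cannot be found in the normalized text.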
    """
    if text is None:
        return None

    split_tokens = self.tokenize(text)
    normalized_text, char_mapping = "", []

    for i, ch in enumerate(text):
        if ch in self.SP_CHAR_MAPPING:
            ch = self.SP_CHAR_MAPPING.get(ch)
        else:
            ch = unicodedata.normalize("NFKC", ch)
        if self.is_whitespace(ch):
            continue
        normalized_text += ch
        char_mapping.extend([i] * len(ch))

    text, token_mapping, offset = normalized_text, [], 0

    if self.do_lower_case:
        text = text.lower()

    for token in split_tokens:
        if token[:1] == "▁":
            token = token[1:]
        start = text[offset:].index(token) + offset
        end = start + len(token)

        token_mapping.append((char_mapping[start], char_mapping[end - 1] + 1))
        offset = end
    return token_mapping
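
The alignment loop above can be illustrated with a simplified, standalone sketch. It is not the library method: it takes the tokens as an argument instead of calling a tokenizer, uses `str.isspace` in place of `is_whitespace`, and omits `SP_CHAR_MAPPING` and lowercasing. The name `align_tokens` and the sample tokens are assumptions.

```python
import unicodedata
from typing import List, Tuple


def align_tokens(text: str, tokens: List[str]) -> List[Tuple[int, int]]:
    """Map each token to its (start, end) character span in `text`."""
    normalized, char_mapping = "", []
    for i, ch in enumerate(text):
        ch = unicodedata.normalize("NFKC", ch)
        if ch.isspace():
            continue  # whitespace is dropped, so later indices shift
        normalized += ch
        # remember which original index each normalized character came from
        char_mapping.extend([i] * len(ch))

    spans, offset = [], 0
    for token in tokens:
        if token[:1] == "▁":  # strip the SentencePiece boundary marker
            token = token[1:]
        start = normalized.index(token, offset)
        end = start + len(token)
        spans.append((char_mapping[start], char_mapping[end - 1] + 1))
        offset = end
    return spans


text = "Hello world"
spans = align_tokens(text, ["▁Hello", "▁wor", "ld"])
print(spans)  # [(0, 5), (6, 9), (9, 11)]
```

Slicing the original text with each span, for example `text[6:9]`, recovers the surface form of the corresponding token.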

mindnlp.transformers.models.ernie_m.tokenization_ernie_m.ErnieMTokenizer.get_special_tokens_mask(token_ids_0, token_ids_1=None, already_has_special_tokens=False)

Retrieves sequence ids from a token list that has no special tokens added. This method is called when adding special tokens using the tokenizer encode method.

PARAMETER DESCRIPTION
token_ids_0

List of ids of the first sequence.

TYPE: `List[int]`

token_ids_1

Optional second list of IDs for sequence pairs.

TYPE: `List[int]`, *optional* DEFAULT: None

already_has_special_tokens

Whether or not the token list is already formatted with special tokens for the model.

TYPE: `bool`, *optional*, defaults to `False` DEFAULT: False

RETURNS DESCRIPTION

List[int]: The list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.

Source code in mindnlp\transformers\models\ernie_m\tokenization_ernie_m.py
def get_special_tokens_mask(self, token_ids_0, token_ids_1=None, already_has_special_tokens=False):
    r"""
    Retrieves sequence ids from a token list that has no special tokens added. This method is called when adding
    special tokens using the tokenizer `encode` method.

    Args:
        token_ids_0 (`List[int]`):
            List of ids of the first sequence.
        token_ids_1 (`List[int]`, *optional*):
            Optional second list of IDs for sequence pairs.
        already_has_special_tokens (`bool`, *optional*, defaults to `False`):
            Whether or not the token list is already formatted with special tokens for the model.

    Returns:
        `List[int]`:
            The list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
    """
    if already_has_special_tokens:
        if token_ids_1 is not None:
            raise ValueError(
                "You should not supply a second sequence if the provided sequence of "
                "ids is already formatted with special tokens for the model."
            )
        return [1 if x in [self.sep_token_id, self.cls_token_id] else 0 for x in token_ids_0]

    if token_ids_1 is not None:
        return [1] + ([0] * len(token_ids_0)) + [1, 1] + ([0] * len(token_ids_1)) + [1]
    return [1] + ([0] * len(token_ids_0)) + [1]
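
Both branches of the mask logic can be tried out in a standalone sketch. The function name `special_tokens_mask` and the IDs `CLS_ID = 0` and `SEP_ID = 2` are hypothetical values used only for illustration.

```python
from typing import List, Optional

# Hypothetical special-token IDs, chosen only for illustration.
CLS_ID, SEP_ID = 0, 2


def special_tokens_mask(
    ids_0: List[int],
    ids_1: Optional[List[int]] = None,
    already_has_special_tokens: bool = False,
) -> List[int]:
    """Return 1 at special-token positions and 0 at sequence-token positions."""
    if already_has_special_tokens:
        if ids_1 is not None:
            raise ValueError("A second sequence is not allowed for pre-formatted input.")
        return [1 if x in (SEP_ID, CLS_ID) else 0 for x in ids_0]
    if ids_1 is not None:
        return [1] + [0] * len(ids_0) + [1, 1] + [0] * len(ids_1) + [1]
    return [1] + [0] * len(ids_0) + [1]


raw_mask = special_tokens_mask([11, 12], [21])
formatted_mask = special_tokens_mask([0, 11, 2], already_has_special_tokens=True)
print(raw_mask)        # [1, 0, 0, 1, 1, 0, 1]
print(formatted_mask)  # [1, 0, 1]
```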

mindnlp.transformers.models.ernie_m.tokenization_ernie_m.ErnieMTokenizer.get_vocab()

Get the vocabulary of the tokenizer.

PARAMETER DESCRIPTION
self

The instance of the ErnieMTokenizer class.

RETURNS DESCRIPTION
dict

A dictionary representing the vocabulary of the tokenizer. It contains the original vocabulary along with any added tokens.

Source code in mindnlp\transformers\models\ernie_m\tokenization_ernie_m.py
def get_vocab(self):
    """
    Get the vocabulary of the tokenizer.

    Args:
        self: The instance of the ErnieMTokenizer class.

    Returns:
        dict: A dictionary representing the vocabulary of the tokenizer. It contains the original vocabulary 
            along with any added tokens.

    Raises:
        None.
    """
    return dict(self.vocab, **self.added_tokens_encoder)
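
The merge performed by `dict(self.vocab, **self.added_tokens_encoder)` can be seen in isolation with two small dictionaries. The token names and indices below are invented for the example.

```python
# Hypothetical base vocabulary and user-added tokens.
vocab = {"[PAD]": 0, "hello": 1}
added_tokens_encoder = {"<new_token>": 2}

# dict(mapping, **kwargs) copies `mapping` and then applies the keyword entries,
# so an added token would override a base entry with the same key.
merged = dict(vocab, **added_tokens_encoder)
print(merged)  # {'[PAD]': 0, 'hello': 1, '<new_token>': 2}
```

Because a new dictionary is built on each call, mutating the result does not change the underlying `vocab`. The `**` unpacking requires all keys to be strings, which holds for token-to-index maps.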

mindnlp.transformers.models.ernie_m.tokenization_ernie_m.ErnieMTokenizer.is_alpha(char)

Returns True if char is an ASCII letter (a-z or A-Z).

Source code in mindnlp\transformers\models\ernie_m\tokenization_ernie_m.py
def is_alpha(self, char):
    """
    Returns True if char is an ASCII letter (a-z or A-Z).
    """
    if ("a" <= char <= "z") or ("A" <= char <= "Z"):
        return True
    return False

mindnlp.transformers.models.ernie_m.tokenization_ernie_m.ErnieMTokenizer.is_ch_char(char)

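Returns True if char is a Chinese character, that is, a character in the CJK Unified Ideographs range U+4E00 to U+9FFF.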

Source code in mindnlp\transformers\models\ernie_m\tokenization_ernie_m.py
def is_ch_char(self, char):
    """
    Returns True if char is a Chinese character (CJK Unified Ideographs range U+4E00 to U+9FFF).
    """
    if "\u4e00" <= char <= "\u9fff":
        return True
    return False

mindnlp.transformers.models.ernie_m.tokenization_ernie_m.ErnieMTokenizer.is_punct(char)

Returns True if char is one of the ASCII or Chinese punctuation marks recognized by the tokenizer.

Source code in mindnlp\transformers\models\ernie_m\tokenization_ernie_m.py
def is_punct(self, char):
    """
    Returns True if char is one of the ASCII or Chinese punctuation marks recognized by the tokenizer.
    """
    if char in ",;:.?!~,;:。?!《》【】":
        return True
    return False

mindnlp.transformers.models.ernie_m.tokenization_ernie_m.ErnieMTokenizer.is_whitespace(char)

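Returns True if char is a space, tab, newline, or carriage return, or a single character in the Unicode space-separator category (Zs).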

Source code in mindnlp\transformers\models\ernie_m\tokenization_ernie_m.py
def is_whitespace(self, char):
    """
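    Returns True if char is a space, tab, newline, or carriage return, or a single character in the
    Unicode space-separator category (Zs).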
    """
    if char in (' ', '\t', '\n', '\r'):
        return True
    if len(char) == 1:
        cat = unicodedata.category(char)
        if cat == "Zs":
            return True
    return False
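
To see which characters the check above accepts, the same logic can be run as a free function on a few sample characters. The function is a standalone copy for illustration, and the sample characters are chosen arbitrarily.

```python
import unicodedata


def is_whitespace(char: str) -> bool:
    """ASCII whitespace, or any single character in Unicode category Zs."""
    if char in (" ", "\t", "\n", "\r"):
        return True
    if len(char) == 1:
        return unicodedata.category(char) == "Zs"
    return False


samples = {
    "space": " ",
    "newline": "\n",
    "no-break space": "\u00a0",     # category Zs
    "ideographic space": "\u3000",  # category Zs
    "letter": "a",
}
results = {name: is_whitespace(ch) for name, ch in samples.items()}
print(results)
```

Every sample except the letter is reported as whitespace, so full-width and no-break spaces are skipped during offset mapping just like ordinary spaces.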

mindnlp.transformers.models.ernie_m.tokenization_ernie_m.ErnieMTokenizer.load_vocab(filepath)

This method loads a vocabulary from a specified file path into a token-to-index mapping within the ErnieMTokenizer class.

PARAMETER DESCRIPTION
self

The instance of the ErnieMTokenizer class.

TYPE: ErnieMTokenizer

filepath

The path to the vocabulary file, which must be UTF-8 encoded with one token per line. Each token's index is its zero-based line number.

TYPE: str

RETURNS DESCRIPTION
dict

A dictionary mapping tokens to their corresponding indices in the loaded vocabulary.

RAISES DESCRIPTION
IOError

If the specified file path is invalid or inaccessible.

ValueError

If the index conversion to integer fails during token-to-index mapping.

Source code in mindnlp\transformers\models\ernie_m\tokenization_ernie_m.py
def load_vocab(self, filepath):
    """
    This method loads a vocabulary from a specified file path into a token-to-index mapping within the ErnieMTokenizer class.

    Args:
        self (ErnieMTokenizer): The instance of the ErnieMTokenizer class.
        filepath (str): The path to the vocabulary file, which must be UTF-8 encoded with one token per line.
            Each token's index is its zero-based line number.

    Returns:
        dict: A dictionary mapping tokens to their corresponding indices in the loaded vocabulary.

    Raises:
        IOError: If the specified file path is invalid or inaccessible.
        ValueError: If the index conversion to integer fails during token-to-index mapping.
    """
    token_to_idx = {}
    with io.open(filepath, "r", encoding="utf-8") as f:
        for index, line in enumerate(f):
            token = line.rstrip("\n")
            token_to_idx[token] = int(index)

    return token_to_idx
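
The file format expected by this loader can be demonstrated with a temporary file. The sketch below is a standalone copy of the loop, and the file name `vocab.txt` and its contents are assumptions made for the example.

```python
import os
import tempfile
from typing import Dict


def load_vocab(filepath: str) -> Dict[str, int]:
    """Read one token per line; the zero-based line number becomes the index."""
    token_to_idx = {}
    with open(filepath, "r", encoding="utf-8") as f:
        for index, line in enumerate(f):
            token_to_idx[line.rstrip("\n")] = index
    return token_to_idx


with tempfile.TemporaryDirectory() as tmp_dir:
    vocab_path = os.path.join(tmp_dir, "vocab.txt")  # hypothetical file name
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.write("[PAD]\n[CLS]\nhello\n")
    loaded_vocab = load_vocab(vocab_path)

print(loaded_vocab)  # {'[PAD]': 0, '[CLS]': 1, 'hello': 2}
```

If the same token appears on two lines, the later line number overwrites the earlier one.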

mindnlp.transformers.models.ernie_m.tokenization_ernie_m.ErnieMTokenizer.save_vocabulary(save_directory, filename_prefix=None)

Save the vocabulary and tokenizer model.

PARAMETER DESCRIPTION
self

The instance of the ErnieMTokenizer class.

save_directory

The directory where the vocabulary and tokenizer model will be saved.

TYPE: str

filename_prefix

The prefix to be added to the filename. Defaults to None.

TYPE: Optional[str] DEFAULT: None

RETURNS DESCRIPTION
Tuple[str]

Tuple[str]: A tuple containing the file path of the saved vocabulary.

RAISES DESCRIPTION
OSError

If the save_directory does not exist or is not a valid directory.

IOError

If there is an issue with writing the vocabulary or tokenizer model files.

Warning

Not raised as an exception; a warning is logged if the vocabulary indices are not consecutive, which may indicate a corrupted vocabulary.

Source code in mindnlp\transformers\models\ernie_m\tokenization_ernie_m.py
def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
    """
    Save the vocabulary and tokenizer model.

    Args:
        self: The instance of the ErnieMTokenizer class.
        save_directory (str): The directory where the vocabulary and tokenizer model will be saved.
        filename_prefix (Optional[str]): The prefix to be added to the filename. Defaults to None.

    Returns:
        Tuple[str]: A tuple containing the file path of the saved vocabulary.

    Raises:
        OSError: If the save_directory does not exist or is not a valid directory.
        IOError: If there is an issue with writing the vocabulary or tokenizer model files.
        Warning: Not raised as an exception; a warning is logged if the vocabulary indices are not consecutive,
            which may indicate a corrupted vocabulary.
    """
    index = 0
    if os.path.isdir(save_directory):
        vocab_file = os.path.join(
            save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"]
        )
    else:
        vocab_file = (filename_prefix + "-" if filename_prefix else "") + save_directory
    with open(vocab_file, "w", encoding="utf-8") as writer:
        for token, token_index in sorted(self.vocab.items(), key=lambda kv: kv[1]):
            if index != token_index:
                logger.warning(
                    f"Saving vocabulary to {vocab_file}: vocabulary indices are not consecutive."
                    " Please check that the vocabulary is not corrupted!"
                )
                index = token_index
            writer.write(token + "\n")
            index += 1

    tokenizer_model_file = os.path.join(save_directory, "sentencepiece.bpe.model")
    with open(tokenizer_model_file, "wb") as fi:
        content_spiece_model = self.sp_model.serialized_model_proto()
        fi.write(content_spiece_model)

    return (vocab_file,)
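
The consecutive-index check in the writing loop can be sketched without a tokenizer. In this simplified version the helper name `write_vocab`, the file name, and the sample vocabulary are hypothetical, and gaps are collected in a list instead of being logged.

```python
import os
import tempfile
from typing import Dict, List


def write_vocab(vocab: Dict[str, int], path: str) -> List[int]:
    """Write tokens in index order and return the indices where a gap was found."""
    gaps, expected = [], 0
    with open(path, "w", encoding="utf-8") as writer:
        for token, token_index in sorted(vocab.items(), key=lambda kv: kv[1]):
            if token_index != expected:
                gaps.append(token_index)  # the real method logs a warning here
                expected = token_index
            writer.write(token + "\n")
            expected += 1
    return gaps


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "vocab.txt")  # hypothetical file name
    gaps = write_vocab({"a": 0, "b": 1, "d": 3}, path)
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()

print(gaps)   # [3]
print(lines)  # ['a', 'b', 'd']
```

The gap matters because the file stores no indices: a loader that assigns indices by line number would read `d` back as index 2 rather than 3.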