big_bird

`mindnlp.transformers.models.big_bird.configuration_big_bird.BigBirdConfig` ¶

Bases: PretrainedConfig

This is the configuration class to store the configuration of a [BigBirdModel]. It is used to instantiate a BigBird model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the BigBird google/bigbird-roberta-base architecture.

Configuration objects inherit from [PretrainedConfig] and can be used to control the model outputs. Read the documentation from [PretrainedConfig] for more information.

PARAMETER	DESCRIPTION
`vocab_size`	Vocabulary size of the BigBird model. Defines the number of different tokens that can be represented by the `input_ids` passed when calling [`BigBirdModel`]. TYPE: `int`, optional, defaults to 50358 DEFAULT: `50358`
`hidden_size`	Dimension of the encoder layers and the pooler layer. TYPE: `int`, optional, defaults to 768 DEFAULT: `768`
`num_hidden_layers`	Number of hidden layers in the Transformer encoder. TYPE: `int`, optional, defaults to 12 DEFAULT: `12`
`num_attention_heads`	Number of attention heads for each attention layer in the Transformer encoder. TYPE: `int`, optional, defaults to 12 DEFAULT: `12`
`intermediate_size`	Dimension of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder. TYPE: `int`, optional, defaults to 3072 DEFAULT: `3072`
`hidden_act`	The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`, `"relu"`, `"selu"` and `"gelu_new"` are supported. TYPE: `str` or `function`, optional, defaults to `"gelu_new"` DEFAULT: `'gelu_new'`
`hidden_dropout_prob`	The dropout probability for all fully connected layers in the embeddings, encoder, and pooler. TYPE: `float`, optional, defaults to 0.1 DEFAULT: `0.1`
`attention_probs_dropout_prob`	The dropout ratio for the attention probabilities. TYPE: `float`, optional, defaults to 0.1 DEFAULT: `0.1`
`max_position_embeddings`	The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 1024 or 2048 or 4096). TYPE: `int`, optional, defaults to 4096 DEFAULT: `4096`
`type_vocab_size`	The vocabulary size of the `token_type_ids` passed when calling [`BigBirdModel`]. TYPE: `int`, optional, defaults to 2 DEFAULT: `2`
`initializer_range`	The standard deviation of the truncated_normal_initializer for initializing all weight matrices. TYPE: `float`, optional, defaults to 0.02 DEFAULT: `0.02`
`layer_norm_eps`	The epsilon used by the layer normalization layers. TYPE: `float`, optional, defaults to 1e-12 DEFAULT: `1e-12`
`is_decoder`	Whether the model is used as a decoder or not. If `False`, the model is used as an encoder. TYPE: `bool`, optional, defaults to `False`
`use_cache`	Whether or not the model should return the last key/values attentions (not used by all models). Only relevant if `config.is_decoder=True`. TYPE: `bool`, optional, defaults to `True` DEFAULT: `True`
`attention_type`	Whether to use block sparse attention (with complexity linear in the sequence length n) or the original full attention layer (with n^2 complexity). Possible values are `"original_full"` and `"block_sparse"`. TYPE: `str`, optional, defaults to `"block_sparse"` DEFAULT: `'block_sparse'`
`use_bias`	Whether to use bias in query, key, value. TYPE: `bool`, optional, defaults to `True` DEFAULT: `True`
`rescale_embeddings`	Whether to rescale embeddings with (hidden_size ** 0.5). TYPE: `bool`, optional, defaults to `False` DEFAULT: `False`
`block_size`	Size of each block. Useful only when `attention_type == "block_sparse"`. TYPE: `int`, optional, defaults to 64 DEFAULT: `64`
`num_random_blocks`	Number of random blocks each query attends to. Useful only when `attention_type == "block_sparse"`. TYPE: `int`, optional, defaults to 3 DEFAULT: `3`
`classifier_dropout`	The dropout ratio for the classification head. TYPE: `float`, optional DEFAULT: `None`

Example

>>> from mindnlp.transformers import BigBirdConfig, BigBirdModel
...
>>> # Initializing a BigBird google/bigbird-roberta-base style configuration
>>> configuration = BigBirdConfig()
...
>>> # Initializing a model (with random weights) from the google/bigbird-roberta-base style configuration
>>> model = BigBirdModel(configuration)
...
>>> # Accessing the model configuration
>>> configuration = model.config
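To make the relationship between a few of these hyperparameters concrete, here is a minimal pure-Python sketch. It does not use mindnlp; `derived_sizes` and the `seq_len` argument are hypothetical names introduced only for illustration. It assumes that `hidden_size` is split evenly across attention heads and that inputs are padded up to a multiple of `block_size` when block sparse attention is used.

```python
import math


def derived_sizes(hidden_size=768, num_attention_heads=12, block_size=64, seq_len=1000):
    """Return (head_dim, padded_len, num_blocks) for an illustrative input length.

    Hypothetical helper: the defaults mirror the BigBirdConfig defaults listed above.
    """
    # Assumption: each head receives an equal slice of the hidden dimension.
    if hidden_size % num_attention_heads != 0:
        raise ValueError("hidden_size must be a multiple of num_attention_heads")
    head_dim = hidden_size // num_attention_heads

    # Assumption: the sequence is padded up to a whole number of blocks.
    num_blocks = math.ceil(seq_len / block_size)
    padded_len = num_blocks * block_size
    return head_dim, padded_len, num_blocks


print(derived_sizes())
```

With the default values, each head works on 64 dimensions, and a 1000-token input occupies 16 blocks of 64 tokens once padded.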

Source code in mindnlp\transformers\models\big_bird\configuration_big_bird.py

class BigBirdConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`BigBirdModel`]. It is used to instantiate a
    BigBird model according to the specified arguments, defining the model architecture. Instantiating a configuration
    with the defaults will yield a similar configuration to that of the BigBird
    [google/bigbird-roberta-base](https://hf-mirror.com/google/bigbird-roberta-base) architecture.

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.


    Args:
        vocab_size (`int`, *optional*, defaults to 50358):
            Vocabulary size of the BigBird model. Defines the number of different tokens that can be represented by the
            `input_ids` passed when calling [`BigBirdModel`].
        hidden_size (`int`, *optional*, defaults to 768):
            Dimension of the encoder layers and the pooler layer.
        num_hidden_layers (`int`, *optional*, defaults to 12):
            Number of hidden layers in the Transformer encoder.
        num_attention_heads (`int`, *optional*, defaults to 12):
            Number of attention heads for each attention layer in the Transformer encoder.
        intermediate_size (`int`, *optional*, defaults to 3072):
            Dimension of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
        hidden_act (`str` or `function`, *optional*, defaults to `"gelu_new"`):
            The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
            `"relu"`, `"selu"` and `"gelu_new"` are supported.
        hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
            The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
        attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
            The dropout ratio for the attention probabilities.
        max_position_embeddings (`int`, *optional*, defaults to 4096):
            The maximum sequence length that this model might ever be used with. Typically set this to something large
            just in case (e.g., 1024 or 2048 or 4096).
        type_vocab_size (`int`, *optional*, defaults to 2):
            The vocabulary size of the `token_type_ids` passed when calling [`BigBirdModel`].
        initializer_range (`float`, *optional*, defaults to 0.02):
            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
        layer_norm_eps (`float`, *optional*, defaults to 1e-12):
            The epsilon used by the layer normalization layers.
        is_decoder (`bool`, *optional*, defaults to `False`):
            Whether the model is used as a decoder or not. If `False`, the model is used as an encoder.
        use_cache (`bool`, *optional*, defaults to `True`):
            Whether or not the model should return the last key/values attentions (not used by all models). Only
            relevant if `config.is_decoder=True`.
        attention_type (`str`, *optional*, defaults to `"block_sparse"`):
            Whether to use block sparse attention (with complexity linear in the sequence length n) as introduced in
            the BigBird paper, or the original full attention layer (with n^2 complexity). Possible values are
            `"original_full"` and `"block_sparse"`.
        use_bias (`bool`, *optional*, defaults to `True`):
            Whether to use bias in query, key, value.
        rescale_embeddings (`bool`, *optional*, defaults to `False`):
            Whether to rescale embeddings with (hidden_size ** 0.5).
        block_size (`int`, *optional*, defaults to 64):
            Size of each block. Useful only when `attention_type == "block_sparse"`.
        num_random_blocks (`int`, *optional*, defaults to 3):
            Number of random blocks each query attends to. Useful only when `attention_type ==
            "block_sparse"`.
        classifier_dropout (`float`, *optional*):
            The dropout ratio for the classification head.

    Example:
        ```python
        >>> from mindnlp.transformers import BigBirdConfig, BigBirdModel
        ...
        >>> # Initializing a BigBird google/bigbird-roberta-base style configuration
        >>> configuration = BigBirdConfig()
        ...
        >>> # Initializing a model (with random weights) from the google/bigbird-roberta-base style configuration
        >>> model = BigBirdModel(configuration)
        ...
        >>> # Accessing the model configuration
        >>> configuration = model.config
        ```
    """
    model_type = "big_bird"

    def __init__(
        self,
        vocab_size=50358,
        hidden_size=768,
        num_hidden_layers=12,
        num_attention_heads=12,
        intermediate_size=3072,
        hidden_act="gelu_new",
        hidden_dropout_prob=0.1,
        attention_probs_dropout_prob=0.1,
        max_position_embeddings=4096,
        type_vocab_size=2,
        initializer_range=0.02,
        layer_norm_eps=1e-12,
        use_cache=True,
        pad_token_id=0,
        bos_token_id=1,
        eos_token_id=2,
        sep_token_id=66,
        attention_type="block_sparse",
        use_bias=True,
        rescale_embeddings=False,
        block_size=64,
        num_random_blocks=3,
        classifier_dropout=None,
        **kwargs,
    ):
        """
        Initializes a new instance of the BigBirdConfig class.

        Args:
            vocab_size (int, optional): The size of the vocabulary. Defaults to 50358.
            hidden_size (int, optional): The size of the hidden layer. Defaults to 768.
            num_hidden_layers (int, optional): The number of hidden layers. Defaults to 12.
            num_attention_heads (int, optional): The number of attention heads. Defaults to 12.
            intermediate_size (int, optional): The size of the intermediate layer in the transformer. Defaults to 3072.
            hidden_act (str, optional): The activation function for the hidden layer. Defaults to 'gelu_new'.
            hidden_dropout_prob (float, optional): The dropout probability for the hidden layer. Defaults to 0.1.
            attention_probs_dropout_prob (float, optional): The dropout probability for the attention probabilities. Defaults to 0.1.
            max_position_embeddings (int, optional): The maximum number of positions for the embeddings. Defaults to 4096.
            type_vocab_size (int, optional): The size of the type vocabulary. Defaults to 2.
            initializer_range (float, optional): The standard deviation of the truncated normal initializer used for all weight matrices. Defaults to 0.02.
            layer_norm_eps (float, optional): The epsilon value for layer normalization. Defaults to 1e-12.
            use_cache (bool, optional): Whether to use cache in the transformer layers. Defaults to True.
            pad_token_id (int, optional): The token id for padding. Defaults to 0.
            bos_token_id (int, optional): The token id for the beginning of sentence. Defaults to 1.
            eos_token_id (int, optional): The token id for the end of sentence. Defaults to 2.
            sep_token_id (int, optional): The token id for the separator. Defaults to 66.
            attention_type (str, optional): The type of attention mechanism. Defaults to 'block_sparse'.
            use_bias (bool, optional): Whether to use bias in the query, key, and value projections. Defaults to True.
            rescale_embeddings (bool, optional): Whether to rescale the embeddings. Defaults to False.
            block_size (int, optional): The size of each block in block sparse attention. Defaults to 64.
            num_random_blocks (int, optional): The number of random blocks in block sparse attention. Defaults to 3.
            classifier_dropout (float, optional): The dropout probability for the classifier layer. Defaults to None.

        Returns:
            None

        Raises:
            None
        """
        super().__init__(
            pad_token_id=pad_token_id,
            bos_token_id=bos_token_id,
            eos_token_id=eos_token_id,
            sep_token_id=sep_token_id,
            **kwargs,
        )

        self.vocab_size = vocab_size
        self.max_position_embeddings = max_position_embeddings
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers
        self.num_attention_heads = num_attention_heads
        self.intermediate_size = intermediate_size
        self.hidden_act = hidden_act
        self.hidden_dropout_prob = hidden_dropout_prob
        self.attention_probs_dropout_prob = attention_probs_dropout_prob
        self.initializer_range = initializer_range
        self.type_vocab_size = type_vocab_size
        self.layer_norm_eps = layer_norm_eps
        self.use_cache = use_cache

        self.rescale_embeddings = rescale_embeddings
        self.attention_type = attention_type
        self.use_bias = use_bias
        self.block_size = block_size
        self.num_random_blocks = num_random_blocks
        self.classifier_dropout = classifier_dropout

`mindnlp.transformers.models.big_bird.configuration_big_bird.BigBirdConfig.__init__(vocab_size=50358, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, hidden_act='gelu_new', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, max_position_embeddings=4096, type_vocab_size=2, initializer_range=0.02, layer_norm_eps=1e-12, use_cache=True, pad_token_id=0, bos_token_id=1, eos_token_id=2, sep_token_id=66, attention_type='block_sparse', use_bias=True, rescale_embeddings=False, block_size=64, num_random_blocks=3, classifier_dropout=None, **kwargs)` ¶

Initializes a new instance of the BigBirdConfig class.

PARAMETER	DESCRIPTION
`vocab_size`	The size of the vocabulary. Defaults to 50358. TYPE: `int` DEFAULT: `50358`
`hidden_size`	The size of the hidden layer. Defaults to 768. TYPE: `int` DEFAULT: `768`
`num_hidden_layers`	The number of hidden layers. Defaults to 12. TYPE: `int` DEFAULT: `12`
`num_attention_heads`	The number of attention heads. Defaults to 12. TYPE: `int` DEFAULT: `12`
`intermediate_size`	The size of the intermediate layer in the transformer. Defaults to 3072. TYPE: `int` DEFAULT: `3072`
`hidden_act`	The activation function for the hidden layer. Defaults to 'gelu_new'. TYPE: `str` DEFAULT: `'gelu_new'`
`hidden_dropout_prob`	The dropout probability for the hidden layer. Defaults to 0.1. TYPE: `float` DEFAULT: `0.1`
`attention_probs_dropout_prob`	The dropout probability for the attention probabilities. Defaults to 0.1. TYPE: `float` DEFAULT: `0.1`
`max_position_embeddings`	The maximum number of positions for the embeddings. Defaults to 4096. TYPE: `int` DEFAULT: `4096`
`type_vocab_size`	The size of the type vocabulary. Defaults to 2. TYPE: `int` DEFAULT: `2`
`initializer_range`	The standard deviation of the truncated normal initializer used for all weight matrices. Defaults to 0.02. TYPE: `float` DEFAULT: `0.02`
`layer_norm_eps`	The epsilon value for layer normalization. Defaults to 1e-12. TYPE: `float` DEFAULT: `1e-12`
`use_cache`	Whether to use cache in the transformer layers. Defaults to True. TYPE: `bool` DEFAULT: `True`
`pad_token_id`	The token id for padding. Defaults to 0. TYPE: `int` DEFAULT: `0`
`bos_token_id`	The token id for the beginning of sentence. Defaults to 1. TYPE: `int` DEFAULT: `1`
`eos_token_id`	The token id for the end of sentence. Defaults to 2. TYPE: `int` DEFAULT: `2`
`sep_token_id`	The token id for the separator. Defaults to 66. TYPE: `int` DEFAULT: `66`
`attention_type`	The type of attention mechanism. Defaults to 'block_sparse'. TYPE: `str` DEFAULT: `'block_sparse'`
`use_bias`	Whether to use bias in the query, key, and value projections. Defaults to True. TYPE: `bool` DEFAULT: `True`
`rescale_embeddings`	Whether to rescale the embeddings. Defaults to False. TYPE: `bool` DEFAULT: `False`
`block_size`	The size of each block in block sparse attention. Defaults to 64. TYPE: `int` DEFAULT: `64`
`num_random_blocks`	The number of random blocks in block sparse attention. Defaults to 3. TYPE: `int` DEFAULT: `3`
`classifier_dropout`	The dropout probability for the classifier layer. Defaults to None. TYPE: `float` DEFAULT: `None`

RETURNS	DESCRIPTION
	None
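The effect of `block_size` and `num_random_blocks` on attention cost can be sketched with simple arithmetic. The snippet below is an illustration, not the library's code: `keys_per_query_block` and `uses_block_sparse` are hypothetical helpers, and the sliding window of 3 blocks, the 2 global blocks, and the `(5 + 2 * num_random_blocks) * block_size` fallback threshold are assumptions based on the upstream Hugging Face BigBird implementation rather than anything stated on this page.

```python
def keys_per_query_block(block_size=64, num_random_blocks=3, window_blocks=3, global_blocks=2):
    """Number of key positions a single (non-global) query block attends to.

    Assumption: each query block sees a sliding window of `window_blocks` blocks,
    `global_blocks` global blocks, and `num_random_blocks` random blocks.
    """
    return (window_blocks + global_blocks + num_random_blocks) * block_size


def uses_block_sparse(seq_len, block_size=64, num_random_blocks=3):
    """Whether block sparse attention would stay active for this sequence length.

    Assumption: for short sequences the model falls back to full attention.
    """
    return seq_len > (5 + 2 * num_random_blocks) * block_size


print(keys_per_query_block())   # key positions per query block with the defaults
print(uses_block_sparse(4096))  # long input: sparse attention stays active
print(uses_block_sparse(512))   # short input: assumed fallback to full attention
```

Under these assumptions, a query block attends to 512 key positions regardless of sequence length, whereas full attention over a 4096-token input attends to all 4096.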

Source code in mindnlp\transformers\models\big_bird\configuration_big_bird.py

def __init__(
    self,
    vocab_size=50358,
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    hidden_act="gelu_new",
    hidden_dropout_prob=0.1,
    attention_probs_dropout_prob=0.1,
    max_position_embeddings=4096,
    type_vocab_size=2,
    initializer_range=0.02,
    layer_norm_eps=1e-12,
    use_cache=True,
    pad_token_id=0,
    bos_token_id=1,
    eos_token_id=2,
    sep_token_id=66,
    attention_type="block_sparse",
    use_bias=True,
    rescale_embeddings=False,
    block_size=64,
    num_random_blocks=3,
    classifier_dropout=None,
    **kwargs,
):
    """
    Initializes a new instance of the BigBirdConfig class.

    Args:
        vocab_size (int, optional): The size of the vocabulary. Defaults to 50358.
        hidden_size (int, optional): The size of the hidden layer. Defaults to 768.
        num_hidden_layers (int, optional): The number of hidden layers. Defaults to 12.
        num_attention_heads (int, optional): The number of attention heads. Defaults to 12.
        intermediate_size (int, optional): The size of the intermediate layer in the transformer. Defaults to 3072.
        hidden_act (str, optional): The activation function for the hidden layer. Defaults to 'gelu_new'.
        hidden_dropout_prob (float, optional): The dropout probability for the hidden layer. Defaults to 0.1.
        attention_probs_dropout_prob (float, optional): The dropout probability for the attention probabilities. Defaults to 0.1.
        max_position_embeddings (int, optional): The maximum number of positions for the embeddings. Defaults to 4096.
        type_vocab_size (int, optional): The size of the type vocabulary. Defaults to 2.
        initializer_range (float, optional): The standard deviation of the truncated normal initializer used for all weight matrices. Defaults to 0.02.
        layer_norm_eps (float, optional): The epsilon value for layer normalization. Defaults to 1e-12.
        use_cache (bool, optional): Whether to use cache in the transformer layers. Defaults to True.
        pad_token_id (int, optional): The token id for padding. Defaults to 0.
        bos_token_id (int, optional): The token id for the beginning of sentence. Defaults to 1.
        eos_token_id (int, optional): The token id for the end of sentence. Defaults to 2.
        sep_token_id (int, optional): The token id for the separator. Defaults to 66.
        attention_type (str, optional): The type of attention mechanism. Defaults to 'block_sparse'.
        use_bias (bool, optional): Whether to use bias in the query, key, and value projections. Defaults to True.
        rescale_embeddings (bool, optional): Whether to rescale the embeddings. Defaults to False.
        block_size (int, optional): The size of each block in block sparse attention. Defaults to 64.
        num_random_blocks (int, optional): The number of random blocks in block sparse attention. Defaults to 3.
        classifier_dropout (float, optional): The dropout probability for the classifier layer. Defaults to None.

    Returns:
        None

    Raises:
        None
    """
    super().__init__(
        pad_token_id=pad_token_id,
        bos_token_id=bos_token_id,
        eos_token_id=eos_token_id,
        sep_token_id=sep_token_id,
        **kwargs,
    )

    self.vocab_size = vocab_size
    self.max_position_embeddings = max_position_embeddings
    self.hidden_size = hidden_size
    self.num_hidden_layers = num_hidden_layers
    self.num_attention_heads = num_attention_heads
    self.intermediate_size = intermediate_size
    self.hidden_act = hidden_act
    self.hidden_dropout_prob = hidden_dropout_prob
    self.attention_probs_dropout_prob = attention_probs_dropout_prob
    self.initializer_range = initializer_range
    self.type_vocab_size = type_vocab_size
    self.layer_norm_eps = layer_norm_eps
    self.use_cache = use_cache

    self.rescale_embeddings = rescale_embeddings
    self.attention_type = attention_type
    self.use_bias = use_bias
    self.block_size = block_size
    self.num_random_blocks = num_random_blocks
    self.classifier_dropout = classifier_dropout

`mindnlp.transformers.models.big_bird.modeling_big_bird.BigBirdForCausalLM` ¶

Bases: BigBirdPreTrainedModel

Source code in mindnlp\transformers\models\big_bird\modeling_big_bird.py

class BigBirdForCausalLM(BigBirdPreTrainedModel):
    _tied_weights_keys = ["cls.predictions.decoder.weight", "cls.predictions.decoder.bias"]

    def __init__(self, config):
        super().__init__(config)

        if not config.is_decoder:
            logger.warning("If you want to use `BigBirdForCausalLM` as a standalone, add `is_decoder=True.`")

        self.bert = BigBirdModel(config)
        self.cls = BigBirdOnlyMLMHead(config)

        # Initialize weights and apply final processing
        self.post_init()

    def get_output_embeddings(self):
        return self.cls.predictions.decoder

    def set_output_embeddings(self, new_embeddings):
        self.cls.predictions.decoder = new_embeddings
        self.cls.predictions.bias = new_embeddings.bias

    def forward(
        self,
        input_ids: mindspore.Tensor = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        encoder_hidden_states: Optional[mindspore.Tensor] = None,
        encoder_attention_mask: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
        labels: Optional[mindspore.Tensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[CausalLMOutputWithCrossAttentions, Tuple[mindspore.Tensor]]:
        r"""
        encoder_hidden_states  (`mindspore.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
            Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if
            the model is configured as a decoder.
        encoder_attention_mask (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in
            the cross-attention if the model is configured as a decoder. Mask values selected in `[0, 1]`:

            - 1 for tokens that are **not masked**,
            - 0 for tokens that are **masked**.
        past_key_values (`tuple(tuple(mindspore.Tensor))` of length `config.n_layers` with each tuple having 4 tensors of shape `(batch_size, num_heads, sequence_length - 1, embed_size_per_head)`):
            Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding.
            If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
            don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
            `decoder_input_ids` of shape `(batch_size, sequence_length)`.
        labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for computing the left-to-right language modeling loss (next word prediction). Indices should be in
            `[-100, 0, ..., config.vocab_size]` (see `input_ids` docstring). Tokens with indices set to `-100` are
            ignored (masked); the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`.
        use_cache (`bool`, *optional*):
            If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
            `past_key_values`).
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            encoder_hidden_states=encoder_hidden_states,
            encoder_attention_mask=encoder_attention_mask,
            past_key_values=past_key_values,
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        sequence_output = outputs[0]
        prediction_scores = self.cls(sequence_output)

        lm_loss = None
        if labels is not None:
            # we are doing next-token prediction; shift prediction scores and input ids by one
            shifted_prediction_scores = prediction_scores[:, :-1, :]
            labels = labels[:, 1:]
            loss_fct = CrossEntropyLoss()
            lm_loss = loss_fct(shifted_prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))

        if not return_dict:
            output = (prediction_scores,) + outputs[2:]
            return ((lm_loss,) + output) if lm_loss is not None else output

        return CausalLMOutputWithCrossAttentions(
            loss=lm_loss,
            logits=prediction_scores,
            past_key_values=outputs.past_key_values,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
            cross_attentions=outputs.cross_attentions,
        )

    def prepare_inputs_for_generation(self, input_ids, past_key_values=None, attention_mask=None, **model_kwargs):
        # if model is used as a decoder in encoder-decoder model, the decoder attention mask is created on the fly
        if attention_mask is None:
            attention_mask = ops.ones_like(input_ids)

        # cut decoder_input_ids if past_key_values is used
        if past_key_values is not None:
            past_length = past_key_values[0][0].shape[2]

            # Some generation methods already pass only the last input ID
            if input_ids.shape[1] > past_length:
                remove_prefix_length = past_length
            else:
                # Default to old behavior: keep only final ID
                remove_prefix_length = input_ids.shape[1] - 1

            input_ids = input_ids[:, remove_prefix_length:]

        return {"input_ids": input_ids, "attention_mask": attention_mask, "past_key_values": past_key_values}

    def _reorder_cache(self, past_key_values, beam_idx):
        reordered_past = ()
        for layer_past in past_key_values:
            reordered_past += (
                tuple(past_state.index_select(0, beam_idx) for past_state in layer_past[:2])
                + layer_past[2:],
            )
        return reordered_past
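The prefix-trimming step in `prepare_inputs_for_generation` above can be illustrated without tensors. This is a minimal sketch using nested Python lists in place of `mindspore.Tensor`; `trim_input_ids` is a hypothetical helper name, and `past_length` stands in for `past_key_values[0][0].shape[2]`.

```python
def trim_input_ids(input_ids, past_length):
    """Drop the tokens whose key/value states are already cached.

    input_ids: list of token-id rows, all of the same length (batch of sequences).
    past_length: number of positions already covered by the cache.
    """
    seq_len = len(input_ids[0])
    if seq_len > past_length:
        # The caller passed the full sequence: keep only the uncached suffix.
        remove_prefix_length = past_length
    else:
        # The caller already passed only new tokens: keep the final ID.
        remove_prefix_length = seq_len - 1
    return [row[remove_prefix_length:] for row in input_ids]


print(trim_input_ids([[5, 6, 7, 8]], past_length=3))
print(trim_input_ids([[8]], past_length=3))
```

Both calls leave a single new token per row, which is the case the cache is meant to speed up.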

`mindnlp.transformers.models.big_bird.modeling_big_bird.BigBirdForCausalLM.forward(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, encoder_hidden_states=None, encoder_attention_mask=None, past_key_values=None, labels=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None)` ¶

encoder_hidden_states (`mindspore.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, optional): Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if the model is configured as a decoder.

encoder_attention_mask (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, optional): Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in the cross-attention if the model is configured as a decoder. Mask values selected in `[0, 1]`:

- 1 for tokens that are **not masked**,
- 0 for tokens that are **masked**.

past_key_values (`tuple(tuple(mindspore.Tensor))` of length `config.n_layers` with each tuple having 4 tensors of shape `(batch_size, num_heads, sequence_length - 1, embed_size_per_head)`): Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding. If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all `decoder_input_ids` of shape `(batch_size, sequence_length)`.

labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, optional): Labels for computing the left-to-right language modeling loss (next word prediction). Indices should be in `[-100, 0, ..., config.vocab_size]` (see `input_ids` docstring). Tokens with indices set to `-100` are ignored (masked); the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`.

use_cache (`bool`, optional): If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see `past_key_values`).
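The label handling described above (shift by one position, skip `-100`) can be sketched in plain Python for a single sequence. This is an illustration rather than the library's loss: `causal_lm_loss` is a hypothetical helper, logits are nested lists instead of tensors, and averaging over the non-ignored positions is assumed to match the default behaviour of the cross-entropy loss used in `forward`.

```python
import math

IGNORE_INDEX = -100  # positions with this label do not contribute to the loss


def causal_lm_loss(logits, labels):
    """Mean next-token cross-entropy for one sequence.

    logits: list of per-position score lists, shape [seq_len][vocab_size].
    labels: list of target token ids, shape [seq_len].
    """
    # Position t predicts token t + 1, so drop the last logit row and the first label.
    shifted_logits = logits[:-1]
    shifted_labels = labels[1:]

    losses = []
    for row, target in zip(shifted_logits, shifted_labels):
        if target == IGNORE_INDEX:
            continue
        # Cross-entropy = log-sum-exp of the scores minus the score of the target.
        log_norm = math.log(sum(math.exp(score) for score in row))
        losses.append(log_norm - row[target])

    # With no valid targets there is nothing to average over.
    return sum(losses) / len(losses) if losses else float("nan")


uniform_logits = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
print(causal_lm_loss(uniform_logits, [IGNORE_INDEX, 2, IGNORE_INDEX]))
```

With uniform scores over a 4-token vocabulary and a single valid target, the loss equals `log(4)`, the entropy of a uniform guess.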

Source code in mindnlp\transformers\models\big_bird\modeling_big_bird.py

def forward(
    self,
    input_ids: mindspore.Tensor = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    encoder_hidden_states: Optional[mindspore.Tensor] = None,
    encoder_attention_mask: Optional[mindspore.Tensor] = None,
    past_key_values: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
    labels: Optional[mindspore.Tensor] = None,
    use_cache: Optional[bool] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[CausalLMOutputWithCrossAttentions, Tuple[mindspore.Tensor]]:
    r"""
    encoder_hidden_states  (`mindspore.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
        Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if
        the model is configured as a decoder.
    encoder_attention_mask (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
        Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in
        the cross-attention if the model is configured as a decoder. Mask values selected in `[0, 1]`:

        - 1 for tokens that are **not masked**,
        - 0 for tokens that are **masked**.
    past_key_values (`tuple(tuple(mindspore.Tensor))` of length `config.n_layers` with each tuple having 4 tensors of shape `(batch_size, num_heads, sequence_length - 1, embed_size_per_head)`):
        Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding.
        If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
        don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
        `decoder_input_ids` of shape `(batch_size, sequence_length)`.
    labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
        Labels for computing the left-to-right language modeling loss (next word prediction). Indices should be in
        `[-100, 0, ..., config.vocab_size]` (see `input_ids` docstring). Tokens with indices set to `-100` are
        ignored (masked); the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`.
    use_cache (`bool`, *optional*):
        If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
        `past_key_values`).
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    outputs = self.bert(
        input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        encoder_hidden_states=encoder_hidden_states,
        encoder_attention_mask=encoder_attention_mask,
        past_key_values=past_key_values,
        use_cache=use_cache,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    sequence_output = outputs[0]
    prediction_scores = self.cls(sequence_output)

    lm_loss = None
    if labels is not None:
        # we are doing next-token prediction; shift prediction scores and input ids by one
        shifted_prediction_scores = prediction_scores[:, :-1, :]
        labels = labels[:, 1:]
        loss_fct = CrossEntropyLoss()
        lm_loss = loss_fct(shifted_prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))

    if not return_dict:
        output = (prediction_scores,) + outputs[2:]
        return ((lm_loss,) + output) if lm_loss is not None else output

    return CausalLMOutputWithCrossAttentions(
        loss=lm_loss,
        logits=prediction_scores,
        past_key_values=outputs.past_key_values,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
        cross_attentions=outputs.cross_attentions,
    )
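
The shift in the loss branch above pairs the scores at position `t` with the label at position `t + 1`. The following is a minimal pure-Python sketch of that idea on a single toy sequence; `shifted_lm_loss`, `log_softmax`, and the toy values are illustrative names and data, not part of mindnlp, and the mean reduction over non-ignored positions is an assumption about how the loss is averaged.

```python
import math

IGNORE_INDEX = -100  # label value skipped by the loss, as described in the docstring


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def shifted_lm_loss(logits, labels):
    """Mean cross-entropy where position t is scored against label t + 1.

    logits: list of sequence_length rows, each holding vocab_size floats
    labels: list of sequence_length token ids (or IGNORE_INDEX)
    """
    shifted_logits = logits[:-1]  # drop the last position: no next token to predict
    shifted_labels = labels[1:]   # drop the first label: no earlier position predicts it
    losses = [
        -log_softmax(row)[target]
        for row, target in zip(shifted_logits, shifted_labels)
        if target != IGNORE_INDEX
    ]
    return sum(losses) / len(losses) if losses else float("nan")


# Toy sequence of 3 positions over a 2-token vocabulary.
toy_logits = [[2.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
toy_labels = [0, 0, 1]
loss = shifted_lm_loss(toy_logits, toy_labels)
```

Only the first two rows of `toy_logits` contribute: they are scored against `toy_labels[1]` and `toy_labels[2]`, and the last row is dropped by the shift.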

`mindnlp.transformers.models.big_bird.modeling_big_bird.BigBirdForMaskedLM` ¶

Bases: BigBirdPreTrainedModel

Source code in mindnlp\transformers\models\big_bird\modeling_big_bird.py

class BigBirdForMaskedLM(BigBirdPreTrainedModel):
    _tied_weights_keys = ["cls.predictions.decoder.weight", "cls.predictions.decoder.bias"]

    def __init__(self, config):
        super().__init__(config)

        if config.is_decoder:
            logger.warning(
                "If you want to use `BigBirdForMaskedLM` make sure `config.is_decoder=False` for "
                "bi-directional self-attention."
            )

        self.bert = BigBirdModel(config)
        self.cls = BigBirdOnlyMLMHead(config)

        # Initialize weights and apply final processing
        self.post_init()

    def get_output_embeddings(self):
        return self.cls.predictions.decoder

    def set_output_embeddings(self, new_embeddings):
        self.cls.predictions.decoder = new_embeddings
        self.cls.predictions.bias = new_embeddings.bias

    def forward(
        self,
        input_ids: mindspore.Tensor = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        encoder_hidden_states: Optional[mindspore.Tensor] = None,
        encoder_attention_mask: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[MaskedLMOutput, Tuple[mindspore.Tensor]]:
        r"""
        labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ...,
            config.vocab_size - 1]` (see `input_ids` docstring). Tokens with indices set to `-100` are ignored (masked);
            the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size - 1]`.

        Returns:

        Example:

        ```python
        >>> from mindnlp.core import ops, no_grad
        >>> from mindnlp.transformers import AutoTokenizer, BigBirdForMaskedLM
        >>> from datasets import load_dataset

        >>> tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")
        >>> model = BigBirdForMaskedLM.from_pretrained("google/bigbird-roberta-base")
        >>> squad_ds = load_dataset("rajpurkar/squad_v2", split="train")  # doctest: +IGNORE_RESULT

        >>> # select a long article
        >>> LONG_ARTICLE_TARGET = squad_ds[81514]["context"]
        >>> # select a sentence from it
        >>> LONG_ARTICLE_TARGET[332:398]
        'the highest values are very close to the theoretical maximum value'

        >>> # add mask_token
        >>> LONG_ARTICLE_TO_MASK = LONG_ARTICLE_TARGET.replace("maximum", "[MASK]")
        >>> inputs = tokenizer(LONG_ARTICLE_TO_MASK, return_tensors="ms")
        >>> # long article input
        >>> list(inputs["input_ids"].shape)
        [1, 919]

        >>> with no_grad():
        ...     logits = model(**inputs).logits
        >>> # retrieve index of [MASK]
        >>> mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
        >>> predicted_token_id = logits[0, mask_token_index].argmax(axis=-1)
        >>> tokenizer.decode(predicted_token_id)
        'maximum'
        ```

        ```python
        >>> labels = tokenizer(LONG_ARTICLE_TARGET, return_tensors="ms")["input_ids"]
        >>> labels = ops.where(inputs.input_ids == tokenizer.mask_token_id, labels, -100)
        >>> outputs = model(**inputs, labels=labels)
        >>> round(outputs.loss.item(), 2)
        1.99
        ```
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            encoder_hidden_states=encoder_hidden_states,
            encoder_attention_mask=encoder_attention_mask,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        sequence_output = outputs[0]
        prediction_scores = self.cls(sequence_output)

        masked_lm_loss = None
        if labels is not None:
            loss_fct = CrossEntropyLoss()  # labels set to -100 are ignored by the loss
            masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))

        if not return_dict:
            output = (prediction_scores,) + outputs[2:]
            return ((masked_lm_loss,) + output) if masked_lm_loss is not None else output

        return MaskedLMOutput(
            loss=masked_lm_loss,
            logits=prediction_scores,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )

    def prepare_inputs_for_generation(self, input_ids, attention_mask=None, **model_kwargs):
        input_shape = input_ids.shape
        effective_batch_size = input_shape[0]

        #  add a dummy token
        if self.config.pad_token_id is None:
            raise ValueError("The PAD token should be defined for generation")
        attention_mask = ops.cat([attention_mask, ops.zeros((attention_mask.shape[0], 1), dtype=attention_mask.dtype)], dim=-1)
        dummy_token = ops.full(
            (effective_batch_size, 1), self.config.pad_token_id, dtype=mindspore.int64
        )
        input_ids = ops.cat([input_ids, dummy_token], dim=1)

        return {"input_ids": input_ids, "attention_mask": attention_mask}
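
`prepare_inputs_for_generation` above appends one pad token to every row and extends the attention mask with a zero so the dummy position is not attended to. A minimal sketch of the same bookkeeping on plain Python lists follows; `append_dummy_token` and the toy ids are hypothetical and stand in for the tensor operations used in the listing.

```python
def append_dummy_token(input_ids, attention_mask, pad_token_id):
    """Append one pad token per row and mark it as not attended (mask value 0)."""
    if pad_token_id is None:
        raise ValueError("The PAD token should be defined for generation")
    new_ids = [row + [pad_token_id] for row in input_ids]
    new_mask = [row + [0] for row in attention_mask]
    return new_ids, new_mask


# Two toy rows; the second already ends in one padding position.
ids, mask = append_dummy_token(
    input_ids=[[5, 6, 7], [8, 9, 0]],
    attention_mask=[[1, 1, 1], [1, 1, 0]],
    pad_token_id=0,
)
# ids  is [[5, 6, 7, 0], [8, 9, 0, 0]]
# mask is [[1, 1, 1, 0], [1, 1, 0, 0]]
```

Note that the listing concatenates onto `attention_mask` unconditionally, so a caller relying on the `attention_mask=None` default would need to supply a mask first.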

`mindnlp.transformers.models.big_bird.modeling_big_bird.BigBirdForMaskedLM.forward(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, encoder_hidden_states=None, encoder_attention_mask=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)` ¶

labels (mindspore.Tensor of shape (batch_size, sequence_length), optional): Labels for computing the masked language modeling loss. Indices should be in [-100, 0, ..., config.vocab_size - 1] (see input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, ..., config.vocab_size - 1].

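Returns: MaskedLMOutput if return_dict resolves to True; otherwise a tuple of the logits followed by any extra encoder outputs, prefixed by the loss when labels is given.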

Example:

>>> from mindnlp.core import ops, no_grad
>>> from mindnlp.transformers import AutoTokenizer, BigBirdForMaskedLM
>>> from datasets import load_dataset

>>> tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")
>>> model = BigBirdForMaskedLM.from_pretrained("google/bigbird-roberta-base")
>>> squad_ds = load_dataset("rajpurkar/squad_v2", split="train")  # doctest: +IGNORE_RESULT

>>> # select a long article
>>> LONG_ARTICLE_TARGET = squad_ds[81514]["context"]
>>> # select a sentence from it
>>> LONG_ARTICLE_TARGET[332:398]
'the highest values are very close to the theoretical maximum value'

>>> # add mask_token
>>> LONG_ARTICLE_TO_MASK = LONG_ARTICLE_TARGET.replace("maximum", "[MASK]")
>>> inputs = tokenizer(LONG_ARTICLE_TO_MASK, return_tensors="ms")
>>> # long article input
>>> list(inputs["input_ids"].shape)
[1, 919]

>>> with no_grad():
...     logits = model(**inputs).logits
>>> # retrieve index of [MASK]
>>> mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
>>> predicted_token_id = logits[0, mask_token_index].argmax(axis=-1)
>>> tokenizer.decode(predicted_token_id)
'maximum'

>>> labels = tokenizer(LONG_ARTICLE_TARGET, return_tensors="ms")["input_ids"]
>>> labels = ops.where(inputs.input_ids == tokenizer.mask_token_id, labels, -100)
>>> outputs = model(**inputs, labels=labels)
>>> round(outputs.loss.item(), 2)
1.99

Source code in mindnlp\transformers\models\big_bird\modeling_big_bird.py

def forward(
    self,
    input_ids: mindspore.Tensor = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    encoder_hidden_states: Optional[mindspore.Tensor] = None,
    encoder_attention_mask: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[MaskedLMOutput, Tuple[mindspore.Tensor]]:
    r"""
    labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
        Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ...,
        config.vocab_size - 1]` (see `input_ids` docstring). Tokens with indices set to `-100` are ignored (masked);
        the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size - 1]`.

    Returns:

    Example:

    ```python
    >>> from mindnlp.core import ops, no_grad
    >>> from mindnlp.transformers import AutoTokenizer, BigBirdForMaskedLM
    >>> from datasets import load_dataset

    >>> tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")
    >>> model = BigBirdForMaskedLM.from_pretrained("google/bigbird-roberta-base")
    >>> squad_ds = load_dataset("rajpurkar/squad_v2", split="train")  # doctest: +IGNORE_RESULT

    >>> # select a long article
    >>> LONG_ARTICLE_TARGET = squad_ds[81514]["context"]
    >>> # select a sentence from it
    >>> LONG_ARTICLE_TARGET[332:398]
    'the highest values are very close to the theoretical maximum value'

    >>> # add mask_token
    >>> LONG_ARTICLE_TO_MASK = LONG_ARTICLE_TARGET.replace("maximum", "[MASK]")
    >>> inputs = tokenizer(LONG_ARTICLE_TO_MASK, return_tensors="ms")
    >>> # long article input
    >>> list(inputs["input_ids"].shape)
    [1, 919]

    >>> with no_grad():
    ...     logits = model(**inputs).logits
    >>> # retrieve index of [MASK]
    >>> mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
    >>> predicted_token_id = logits[0, mask_token_index].argmax(axis=-1)
    >>> tokenizer.decode(predicted_token_id)
    'maximum'
    ```

    ```python
    >>> labels = tokenizer(LONG_ARTICLE_TARGET, return_tensors="ms")["input_ids"]
    >>> labels = ops.where(inputs.input_ids == tokenizer.mask_token_id, labels, -100)
    >>> outputs = model(**inputs, labels=labels)
    >>> round(outputs.loss.item(), 2)
    1.99
    ```
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    outputs = self.bert(
        input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        encoder_hidden_states=encoder_hidden_states,
        encoder_attention_mask=encoder_attention_mask,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    sequence_output = outputs[0]
    prediction_scores = self.cls(sequence_output)

    masked_lm_loss = None
    if labels is not None:
        loss_fct = CrossEntropyLoss()  # labels set to -100 are ignored by the loss
        masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))

    if not return_dict:
        output = (prediction_scores,) + outputs[2:]
        return ((masked_lm_loss,) + output) if masked_lm_loss is not None else output

    return MaskedLMOutput(
        loss=masked_lm_loss,
        logits=prediction_scores,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )
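
The docstring example builds `labels` so that only `[MASK]` positions carry a real token id and every other position is `-100`. The sketch below reproduces that label construction and the resulting loss on plain Python lists; `build_mlm_labels`, `masked_lm_loss`, the toy vocabulary of five tokens, and the toy mask id `4` are all assumptions made for illustration, not mindnlp APIs.

```python
import math

IGNORE_INDEX = -100  # positions with this label do not contribute to the loss


def build_mlm_labels(masked_input_ids, original_ids, mask_token_id):
    """Keep the original id only where the input was masked; ignore elsewhere."""
    return [
        orig if tok == mask_token_id else IGNORE_INDEX
        for tok, orig in zip(masked_input_ids, original_ids)
    ]


def masked_lm_loss(logits, labels):
    """Mean cross-entropy over positions whose label is not IGNORE_INDEX."""
    losses = []
    for row, target in zip(logits, labels):
        if target == IGNORE_INDEX:
            continue
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        losses.append(log_z - row[target])  # equals -log_softmax(row)[target]
    return sum(losses) / len(losses) if losses else float("nan")


TOY_MASK_ID = 4  # hypothetical mask id for a 5-token toy vocabulary
original_ids = [1, 2, 3]
masked_input_ids = [1, TOY_MASK_ID, 3]

labels = build_mlm_labels(masked_input_ids, original_ids, TOY_MASK_ID)
toy_logits = [[0.0] * 5, [0.0, 0.0, 3.0, 0.0, 0.0], [0.0] * 5]
loss = masked_lm_loss(toy_logits, labels)
```

Here `labels` is `[-100, 2, -100]`, so only the middle row of `toy_logits` is scored. Unlike the causal LM head, no shift is applied: each position is compared with its own label.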

`mindnlp.transformers.models.big_bird.modeling_big_bird.BigBirdForMultipleChoice` ¶

Bases: BigBirdPreTrainedModel

Source code in mindnlp\transformers\models\big_bird\modeling_big_bird.py

class BigBirdForMultipleChoice(BigBirdPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)

        self.bert = BigBirdModel(config)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, 1)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: mindspore.Tensor = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[MultipleChoiceModelOutput, Tuple[mindspore.Tensor]]:
        r"""
        labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the multiple choice classification loss. Indices should be in `[0, ...,
            num_choices-1]` where `num_choices` is the size of the second dimension of the input tensors. (See
            `input_ids` above)
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]

        input_ids = input_ids.view(-1, input_ids.shape[-1]) if input_ids is not None else None
        attention_mask = attention_mask.view(-1, attention_mask.shape[-1]) if attention_mask is not None else None
        token_type_ids = token_type_ids.view(-1, token_type_ids.shape[-1]) if token_type_ids is not None else None
        position_ids = position_ids.view(-1, position_ids.shape[-1]) if position_ids is not None else None
        inputs_embeds = (
            inputs_embeds.view(-1, inputs_embeds.shape[-2], inputs_embeds.shape[-1])
            if inputs_embeds is not None
            else None
        )

        outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        pooled_output = outputs[1]

        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)
        reshaped_logits = logits.view(-1, num_choices)

        loss = None
        if labels is not None:
            loss_fct = CrossEntropyLoss()
            loss = loss_fct(reshaped_logits, labels)

        if not return_dict:
            output = (reshaped_logits,) + outputs[2:]
            return ((loss,) + output) if loss is not None else output

        return MultipleChoiceModelOutput(
            loss=loss,
            logits=reshaped_logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )

`mindnlp.transformers.models.big_bird.modeling_big_bird.BigBirdForMultipleChoice.forward(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)` ¶

labels (mindspore.Tensor of shape (batch_size,), optional): Labels for computing the multiple choice classification loss. Indices should be in [0, ..., num_choices-1] where num_choices is the size of the second dimension of the input tensors. (See input_ids above)

Source code in mindnlp\transformers\models\big_bird\modeling_big_bird.py

def forward(
    self,
    input_ids: mindspore.Tensor = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[MultipleChoiceModelOutput, Tuple[mindspore.Tensor]]:
    r"""
    labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
        Labels for computing the multiple choice classification loss. Indices should be in `[0, ...,
        num_choices-1]` where `num_choices` is the size of the second dimension of the input tensors. (See
        `input_ids` above)
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict
    num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]

    input_ids = input_ids.view(-1, input_ids.shape[-1]) if input_ids is not None else None
    attention_mask = attention_mask.view(-1, attention_mask.shape[-1]) if attention_mask is not None else None
    token_type_ids = token_type_ids.view(-1, token_type_ids.shape[-1]) if token_type_ids is not None else None
    position_ids = position_ids.view(-1, position_ids.shape[-1]) if position_ids is not None else None
    inputs_embeds = (
        inputs_embeds.view(-1, inputs_embeds.shape[-2], inputs_embeds.shape[-1])
        if inputs_embeds is not None
        else None
    )

    outputs = self.bert(
        input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    pooled_output = outputs[1]

    pooled_output = self.dropout(pooled_output)
    logits = self.classifier(pooled_output)
    reshaped_logits = logits.view(-1, num_choices)

    loss = None
    if labels is not None:
        loss_fct = CrossEntropyLoss()
        loss = loss_fct(reshaped_logits, labels)

    if not return_dict:
        output = (reshaped_logits,) + outputs[2:]
        return ((loss,) + output) if loss is not None else output

    return MultipleChoiceModelOutput(
        loss=loss,
        logits=reshaped_logits,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )
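
The multiple-choice head above flattens inputs of shape `(batch_size, num_choices, sequence_length)` so that every choice is encoded as its own row, then reshapes the one-score-per-row logits back to `(batch_size, num_choices)`. A minimal sketch of that reshaping on nested lists follows; `flatten_choices`, `regroup_scores`, and the sum-based scorer are stand-ins chosen for illustration, not the model's actual encoder or classifier.

```python
def flatten_choices(batch):
    """(batch_size, num_choices, seq_len) nested lists -> (batch_size * num_choices, seq_len)."""
    return [choice for example in batch for choice in example]


def regroup_scores(flat_scores, num_choices):
    """(batch_size * num_choices,) scores -> (batch_size, num_choices)."""
    return [
        flat_scores[i:i + num_choices]
        for i in range(0, len(flat_scores), num_choices)
    ]


input_ids = [
    [[1, 2, 3], [1, 2, 4]],  # example 0 with two candidate answers
    [[5, 6, 7], [5, 6, 8]],  # example 1 with two candidate answers
]
num_choices = len(input_ids[0])

flat = flatten_choices(input_ids)
# Stand-in for encoder + dropout + linear classifier: one scalar per flattened row.
flat_scores = [float(sum(row)) for row in flat]
reshaped = regroup_scores(flat_scores, num_choices)
predicted = [scores.index(max(scores)) for scores in reshaped]
```

Because each row of `reshaped` holds the scores for one example's choices, a cross-entropy over that row with a label in `[0, ..., num_choices-1]` is an ordinary classification loss.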

`mindnlp.transformers.models.big_bird.modeling_big_bird.BigBirdForPreTraining` ¶

Bases: BigBirdPreTrainedModel

Source code in mindnlp\transformers\models\big_bird\modeling_big_bird.py

class BigBirdForPreTraining(BigBirdPreTrainedModel):
    _tied_weights_keys = ["cls.predictions.decoder.weight", "cls.predictions.decoder.bias"]

    def __init__(self, config):
        super().__init__(config)

        self.bert = BigBirdModel(config, add_pooling_layer=True)
        self.cls = BigBirdPreTrainingHeads(config)

        # Initialize weights and apply final processing
        self.post_init()

    def get_output_embeddings(self):
        return self.cls.predictions.decoder

    def set_output_embeddings(self, new_embeddings):
        self.cls.predictions.decoder = new_embeddings
        self.cls.predictions.bias = new_embeddings.bias

    def forward(
        self,
        input_ids: mindspore.Tensor = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        next_sentence_label: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[BigBirdForPreTrainingOutput, Tuple[mindspore.Tensor]]:
        r"""
        labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ...,
            config.vocab_size - 1]` (see `input_ids` docstring). Tokens with indices set to `-100` are ignored (masked);
            the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size - 1]`.
        next_sentence_label (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the next sequence prediction (classification) loss. If specified together with
            `labels`, the NSP loss is added to the masked LM loss. Input should be a sequence pair (see `input_ids`
            docstring). Indices should be in `[0, 1]`:

            - 0 indicates sequence B is a continuation of sequence A,
            - 1 indicates sequence B is a random sequence.
        kwargs (`Dict[str, any]`, *optional*, defaults to `{}`):
            Used to hide legacy arguments that have been deprecated.

        Returns:

        Example:

        ```python
        >>> from mindnlp.transformers import AutoTokenizer, BigBirdForPreTraining

        >>> tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")
        >>> model = BigBirdForPreTraining.from_pretrained("google/bigbird-roberta-base")

        >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="ms")
        >>> outputs = model(**inputs)

        >>> prediction_logits = outputs.prediction_logits
        >>> seq_relationship_logits = outputs.seq_relationship_logits
        ```"""
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        sequence_output, pooled_output = outputs[:2]
        prediction_scores, seq_relationship_score = self.cls(sequence_output, pooled_output)

        total_loss = None
        if labels is not None:
            loss_fct = CrossEntropyLoss()
            total_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))

        if next_sentence_label is not None and total_loss is not None:
            next_sentence_loss = loss_fct(seq_relationship_score.view(-1, 2), next_sentence_label.view(-1))
            total_loss = total_loss + next_sentence_loss

        if not return_dict:
            output = (prediction_scores, seq_relationship_score) + outputs[2:]
            return ((total_loss,) + output) if total_loss is not None else output

        return BigBirdForPreTrainingOutput(
            loss=total_loss,
            prediction_logits=prediction_scores,
            seq_relationship_logits=seq_relationship_score,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )

`mindnlp.transformers.models.big_bird.modeling_big_bird.BigBirdForPreTraining.forward(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, labels=None, next_sentence_label=None, output_attentions=None, output_hidden_states=None, return_dict=None)` ¶

labels (mindspore.Tensor of shape (batch_size, sequence_length), optional): Labels for computing the masked language modeling loss. Indices should be in [-100, 0, ..., config.vocab_size - 1] (see input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, ..., config.vocab_size - 1].

next_sentence_label (mindspore.Tensor of shape (batch_size,), optional): Labels for computing the next sequence prediction (classification) loss. If specified together with labels, the NSP loss is added to the masked LM loss. Input should be a sequence pair (see input_ids docstring). Indices should be in [0, 1]:

- 0 indicates sequence B is a continuation of sequence A,
- 1 indicates sequence B is a random sequence.

kwargs (Dict[str, any], optional, defaults to {}): Used to hide legacy arguments that have been deprecated.

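Returns: BigBirdForPreTrainingOutput if return_dict resolves to True; otherwise a tuple of the prediction scores and sequence-relationship scores followed by any extra encoder outputs, prefixed by the total loss when one is computed.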

Example:

>>> from mindnlp.transformers import AutoTokenizer, BigBirdForPreTraining

>>> tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")
>>> model = BigBirdForPreTraining.from_pretrained("google/bigbird-roberta-base")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="ms")
>>> outputs = model(**inputs)

>>> prediction_logits = outputs.prediction_logits
>>> seq_relationship_logits = outputs.seq_relationship_logits

Source code in mindnlp\transformers\models\big_bird\modeling_big_bird.py

def forward(
    self,
    input_ids: mindspore.Tensor = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    next_sentence_label: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[BigBirdForPreTrainingOutput, Tuple[mindspore.Tensor]]:
    r"""
    labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
        Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ...,
        config.vocab_size - 1]` (see `input_ids` docstring). Tokens with indices set to `-100` are ignored (masked);
        the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size - 1]`.
    next_sentence_label (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
        Labels for computing the next sequence prediction (classification) loss. If specified together with
        `labels`, the NSP loss is added to the masked LM loss. Input should be a sequence pair (see `input_ids`
        docstring). Indices should be in `[0, 1]`:

        - 0 indicates sequence B is a continuation of sequence A,
        - 1 indicates sequence B is a random sequence.
    kwargs (`Dict[str, any]`, *optional*, defaults to `{}`):
        Used to hide legacy arguments that have been deprecated.

    Returns:

    Example:

    ```python
    >>> from mindnlp.transformers import AutoTokenizer, BigBirdForPreTraining

    >>> tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")
    >>> model = BigBirdForPreTraining.from_pretrained("google/bigbird-roberta-base")

    >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="ms")
    >>> outputs = model(**inputs)

    >>> prediction_logits = outputs.prediction_logits
    >>> seq_relationship_logits = outputs.seq_relationship_logits
    ```"""
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    outputs = self.bert(
        input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    sequence_output, pooled_output = outputs[:2]
    prediction_scores, seq_relationship_score = self.cls(sequence_output, pooled_output)

    total_loss = None
    if labels is not None:
        loss_fct = CrossEntropyLoss()
        total_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))

    if next_sentence_label is not None and total_loss is not None:
        next_sentence_loss = loss_fct(seq_relationship_score.view(-1, 2), next_sentence_label.view(-1))
        total_loss = total_loss + next_sentence_loss

    if not return_dict:
        output = (prediction_scores, seq_relationship_score) + outputs[2:]
        return ((total_loss,) + output) if total_loss is not None else output

    return BigBirdForPreTrainingOutput(
        loss=total_loss,
        prediction_logits=prediction_scores,
        seq_relationship_logits=seq_relationship_score,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )
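
In the listing above, the next-sentence loss is only added when a masked LM loss has already been computed, so passing `next_sentence_label` without `labels` yields no loss at all. The following sketch isolates that branch logic with plain floats; `combine_pretraining_losses` is a hypothetical helper written for illustration, with `None` standing for a loss whose labels were not supplied.

```python
def combine_pretraining_losses(mlm_loss, nsp_loss):
    """Mirror the branch structure of the pretraining forward pass.

    mlm_loss: float, or None when `labels` was not passed
    nsp_loss: float, or None when `next_sentence_label` was not passed
    """
    total = mlm_loss
    if nsp_loss is not None and total is not None:
        total = total + nsp_loss
    return total


both = combine_pretraining_losses(1.5, 0.5)       # MLM + NSP
mlm_only = combine_pretraining_losses(1.5, None)  # MLM alone
nsp_only = combine_pretraining_losses(None, 0.5)  # no loss is returned
```

With these inputs `both` is `2.0`, `mlm_only` is `1.5`, and `nsp_only` is `None`.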

`mindnlp.transformers.models.big_bird.modeling_big_bird.BigBirdForQuestionAnswering` ¶

Bases: BigBirdPreTrainedModel

Source code in mindnlp\transformers\models\big_bird\modeling_big_bird.py

class BigBirdForQuestionAnswering(BigBirdPreTrainedModel):
    def __init__(self, config, add_pooling_layer=False):
        super().__init__(config)

        config.num_labels = 2
        self.num_labels = config.num_labels
        self.sep_token_id = config.sep_token_id

        self.bert = BigBirdModel(config, add_pooling_layer=add_pooling_layer)
        self.qa_classifier = BigBirdForQuestionAnsweringHead(config)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        question_lengths: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        start_positions: Optional[mindspore.Tensor] = None,
        end_positions: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[BigBirdForQuestionAnsweringModelOutput, Tuple[mindspore.Tensor]]:
        r"""
        start_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for the position (index) of the start of the labelled span for computing the token classification
            loss. Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the
            sequence are not taken into account for computing the loss.
        end_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for the position (index) of the end of the labelled span for computing the token classification
            loss. Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the
            sequence are not taken into account for computing the loss.

        Returns:

        Example:

        ```python
        >>> import torch
        >>> from transformers import AutoTokenizer, BigBirdForQuestionAnswering
        >>> from datasets import load_dataset

        >>> tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")
        >>> model = BigBirdForQuestionAnswering.from_pretrained("google/bigbird-roberta-base")
        >>> squad_ds = load_dataset("rajpurkar/squad_v2", split="train")  # doctest: +IGNORE_RESULT

        >>> # select random article and question
        >>> LONG_ARTICLE = squad_ds[81514]["context"]
        >>> QUESTION = squad_ds[81514]["question"]
        >>> QUESTION
        'During daytime how high can the temperatures reach?'

        >>> inputs = tokenizer(QUESTION, LONG_ARTICLE, return_tensors="pt")
        >>> # long article and question input
        >>> list(inputs["input_ids"].shape)
        [1, 929]

        >>> with no_grad():
        ...     outputs = model(**inputs)

        >>> answer_start_index = outputs.start_logits.argmax()
        >>> answer_end_index = outputs.end_logits.argmax()
        >>> predict_answer_token_ids = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
        >>> predict_answer_token = tokenizer.decode(predict_answer_token_ids)
        ```

        ```python
        >>> target_start_index, target_end_index = mindspore.tensor([130]), mindspore.tensor([132])
        >>> outputs = model(**inputs, start_positions=target_start_index, end_positions=target_end_index)
        >>> loss = outputs.loss
        ```
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        seqlen = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]

        if question_lengths is None and input_ids is not None:
            # assuming input_ids format: <cls> <question> <sep> context <sep>
            question_lengths = ops.argmax(input_ids.eq(self.sep_token_id).int(), dim=-1) + 1
            question_lengths = question_lengths.unsqueeze(1)

        logits_mask = None
        if question_lengths is not None:
            # build a mask over the question tokens so that their logits can be pushed towards `-inf`
            logits_mask = self.prepare_question_mask(question_lengths, seqlen)
            if token_type_ids is None:
                token_type_ids = ops.ones(logits_mask.shape, dtype=mindspore.int64) - logits_mask
            logits_mask[:, 0] = False
            logits_mask = logits_mask.unsqueeze(2)

        outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        sequence_output = outputs[0]
        logits = self.qa_classifier(sequence_output)

        if logits_mask is not None:
            # removing question tokens from the competition
            logits = logits - logits_mask * 1e6

        start_logits, end_logits = ops.split(logits, 1, dim=-1)
        start_logits = start_logits.squeeze(-1)
        end_logits = end_logits.squeeze(-1)

        total_loss = None
        if start_positions is not None and end_positions is not None:
            # On multi-GPU, splitting the batch adds a dimension; squeeze it away
            if len(start_positions.shape) > 1:
                start_positions = start_positions.squeeze(-1)
            if len(end_positions.shape) > 1:
                end_positions = end_positions.squeeze(-1)
            # sometimes the start/end positions are outside our model inputs, we ignore these terms
            ignored_index = start_logits.shape[1]
            start_positions = start_positions.clamp(0, ignored_index)
            end_positions = end_positions.clamp(0, ignored_index)

            loss_fct = CrossEntropyLoss(ignore_index=ignored_index)
            start_loss = loss_fct(start_logits, start_positions)
            end_loss = loss_fct(end_logits, end_positions)
            total_loss = (start_loss + end_loss) / 2

        if not return_dict:
            output = (start_logits, end_logits) + outputs[2:]
            return ((total_loss,) + output) if total_loss is not None else output

        return BigBirdForQuestionAnsweringModelOutput(
            loss=total_loss,
            start_logits=start_logits,
            end_logits=end_logits,
            pooler_output=outputs.pooler_output,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )

    @staticmethod
    def prepare_question_mask(q_lengths: mindspore.Tensor, maxlen: int):
        # q_lengths -> (bz, 1)
        mask = ops.arange(0, maxlen)
        mask = mask.unsqueeze(0)  # -> (1, maxlen)
        mask = ops.where(mask < q_lengths, 1, 0)
        return mask
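
The question mask built by `prepare_question_mask`, together with the default `question_lengths` and `token_type_ids` derived in `forward`, can be illustrated without MindSpore. The following is a minimal plain-Python sketch rather than the library code: the helper `question_lengths_from_ids` and the token ids (65 for `<cls>`, 66 for `<sep>`) are made up for illustration, and nested lists stand in for tensors.

```python
def question_lengths_from_ids(input_ids, sep_token_id):
    """Index of the first separator plus one, per row.

    Mirrors `argmax(input_ids == sep_token_id) + 1`: argmax over a 0/1 vector
    returns the first 1, or 0 when the vector contains no 1 at all.
    """
    lengths = []
    for row in input_ids:
        first_sep = row.index(sep_token_id) if sep_token_id in row else 0
        lengths.append(first_sep + 1)
    return lengths


def prepare_question_mask(q_lengths, maxlen):
    """1 for positions before the question length, 0 elsewhere."""
    return [[1 if pos < q_len else 0 for pos in range(maxlen)] for q_len in q_lengths]


# Hypothetical ids: 65 = <cls>, 66 = <sep>; layout <cls> question <sep> context <sep>
input_ids = [[65, 10, 11, 66, 20, 21, 22, 66]]

q_lengths = question_lengths_from_ids(input_ids, sep_token_id=66)
mask = prepare_question_mask(q_lengths, maxlen=8)

# Default token types are the complement of the mask (question = 0, context = 1)
token_type_ids = [[1 - flag for flag in row] for row in mask]

# forward() clears position 0 afterwards, so the <cls> logit is never masked
logits_mask = [[0] + row[1:] for row in mask]
```

In this sketch the first four positions (`<cls>`, two question tokens, and the first `<sep>`) are flagged as question tokens. Note that the token types are computed before position 0 is cleared, so `<cls>` still receives token type 0.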

`mindnlp.transformers.models.big_bird.modeling_big_bird.BigBirdForQuestionAnswering.forward(input_ids=None, attention_mask=None, question_lengths=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, start_positions=None, end_positions=None, output_attentions=None, output_hidden_states=None, return_dict=None)`

start_positions (mindspore.Tensor of shape (batch_size,), optional): Labels for the position (index) of the start of the labelled span, used to compute the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account when computing the loss.

end_positions (mindspore.Tensor of shape (batch_size,), optional): Labels for the position (index) of the end of the labelled span, used to compute the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account when computing the loss.

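Returns: A BigBirdForQuestionAnsweringModelOutput when return_dict resolves to true (it defaults to config.use_return_dict); otherwise a tuple of mindspore.Tensor, with the loss first when both start_positions and end_positions are given.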

Example:

>>> import mindspore
>>> from mindnlp.transformers import AutoTokenizer, BigBirdForQuestionAnswering
>>> from datasets import load_dataset

>>> tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")
>>> model = BigBirdForQuestionAnswering.from_pretrained("google/bigbird-roberta-base")
>>> squad_ds = load_dataset("rajpurkar/squad_v2", split="train")  # doctest: +IGNORE_RESULT

>>> # select random article and question
>>> LONG_ARTICLE = squad_ds[81514]["context"]
>>> QUESTION = squad_ds[81514]["question"]
>>> QUESTION
'During daytime how high can the temperatures reach?'

>>> inputs = tokenizer(QUESTION, LONG_ARTICLE, return_tensors="ms")
>>> # long article and question input
>>> list(inputs["input_ids"].shape)
[1, 929]

>>> outputs = model(**inputs)

>>> answer_start_index = outputs.start_logits.argmax()
>>> answer_end_index = outputs.end_logits.argmax()
>>> predict_answer_token_ids = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
>>> predict_answer_token = tokenizer.decode(predict_answer_token_ids)

>>> target_start_index, target_end_index = mindspore.tensor([130]), mindspore.tensor([132])
>>> outputs = model(**inputs, start_positions=target_start_index, end_positions=target_end_index)
>>> loss = outputs.loss
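
The `argmax` calls in the example rely on the question tokens having been removed from the competition inside `forward` (`logits - logits_mask * 1e6`). The effect can be sketched in plain Python. This is an illustrative sketch with made-up logits, not the MindSpore implementation; `mask_question_logits` and `argmax` are hypothetical helpers.

```python
def mask_question_logits(logits, logits_mask, penalty=1e6):
    """Subtract a large constant wherever the mask is 1."""
    return [score - flag * penalty for score, flag in zip(logits, logits_mask)]


def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


# Made-up start logits for a 6-token input; positions 1..3 are question tokens
start_logits = [0.1, 5.0, 0.2, 1.0, 3.0, 0.5]
logits_mask = [0, 1, 1, 1, 0, 0]

masked_logits = mask_question_logits(start_logits, logits_mask)

unmasked_choice = argmax(start_logits)   # would point into the question
masked_choice = argmax(masked_logits)    # can only be <cls> or a context token
```

Without the mask, the highest score sits on a question token (index 1); after masking, the prediction moves to the best-scoring context position (index 4).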

Source code in mindnlp\transformers\models\big_bird\modeling_big_bird.py

def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    question_lengths: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    start_positions: Optional[mindspore.Tensor] = None,
    end_positions: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[BigBirdForQuestionAnsweringModelOutput, Tuple[mindspore.Tensor]]:
    r"""
    start_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
        Labels for the position (index) of the start of the labelled span, used to compute the token
        classification loss. Positions are clamped to the length of the sequence (`sequence_length`).
        Positions outside of the sequence are not taken into account when computing the loss.
    end_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
        Labels for the position (index) of the end of the labelled span, used to compute the token
        classification loss. Positions are clamped to the length of the sequence (`sequence_length`).
        Positions outside of the sequence are not taken into account when computing the loss.

    Returns:
        A `BigBirdForQuestionAnsweringModelOutput` when `return_dict` resolves to true (it defaults to
        `config.use_return_dict`); otherwise a tuple of `mindspore.Tensor`, with the loss first when both
        `start_positions` and `end_positions` are given.

    Example:

    ```python
    >>> import mindspore
    >>> from mindnlp.transformers import AutoTokenizer, BigBirdForQuestionAnswering
    >>> from datasets import load_dataset

    >>> tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")
    >>> model = BigBirdForQuestionAnswering.from_pretrained("google/bigbird-roberta-base")
    >>> squad_ds = load_dataset("rajpurkar/squad_v2", split="train")  # doctest: +IGNORE_RESULT

    >>> # select random article and question
    >>> LONG_ARTICLE = squad_ds[81514]["context"]
    >>> QUESTION = squad_ds[81514]["question"]
    >>> QUESTION
    'During daytime how high can the temperatures reach?'

    >>> inputs = tokenizer(QUESTION, LONG_ARTICLE, return_tensors="ms")
    >>> # long article and question input
    >>> list(inputs["input_ids"].shape)
    [1, 929]

    >>> outputs = model(**inputs)

    >>> answer_start_index = outputs.start_logits.argmax()
    >>> answer_end_index = outputs.end_logits.argmax()
    >>> predict_answer_token_ids = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
    >>> predict_answer_token = tokenizer.decode(predict_answer_token_ids)
    ```

    ```python
    >>> target_start_index, target_end_index = mindspore.tensor([130]), mindspore.tensor([132])
    >>> outputs = model(**inputs, start_positions=target_start_index, end_positions=target_end_index)
    >>> loss = outputs.loss
    ```
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    seqlen = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]

    if question_lengths is None and input_ids is not None:
        # assuming input_ids format: <cls> <question> <sep> context <sep>
        question_lengths = ops.argmax(input_ids.eq(self.sep_token_id).int(), dim=-1) + 1
        question_lengths = question_lengths.unsqueeze(1)

    logits_mask = None
    if question_lengths is not None:
        # build a mask over the question tokens so that their logits can be pushed towards `-inf`
        logits_mask = self.prepare_question_mask(question_lengths, seqlen)
        if token_type_ids is None:
            token_type_ids = ops.ones(logits_mask.shape, dtype=mindspore.int64) - logits_mask
        logits_mask[:, 0] = False
        logits_mask = logits_mask.unsqueeze(2)

    outputs = self.bert(
        input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    sequence_output = outputs[0]
    logits = self.qa_classifier(sequence_output)

    if logits_mask is not None:
        # removing question tokens from the competition
        logits = logits - logits_mask * 1e6

    start_logits, end_logits = ops.split(logits, 1, dim=-1)
    start_logits = start_logits.squeeze(-1)
    end_logits = end_logits.squeeze(-1)

    total_loss = None
    if start_positions is not None and end_positions is not None:
        # On multi-GPU, splitting the batch adds a dimension; squeeze it away
        if len(start_positions.shape) > 1:
            start_positions = start_positions.squeeze(-1)
        if len(end_positions.shape) > 1:
            end_positions = end_positions.squeeze(-1)
        # sometimes the start/end positions are outside our model inputs, we ignore these terms
        ignored_index = start_logits.shape[1]
        start_positions = start_positions.clamp(0, ignored_index)
        end_positions = end_positions.clamp(0, ignored_index)

        loss_fct = CrossEntropyLoss(ignore_index=ignored_index)
        start_loss = loss_fct(start_logits, start_positions)
        end_loss = loss_fct(end_logits, end_positions)
        total_loss = (start_loss + end_loss) / 2

    if not return_dict:
        output = (start_logits, end_logits) + outputs[2:]
        return ((total_loss,) + output) if total_loss is not None else output

    return BigBirdForQuestionAnsweringModelOutput(
        loss=total_loss,
        start_logits=start_logits,
        end_logits=end_logits,
        pooler_output=outputs.pooler_output,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )
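
The loss branch of `forward` clamps the target positions to `[0, sequence_length]` and uses `sequence_length` itself as the ignored index, so any target beyond the input contributes nothing. A minimal plain-Python sketch of that idea for a single example follows. It assumes that `CrossEntropyLoss(ignore_index=...)` skips targets equal to the ignored index; `clamp`, `cross_entropy`, and `span_loss` are hypothetical helpers, not mindnlp functions.

```python
import math


def clamp(value, low, high):
    """Restrict `value` to the closed interval [low, high]."""
    return max(low, min(value, high))


def cross_entropy(logits, target, ignore_index):
    """Negative log-softmax at `target`, or None when the target is ignored."""
    if target == ignore_index:
        return None
    peak = max(logits)  # subtract the maximum for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[target]


def span_loss(start_logits, end_logits, start_pos, end_pos):
    """Per-example start and end losses after clamping the targets."""
    ignored_index = len(start_logits)  # one past the last valid position
    start_pos = clamp(start_pos, 0, ignored_index)
    end_pos = clamp(end_pos, 0, ignored_index)
    start_loss = cross_entropy(start_logits, start_pos, ignored_index)
    end_loss = cross_entropy(end_logits, end_pos, ignored_index)
    return start_loss, end_loss


# Uniform logits over 4 tokens; the end target (10) lies outside the input
start_loss, end_loss = span_loss([0.0] * 4, [0.0] * 4, start_pos=2, end_pos=10)
```

Here the start target is valid, so its loss is `log(4)` for uniform logits, while the out-of-range end target is clamped to 4 and then ignored. When both targets are valid, `forward` reports the average of the two terms.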

`mindnlp.transformers.models.big_bird.modeling_big_bird.BigBirdForSequenceClassification`

Bases: BigBirdPreTrainedModel

Source code in mindnlp\transformers\models\big_bird\modeling_big_bird.py

class BigBirdForSequenceClassification(BigBirdPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels
        self.config = config
        self.bert = BigBirdModel(config)
        self.classifier = BigBirdClassificationHead(config)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: mindspore.Tensor = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[SequenceClassifierOutput, Tuple[mindspore.Tensor]]:
        r"""
        labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
            config.num_labels - 1]`. If `config.num_labels == 1`, a regression loss is computed (mean-square loss);
            if `config.num_labels > 1`, a classification loss is computed (cross-entropy).

        Returns:
            A `SequenceClassifierOutput` when `return_dict` resolves to true (it defaults to
            `config.use_return_dict`); otherwise a tuple of `mindspore.Tensor`, with the loss first when `labels`
            is given.

        Example:

        ```python
        >>> import mindspore
        >>> from mindnlp.transformers import AutoTokenizer, BigBirdForSequenceClassification
        >>> from datasets import load_dataset

        >>> tokenizer = AutoTokenizer.from_pretrained("l-yohai/bigbird-roberta-base-mnli")
        >>> model = BigBirdForSequenceClassification.from_pretrained("l-yohai/bigbird-roberta-base-mnli")
        >>> squad_ds = load_dataset("rajpurkar/squad_v2", split="train")  # doctest: +IGNORE_RESULT

        >>> LONG_ARTICLE = squad_ds[81514]["context"]
        >>> inputs = tokenizer(LONG_ARTICLE, return_tensors="ms")
        >>> # long input article
        >>> list(inputs["input_ids"].shape)
        [1, 919]

        >>> logits = model(**inputs).logits
        >>> predicted_class_id = logits.argmax().item()
        >>> model.config.id2label[predicted_class_id]
        'LABEL_0'
        ```

        ```python
        >>> num_labels = len(model.config.id2label)
        >>> model = BigBirdForSequenceClassification.from_pretrained(
        ...     "l-yohai/bigbird-roberta-base-mnli", num_labels=num_labels
        ... )
        >>> labels = mindspore.tensor(1)
        >>> loss = model(**inputs, labels=labels).loss
        >>> round(loss.item(), 2)
        1.13
        ```
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        sequence_output = outputs[0]
        logits = self.classifier(sequence_output)

        loss = None
        if labels is not None:
            if self.config.problem_type is None:
                if self.num_labels == 1:
                    self.config.problem_type = "regression"
                elif self.num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
                    self.config.problem_type = "single_label_classification"
                else:
                    self.config.problem_type = "multi_label_classification"

            if self.config.problem_type == "regression":
                loss_fct = MSELoss()
                if self.num_labels == 1:
                    loss = loss_fct(logits.squeeze(), labels.squeeze())
                else:
                    loss = loss_fct(logits, labels)
            elif self.config.problem_type == "single_label_classification":
                loss_fct = CrossEntropyLoss()
                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
            elif self.config.problem_type == "multi_label_classification":
                loss_fct = BCEWithLogitsLoss()
                loss = loss_fct(logits, labels)

        if not return_dict:
            output = (logits,) + outputs[2:]
            return ((loss,) + output) if loss is not None else output

        return SequenceClassifierOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )
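
When `config.problem_type` is unset, `forward` infers it from `num_labels` and the label dtype on the first call that receives labels, then stores the result on the config. The decision rule can be sketched as a small pure function. This is an illustrative restatement, not library code: `infer_problem_type` is a hypothetical helper, and the boolean `labels_are_integer` stands in for the check `labels.dtype in (mindspore.int64, mindspore.int32)`.

```python
def infer_problem_type(num_labels, labels_are_integer):
    """Return the problem type that forward() would pick for an unset config."""
    if num_labels == 1:
        return "regression"  # MSELoss
    if num_labels > 1 and labels_are_integer:
        return "single_label_classification"  # CrossEntropyLoss
    return "multi_label_classification"  # BCEWithLogitsLoss


# One output unit: regression, whatever the label dtype
regression = infer_problem_type(num_labels=1, labels_are_integer=True)

# Several classes with integer class ids: ordinary classification
single = infer_problem_type(num_labels=3, labels_are_integer=True)

# Several classes with float (multi-hot) labels: multi-label classification
multi = infer_problem_type(num_labels=3, labels_are_integer=False)
```

Because the inferred value is written back to `self.config.problem_type`, later calls reuse it even if labels of a different dtype are passed; setting `problem_type` explicitly on the config avoids relying on this inference.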

`mindnlp.transformers.models.big_bird.modeling_big_bird.BigBirdForSequenceClassification.forward(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)`

labels (mindspore.Tensor of shape (batch_size,), optional): Labels for computing the sequence classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels == 1, a regression loss is computed (mean-square loss); if config.num_labels > 1, a classification loss is computed (cross-entropy).

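Returns: A SequenceClassifierOutput when return_dict resolves to true (it defaults to config.use_return_dict); otherwise a tuple of mindspore.Tensor, with the loss first when labels is given.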

Example:

>>> import mindspore
>>> from mindnlp.transformers import AutoTokenizer, BigBirdForSequenceClassification
>>> from datasets import load_dataset

>>> tokenizer = AutoTokenizer.from_pretrained("l-yohai/bigbird-roberta-base-mnli")
>>> model = BigBirdForSequenceClassification.from_pretrained("l-yohai/bigbird-roberta-base-mnli")
>>> squad_ds = load_dataset("rajpurkar/squad_v2", split="train")  # doctest: +IGNORE_RESULT

>>> LONG_ARTICLE = squad_ds[81514]["context"]
>>> inputs = tokenizer(LONG_ARTICLE, return_tensors="ms")
>>> # long input article
>>> list(inputs["input_ids"].shape)
[1, 919]

>>> logits = model(**inputs).logits
>>> predicted_class_id = logits.argmax().item()
>>> model.config.id2label[predicted_class_id]
'LABEL_0'

>>> num_labels = len(model.config.id2label)
>>> model = BigBirdForSequenceClassification.from_pretrained(
...     "l-yohai/bigbird-roberta-base-mnli", num_labels=num_labels
... )
>>> labels = mindspore.tensor(1)
>>> loss = model(**inputs, labels=labels).loss
>>> round(loss.item(), 2)
1.13

Source code in mindnlp\transformers\models\big_bird\modeling_big_bird.py

def forward(
    self,
    input_ids: mindspore.Tensor = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[SequenceClassifierOutput, Tuple[mindspore.Tensor]]:
    r"""
    labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
        Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
        config.num_labels - 1]`. If `config.num_labels == 1`, a regression loss is computed (mean-square loss);
        if `config.num_labels > 1`, a classification loss is computed (cross-entropy).

    Returns:
        A `SequenceClassifierOutput` when `return_dict` resolves to true (it defaults to
        `config.use_return_dict`); otherwise a tuple of `mindspore.Tensor`, with the loss first when `labels`
        is given.

    Example:

    ```python
    >>> import mindspore
    >>> from mindnlp.transformers import AutoTokenizer, BigBirdForSequenceClassification
    >>> from datasets import load_dataset

    >>> tokenizer = AutoTokenizer.from_pretrained("l-yohai/bigbird-roberta-base-mnli")
    >>> model = BigBirdForSequenceClassification.from_pretrained("l-yohai/bigbird-roberta-base-mnli")
    >>> squad_ds = load_dataset("rajpurkar/squad_v2", split="train")  # doctest: +IGNORE_RESULT

    >>> LONG_ARTICLE = squad_ds[81514]["context"]
    >>> inputs = tokenizer(LONG_ARTICLE, return_tensors="ms")
    >>> # long input article
    >>> list(inputs["input_ids"].shape)
    [1, 919]

    >>> logits = model(**inputs).logits
    >>> predicted_class_id = logits.argmax().item()
    >>> model.config.id2label[predicted_class_id]
    'LABEL_0'
    ```

    ```python
    >>> num_labels = len(model.config.id2label)
    >>> model = BigBirdForSequenceClassification.from_pretrained(
    ...     "l-yohai/bigbird-roberta-base-mnli", num_labels=num_labels
    ... )
    >>> labels = mindspore.tensor(1)
    >>> loss = model(**inputs, labels=labels).loss
    >>> round(loss.item(), 2)
    1.13
    ```
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    outputs = self.bert(
        input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    sequence_output = outputs[0]
    logits = self.classifier(sequence_output)

    loss = None
    if labels is not None:
        if self.config.problem_type is None:
            if self.num_labels == 1:
                self.config.problem_type = "regression"
            elif self.num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
                self.config.problem_type = "single_label_classification"
            else:
                self.config.problem_type = "multi_label_classification"

        if self.config.problem_type == "regression":
            loss_fct = MSELoss()
            if self.num_labels == 1:
                loss = loss_fct(logits.squeeze(), labels.squeeze())
            else:
                loss = loss_fct(logits, labels)
        elif self.config.problem_type == "single_label_classification":
            loss_fct = CrossEntropyLoss()
            loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
        elif self.config.problem_type == "multi_label_classification":
            loss_fct = BCEWithLogitsLoss()
            loss = loss_fct(logits, labels)

    if not return_dict:
        output = (logits,) + outputs[2:]
        return ((loss,) + output) if loss is not None else output

    return SequenceClassifierOutput(
        loss=loss,
        logits=logits,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )
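
When `return_dict` is false, `forward` returns a plain tuple assembled as `(logits,) + outputs[2:]`, with the loss prepended if it was computed. The sketch below mimics that assembly with placeholder strings. It assumes the encoder tuple is laid out as `(sequence_output, pooled_output, *extras)`, which is inferred from the slicing in the source rather than stated there; `pack_tuple_output` is a hypothetical helper.

```python
def pack_tuple_output(loss, logits, encoder_outputs):
    """Build the tuple returned when return_dict is false."""
    # Skip the first two encoder entries (assumed: sequence and pooled output)
    output = (logits,) + tuple(encoder_outputs[2:])
    return ((loss,) + output) if loss is not None else output


# Placeholder strings stand in for tensors
encoder_outputs = ("sequence_output", "pooled_output", "hidden_states", "attentions")

without_loss = pack_tuple_output(None, "logits", encoder_outputs)
with_loss = pack_tuple_output("loss", "logits", encoder_outputs)
```

Callers that index into the tuple therefore need to account for the position shift: the logits sit at index 0 without labels and at index 1 with labels.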

`mindnlp.transformers.models.big_bird.modeling_big_bird.BigBirdForTokenClassification`

Bases: BigBirdPreTrainedModel

Source code in mindnlp\transformers\models\big_bird\modeling_big_bird.py

class BigBirdForTokenClassification(BigBirdPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels

        self.bert = BigBirdModel(config)
        classifier_dropout = (
            config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
        )
        self.dropout = nn.Dropout(classifier_dropout)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: mindspore.Tensor = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[TokenClassifierOutput, Tuple[mindspore.Tensor]]:
        r"""
        labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`.
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        sequence_output = outputs[0]

        sequence_output = self.dropout(sequence_output)
        logits = self.classifier(sequence_output)

        loss = None
        if labels is not None:
            loss_fct = CrossEntropyLoss()
            loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))

        if not return_dict:
            output = (logits,) + outputs[2:]
            return ((loss,) + output) if loss is not None else output

        return TokenClassifierOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )
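
The token classification loss flattens the logits to `(batch_size * sequence_length, num_labels)` and the labels to `(batch_size * sequence_length,)` before applying cross-entropy. A minimal plain-Python sketch of that flattening follows, with nested lists in place of tensors. It assumes the loss averages over all tokens (mean reduction) and does not model any ignored label index; `token_classification_loss` is a hypothetical helper.

```python
import math


def token_classification_loss(logits, labels):
    """Mean cross-entropy over every token of every sequence.

    logits: nested list shaped [batch][seq][num_labels]
    labels: nested list shaped [batch][seq]
    """
    # Equivalent of logits.view(-1, num_labels) and labels.view(-1)
    flat_logits = [token for sequence in logits for token in sequence]
    flat_labels = [label for sequence in labels for label in sequence]

    losses = []
    for scores, label in zip(flat_logits, flat_labels):
        peak = max(scores)  # subtract the maximum for numerical stability
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        losses.append(log_norm - scores[label])
    return sum(losses) / len(losses)


# One sequence of two tokens, three labels, uniform scores
logits = [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]
labels = [[0, 2]]
loss = token_classification_loss(logits, labels)
```

With uniform scores every token contributes `log(3)`, so the mean is `log(3)` as well. Flattening means sequence boundaries play no role in the loss: each token is treated as an independent classification example.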

`mindnlp.transformers.models.big_bird.modeling_big_bird.BigBirdForTokenClassification.forward(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)`

labels (mindspore.Tensor of shape (batch_size, sequence_length), optional): Labels for computing the token classification loss. Indices should be in [0, ..., config.num_labels - 1].

Source code in mindnlp\transformers\models\big_bird\modeling_big_bird.py

def forward(
    self,
    input_ids: mindspore.Tensor = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[TokenClassifierOutput, Tuple[mindspore.Tensor]]:
    r"""
    labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
        Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`.
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    outputs = self.bert(
        input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    sequence_output = outputs[0]

    sequence_output = self.dropout(sequence_output)
    logits = self.classifier(sequence_output)

    loss = None
    if labels is not None:
        loss_fct = CrossEntropyLoss()
        loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))

    if not return_dict:
        output = (logits,) + outputs[2:]
        return ((loss,) + output) if loss is not None else output

    return TokenClassifierOutput(
        loss=loss,
        logits=logits,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )

`mindnlp.transformers.models.big_bird.modeling_big_bird.BigBirdLayer`

Bases: Module

Source code in mindnlp\transformers\models\big_bird\modeling_big_bird.py

class BigBirdLayer(nn.Module):
    def __init__(self, config, seed=None):
        super().__init__()
        self.config = config
        self.attention_type = config.attention_type
        self.chunk_size_feed_forward = config.chunk_size_feed_forward
        self.seq_len_dim = 1
        self.attention = BigBirdAttention(config, seed=seed)
        self.is_decoder = config.is_decoder
        self.add_cross_attention = config.add_cross_attention
        if self.add_cross_attention:
            if not self.is_decoder:
                raise TypeError(f"{self} should be used as a decoder model if cross attention is added")
            self.crossattention = BigBirdAttention(config)
        self.intermediate = BigBirdIntermediate(config)
        self.output = BigBirdOutput(config)

    def set_attention_type(self, value: str):
        if value not in ["original_full", "block_sparse"]:
            raise ValueError(
                f"attention_type can only be set to either 'original_full' or 'block_sparse', but is {value}"
            )
        # attention type is already correctly set
        if value == self.attention_type:
            return
        self.attention_type = value
        self.attention.set_attention_type(value)

        if self.add_cross_attention:
            self.crossattention.set_attention_type(value)

    def forward(
        self,
        hidden_states,
        attention_mask=None,
        head_mask=None,
        encoder_hidden_states=None,
        encoder_attention_mask=None,
        band_mask=None,
        from_mask=None,
        to_mask=None,
        blocked_encoder_mask=None,
        past_key_value=None,
        output_attentions=False,
    ):
        # decoder uni-directional self-attention cached key/values tuple is at positions 1,2
        self_attn_past_key_value = past_key_value[:2] if past_key_value is not None else None
        self_attention_outputs = self.attention(
            hidden_states,
            attention_mask,
            head_mask,
            encoder_hidden_states=encoder_hidden_states,
            encoder_attention_mask=encoder_attention_mask,
            past_key_value=self_attn_past_key_value,
            output_attentions=output_attentions,
            band_mask=band_mask,
            from_mask=from_mask,
            to_mask=to_mask,
            from_blocked_mask=blocked_encoder_mask,
            to_blocked_mask=blocked_encoder_mask,
        )
        attention_output = self_attention_outputs[0]

        # if decoder, the last output is tuple of self-attn cache
        if self.is_decoder:
            outputs = self_attention_outputs[1:-1]
            present_key_value = self_attention_outputs[-1]
        else:
            outputs = self_attention_outputs[1:]  # add self attentions if we output attention weights

        cross_attn_present_key_value = None
        if self.is_decoder and encoder_hidden_states is not None:
            if not hasattr(self, "crossattention"):
                raise ValueError(
                    f"If `encoder_hidden_states` are passed, {self} has to be instantiated with"
                    " cross-attention layers by setting `config.add_cross_attention=True`"
                )

            # cross_attn cached key/values tuple is at positions 3,4 of past_key_value tuple
            cross_attn_past_key_value = past_key_value[-2:] if past_key_value is not None else None
            cross_attention_outputs = self.crossattention(
                attention_output,
                attention_mask,
                head_mask,
                encoder_hidden_states,
                encoder_attention_mask,
                cross_attn_past_key_value,
                output_attentions,
            )
            attention_output = cross_attention_outputs[0]
            outputs = outputs + cross_attention_outputs[1:-1]  # add cross attentions if we output attention weights

            # add cross-attn cache to positions 3,4 of present_key_value tuple
            cross_attn_present_key_value = cross_attention_outputs[-1]
            present_key_value = present_key_value + cross_attn_present_key_value

        layer_output = apply_chunking_to_forward(
            self.feed_forward_chunk, self.chunk_size_feed_forward, self.seq_len_dim, attention_output
        )

        outputs = (layer_output,) + outputs

        # if decoder, return the attn key/values as the last output
        if self.is_decoder:
            outputs = outputs + (present_key_value,)

        return outputs

    def feed_forward_chunk(self, attention_output):
        intermediate_output = self.intermediate(attention_output)
        layer_output = self.output(intermediate_output, attention_output)
        return layer_output
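
As a rough illustration of the cache handling in `forward` above: the layer reads the self-attention key/value pair from the first two entries of `past_key_value` and the cross-attention pair from the last two, then returns them concatenated. The sketch below mimics that slicing with plain tuples. `split_layer_cache` and `merge_layer_cache` are hypothetical helper names, not part of mindnlp, and the length check is an assumption added for this example.

```python
def split_layer_cache(past_key_value):
    """Split a per-layer cache tuple into (self-attention, cross-attention) parts."""
    if past_key_value is None:
        return None, None
    # Positions 1,2 of the tuple: self-attention key and value.
    self_attn_kv = past_key_value[:2]
    # Positions 3,4 of the tuple: cross-attention key and value.
    # Assumption for this sketch: a 2-entry cache has no cross-attention part.
    cross_attn_kv = past_key_value[-2:] if len(past_key_value) == 4 else None
    return self_attn_kv, cross_attn_kv


def merge_layer_cache(self_attn_kv, cross_attn_kv=None):
    """Rebuild the cache tuple that the layer returns as its last output."""
    if cross_attn_kv is None:
        return self_attn_kv
    return self_attn_kv + cross_attn_kv


# Strings stand in for tensors so the example runs without any framework.
cache = ("self_key", "self_value", "cross_key", "cross_value")
self_kv, cross_kv = split_layer_cache(cache)
print(self_kv, cross_kv)
```

Keeping both pairs in one flat tuple is what lets the decoder pass a single `past_key_value` object per layer.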

`mindnlp.transformers.models.big_bird.modeling_big_bird.BigBirdModel` ¶

Bases: BigBirdPreTrainedModel

The model can behave as an encoder (with only self-attention) as well as a decoder, in which case a layer of cross-attention is added between the self-attention layers, following the architecture described in Attention is all you need by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin.

To behave as a decoder, the model needs to be initialized with the `is_decoder` argument of the configuration set to `True`. To be used in a Seq2Seq model, the model needs to be initialized with both the `is_decoder` argument and `add_cross_attention` set to `True`; `encoder_hidden_states` is then expected as an input to the forward pass.
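
The flag combinations described above can be summarised in a small sketch. `resolve_mode` is a hypothetical helper, not a mindnlp API. It also mirrors the constructor's behavior of switching to `original_full` attention when `add_cross_attention` is set. Rejecting `add_cross_attention` without `is_decoder` is an assumption made for this illustration.

```python
def resolve_mode(is_decoder=False, add_cross_attention=False, attention_type="block_sparse"):
    """Return (role, effective_attention_type) for a set of config flags."""
    if attention_type not in ("original_full", "block_sparse"):
        raise ValueError(f"unsupported attention_type: {attention_type}")
    if add_cross_attention and not is_decoder:
        # Assumption for this sketch: cross-attention only makes sense in a decoder.
        raise ValueError("add_cross_attention=True requires is_decoder=True")

    if add_cross_attention:
        role = "seq2seq decoder"
    elif is_decoder:
        role = "decoder"
    else:
        role = "encoder"

    # With cross-attention enabled, the model falls back to full attention.
    effective = "original_full" if add_cross_attention else attention_type
    return role, effective


print(resolve_mode())
print(resolve_mode(is_decoder=True))
print(resolve_mode(is_decoder=True, add_cross_attention=True))
```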

Source code in mindnlp\transformers\models\big_bird\modeling_big_bird.py

class BigBirdModel(BigBirdPreTrainedModel):
    """

    The model can behave as an encoder (with only self-attention) as well as a decoder, in which case a layer of
    cross-attention is added between the self-attention layers, following the architecture described in [Attention is
    all you need](https://arxiv.org/abs/1706.03762) by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit,
    Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin.

    To behave as a decoder, the model needs to be initialized with the `is_decoder` argument of the configuration set
    to `True`. To be used in a Seq2Seq model, the model needs to be initialized with both the `is_decoder` argument and
    `add_cross_attention` set to `True`; `encoder_hidden_states` is then expected as an input to the forward pass.
    """

    def __init__(self, config, add_pooling_layer=True):
        super().__init__(config)
        self.attention_type = self.config.attention_type
        self.config = config

        self.block_size = self.config.block_size

        self.embeddings = BigBirdEmbeddings(config)
        self.encoder = BigBirdEncoder(config)

        if add_pooling_layer:
            self.pooler = nn.Linear(config.hidden_size, config.hidden_size)
            self.activation = nn.Tanh()
        else:
            self.pooler = None
            self.activation = None

        if self.attention_type != "original_full" and config.add_cross_attention:
            logger.warning(
                "When using `BigBirdForCausalLM` as decoder, then `attention_type` must be `original_full`. Setting"
                " `attention_type=original_full`"
            )
            self.set_attention_type("original_full")

        # Initialize weights and apply final processing
        self.post_init()

    def get_input_embeddings(self):
        return self.embeddings.word_embeddings

    def set_input_embeddings(self, value):
        self.embeddings.word_embeddings = value

    def set_attention_type(self, value: str):
        if value not in ["original_full", "block_sparse"]:
            raise ValueError(
                f"attention_type can only be set to either 'original_full' or 'block_sparse', but is {value}"
            )
        # attention type is already correctly set
        if value == self.attention_type:
            return
        self.attention_type = value
        self.encoder.set_attention_type(value)

    def forward(
        self,
        input_ids: mindspore.Tensor = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        encoder_hidden_states: Optional[mindspore.Tensor] = None,
        encoder_attention_mask: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[BaseModelOutputWithPoolingAndCrossAttentions, Tuple[mindspore.Tensor]]:
        r"""
        encoder_hidden_states  (`mindspore.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
            Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if
            the model is configured as a decoder.
        encoder_attention_mask (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in
            the cross-attention if the model is configured as a decoder. Mask values selected in `[0, 1]`:

            - 1 for tokens that are **not masked**,
            - 0 for tokens that are **masked**.
        past_key_values (`tuple(tuple(mindspore.Tensor))` of length `config.num_hidden_layers` with each tuple having 4 tensors of shape `(batch_size, num_heads, sequence_length - 1, embed_size_per_head)`):
            Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding.
            If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
            don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
            `decoder_input_ids` of shape `(batch_size, sequence_length)`.
        use_cache (`bool`, *optional*):
            If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
            `past_key_values`).
        """
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        if self.config.is_decoder:
            use_cache = use_cache if use_cache is not None else self.config.use_cache
        else:
            use_cache = False

        if input_ids is not None and inputs_embeds is not None:
            raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
        elif input_ids is not None:
            self.warn_if_padding_and_no_attention_mask(input_ids, attention_mask)
            input_shape = input_ids.shape
        elif inputs_embeds is not None:
            input_shape = inputs_embeds.shape[:-1]
        else:
            raise ValueError("You have to specify either input_ids or inputs_embeds")

        batch_size, seq_length = input_shape

        # past_key_values_length
        past_key_values_length = past_key_values[0][0].shape[2] if past_key_values is not None else 0

        if attention_mask is None:
            attention_mask = ops.ones(((batch_size, seq_length + past_key_values_length)))
        if token_type_ids is None:
            if hasattr(self.embeddings, "token_type_ids"):
                buffered_token_type_ids = self.embeddings.token_type_ids[:, :seq_length]
                buffered_token_type_ids_expanded = buffered_token_type_ids.broadcast_to((batch_size, seq_length))
                token_type_ids = buffered_token_type_ids_expanded
            else:
                token_type_ids = ops.zeros(input_shape, dtype=mindspore.int64)

        # block_sparse attention is only used when sequence_length is strictly greater than the sum of
        # global tokens: 2 * block_size
        # + sliding tokens: 3 * block_size
        # + random tokens: 2 * num_random_blocks * block_size
        max_tokens_to_attend = (5 + 2 * self.config.num_random_blocks) * self.config.block_size
        if self.attention_type == "block_sparse" and seq_length <= max_tokens_to_attend:
            # change attention_type from block_sparse to original_full
            sequence_length = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]
            logger.warning(
                "Attention type 'block_sparse' is not possible if sequence_length: "
                f"{sequence_length} <= num global tokens: 2 * config.block_size "
                "+ min. num sliding tokens: 3 * config.block_size "
                "+ config.num_random_blocks * config.block_size "
                "+ additional buffer: config.num_random_blocks * config.block_size "
                f"= {max_tokens_to_attend} with config.block_size "
                f"= {self.config.block_size}, config.num_random_blocks "
                f"= {self.config.num_random_blocks}. "
                "Changing attention type to 'original_full'..."
            )
            self.set_attention_type("original_full")

        if self.attention_type == "block_sparse":
            (
                padding_len,
                input_ids,
                attention_mask,
                token_type_ids,
                position_ids,
                inputs_embeds,
            ) = self._pad_to_block_size(
                input_ids=input_ids,
                attention_mask=attention_mask,
                token_type_ids=token_type_ids,
                position_ids=position_ids,
                inputs_embeds=inputs_embeds,
                pad_token_id=self.config.pad_token_id,
            )
        else:
            padding_len = 0

        if self.attention_type == "block_sparse":
            blocked_encoder_mask, band_mask, from_mask, to_mask = self.create_masks_for_block_sparse_attn(
                attention_mask, self.block_size
            )
            extended_attention_mask = None

        elif self.attention_type == "original_full":
            blocked_encoder_mask = None
            band_mask = None
            from_mask = None
            to_mask = None
            # We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]
            # ourselves in which case we just need to make it broadcastable to all heads.
            extended_attention_mask: mindspore.Tensor = self.get_extended_attention_mask(attention_mask, input_shape)
        else:
            raise ValueError(
                f"attention_type can either be original_full or block_sparse, but is {self.attention_type}"
            )

        # If a 2D or 3D attention mask is provided for the cross-attention
        # we need to make broadcastable to [batch_size, num_heads, seq_length, seq_length]
        if self.config.is_decoder and encoder_hidden_states is not None:
            encoder_batch_size, encoder_sequence_length, _ = encoder_hidden_states.shape
            encoder_hidden_shape = (encoder_batch_size, encoder_sequence_length)
            if encoder_attention_mask is None:
                encoder_attention_mask = ops.ones(encoder_hidden_shape)
            encoder_extended_attention_mask = self.invert_attention_mask(encoder_attention_mask)
        else:
            encoder_extended_attention_mask = None

        # Prepare head mask if needed
        # 1.0 in head_mask indicates we keep the head
        # attention_probs has shape bsz x n_heads x N x N
        # input head_mask has shape [num_heads] or [num_hidden_layers x num_heads]
        # and head_mask is converted to shape [num_hidden_layers x batch x num_heads x seq_length x seq_length]
        head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)

        embedding_output = self.embeddings(
            input_ids=input_ids,
            position_ids=position_ids,
            token_type_ids=token_type_ids,
            inputs_embeds=inputs_embeds,
            past_key_values_length=past_key_values_length,
        )

        encoder_outputs = self.encoder(
            embedding_output,
            attention_mask=extended_attention_mask,
            head_mask=head_mask,
            encoder_hidden_states=encoder_hidden_states,
            encoder_attention_mask=encoder_extended_attention_mask,
            past_key_values=past_key_values,
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            band_mask=band_mask,
            from_mask=from_mask,
            to_mask=to_mask,
            blocked_encoder_mask=blocked_encoder_mask,
            return_dict=return_dict,
        )
        sequence_output = encoder_outputs[0]

        pooler_output = self.activation(self.pooler(sequence_output[:, 0, :])) if (self.pooler is not None) else None

        # undo padding
        if padding_len > 0:
            # unpad `sequence_output` because the calling function is expecting a length == input_ids.shape[1]
            sequence_output = sequence_output[:, :-padding_len]

        if not return_dict:
            return (sequence_output, pooler_output) + encoder_outputs[1:]

        return BaseModelOutputWithPoolingAndCrossAttentions(
            last_hidden_state=sequence_output,
            pooler_output=pooler_output,
            past_key_values=encoder_outputs.past_key_values,
            hidden_states=encoder_outputs.hidden_states,
            attentions=encoder_outputs.attentions,
            cross_attentions=encoder_outputs.cross_attentions,
        )

    @staticmethod
    def create_masks_for_block_sparse_attn(attention_mask: mindspore.Tensor, block_size: int):
        batch_size, seq_length = attention_mask.shape
        if seq_length % block_size != 0:
            raise ValueError(
                f"Sequence length must be multiple of block size, but sequence length is {seq_length}, while block"
                f" size is {block_size}."
            )

        def create_band_mask_from_inputs(from_blocked_mask, to_blocked_mask):
            """
            Create the band (sliding-window) attention mask from blocked padding masks.

            Args:
                from_blocked_mask: 3D Tensor of shape [batch_size,
                from_seq_length//from_block_size, from_block_size].
                to_blocked_mask: 3D Tensor of shape [batch_size,
                to_seq_length//to_block_size, to_block_size].

            Returns:
                Tensor of shape [batch_size, 1, from_seq_length//from_block_size-4, from_block_size,
                3*to_block_size].
            """
            exp_blocked_to_pad = ops.cat(
                [to_blocked_mask[:, 1:-3], to_blocked_mask[:, 2:-2], to_blocked_mask[:, 3:-1]], dim=2
            )
            band_mask = ops.einsum("blq,blk->blqk", from_blocked_mask[:, 2:-2], exp_blocked_to_pad)
            band_mask = band_mask.unsqueeze(1)
            return band_mask

        blocked_encoder_mask = attention_mask.view(batch_size, seq_length // block_size, block_size)
        band_mask = create_band_mask_from_inputs(blocked_encoder_mask, blocked_encoder_mask)

        from_mask = attention_mask.view(batch_size, 1, seq_length, 1)
        to_mask = attention_mask.view(batch_size, 1, 1, seq_length)

        return blocked_encoder_mask, band_mask, from_mask, to_mask

    def _pad_to_block_size(
        self,
        input_ids: mindspore.Tensor,
        attention_mask: mindspore.Tensor,
        token_type_ids: mindspore.Tensor,
        position_ids: mindspore.Tensor,
        inputs_embeds: mindspore.Tensor,
        pad_token_id: int,
    ):
        """A helper function to pad tokens and mask to work with implementation of BigBird block-sparse attention."""
        # padding
        block_size = self.config.block_size

        input_shape = input_ids.shape if input_ids is not None else inputs_embeds.shape
        batch_size, seq_len = input_shape[:2]

        padding_len = (block_size - seq_len % block_size) % block_size
        if padding_len > 0:
            logger.warning_once(
                f"Input ids are automatically padded from {seq_len} to {seq_len + padding_len} to be a multiple of "
                f"`config.block_size`: {block_size}"
            )
            if input_ids is not None:
                input_ids = nn.functional.pad(input_ids, (0, padding_len), value=pad_token_id)
            if position_ids is not None:
                # pad with position_id = pad_token_id as in modeling_bigbird.BigBirdEmbeddings
                position_ids = nn.functional.pad(position_ids, (0, padding_len), value=pad_token_id)
            if inputs_embeds is not None:
                input_ids_padding = ops.full(
                    (batch_size, padding_len),
                    self.config.pad_token_id,
                    dtype=mindspore.int64,
                )
                inputs_embeds_padding = self.embeddings(input_ids_padding)
                inputs_embeds = ops.cat([inputs_embeds, inputs_embeds_padding], dim=-2)

            attention_mask = nn.functional.pad(
                attention_mask, (0, padding_len), value=False
            )  # no attention on the padding tokens
            token_type_ids = nn.functional.pad(token_type_ids, (0, padding_len), value=0)  # pad with token_type_id = 0

        return padding_len, input_ids, attention_mask, token_type_ids, position_ids, inputs_embeds
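
To make the shape logic of `create_masks_for_block_sparse_attn` easier to follow, here is a framework-free sketch for a single sequence (no batch or head dimension). It assumes the same slicing as the source above: every block except the first two and last two attends to its left neighbour, itself, and its right neighbour. The function names are hypothetical and nested lists stand in for tensors.

```python
def to_blocks(attention_mask, block_size):
    """Reshape a flat 0/1 mask into consecutive blocks of `block_size` entries."""
    seq_length = len(attention_mask)
    if seq_length % block_size != 0:
        raise ValueError(
            f"Sequence length must be multiple of block size, but sequence length is {seq_length}, "
            f"while block size is {block_size}."
        )
    return [attention_mask[i:i + block_size] for i in range(0, seq_length, block_size)]


def band_mask(blocks):
    """Build the sliding-window mask for the middle blocks (indices 2 .. n-3)."""
    band = []
    for i in range(2, len(blocks) - 2):
        # Keys visible to block i: previous block, the block itself, next block.
        window = blocks[i - 1] + blocks[i] + blocks[i + 1]
        # Outer product: a query/key pair is kept only if both tokens are real.
        band.append([[q * k for k in window] for q in blocks[i]])
    return band


mask = [1] * 9 + [0] * 3          # 9 real tokens followed by 3 padding tokens
blocks = to_blocks(mask, block_size=2)
band = band_mask(blocks)
print(len(band), len(band[0]), len(band[0][0]))  # middle blocks, block_size, 3 * block_size
```

For `n` blocks the result has `n - 4` rows, which matches the `from_seq_length//from_block_size-4` dimension in the docstring.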

`mindnlp.transformers.models.big_bird.modeling_big_bird.BigBirdModel.forward(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, encoder_hidden_states=None, encoder_attention_mask=None, past_key_values=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None)` ¶

PARAMETER	DESCRIPTION
`encoder_hidden_states`	Sequence of hidden states at the output of the last layer of the encoder. Used in the cross-attention if the model is configured as a decoder. TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, optional DEFAULT: `None`
`encoder_attention_mask`	Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in the cross-attention if the model is configured as a decoder. Mask values are selected in `[0, 1]`: 1 for tokens that are **not masked**, 0 for tokens that are **masked**. TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, optional DEFAULT: `None`
`past_key_values`	Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding. If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all `decoder_input_ids` of shape `(batch_size, sequence_length)`. TYPE: `tuple(tuple(mindspore.Tensor))` of length `config.num_hidden_layers`, with each tuple having 4 tensors of shape `(batch_size, num_heads, sequence_length - 1, embed_size_per_head)` DEFAULT: `None`
`use_cache`	If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see `past_key_values`). TYPE: `bool`, optional DEFAULT: `None`
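
The 1/0 convention of `encoder_attention_mask` can be illustrated with a minimal sketch of how such a mask is commonly applied: masked positions receive a large negative bias before the softmax, so they end up with zero attention weight. This is a generic illustration in plain Python. The helper name and the `-1e9` constant are assumptions, not the values mindnlp uses internally.

```python
import math


def masked_softmax(scores, mask, neg=-1e9):
    """Softmax over `scores` where positions with mask == 0 are suppressed."""
    # 1 means "not masked" (bias 0.0); 0 means "masked" (large negative bias).
    biased = [s + (0.0 if m == 1 else neg) for s, m in zip(scores, mask)]
    peak = max(biased)  # subtract the maximum for numerical stability
    exps = [math.exp(b - peak) for b in biased]
    total = sum(exps)
    return [e / total for e in exps]


weights = masked_softmax([1.0, 2.0, 3.0], [1, 1, 0])
print(weights)  # the third (masked) position gets weight 0.0
```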


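The length checks in `forward` and `_pad_to_block_size` come down to two small formulas: block-sparse attention is kept only when the sequence is longer than `(5 + 2 * num_random_blocks) * block_size` tokens, and the input is then padded up to the next multiple of `block_size`. The sketch below reproduces that arithmetic. `plan_attention` is a hypothetical helper, and the default values `block_size=64` and `num_random_blocks=3` are illustrative choices for this example.

```python
def plan_attention(seq_length, block_size=64, num_random_blocks=3):
    """Return (attention_type, padding_len) for a given input length."""
    # Global tokens (2 blocks) + sliding tokens (3 blocks) + random tokens (2 * r blocks).
    min_required = (5 + 2 * num_random_blocks) * block_size
    if seq_length <= min_required:
        # Too short for block-sparse attention: fall back to full attention, no padding.
        return "original_full", 0
    # Pad up to the next multiple of block_size (0 if already aligned).
    padding_len = (block_size - seq_length % block_size) % block_size
    return "block_sparse", padding_len


for n in (704, 705, 1000, 1024):
    print(n, plan_attention(n))
```

With these values the threshold is 704 tokens, so a 1000-token input stays block-sparse and is padded by 24 tokens to 1024.
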
`mindnlp.transformers.models.big_bird.modeling_big_bird.BigBirdPreTrainedModel` ¶

Bases: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.

Source code in mindnlp\transformers\models\big_bird\modeling_big_bird.py

class BigBirdPreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """

    config_class = BigBirdConfig
    base_model_prefix = "bert"
    supports_gradient_checkpointing = True

    def _init_weights(self, module):
        """Initialize the weights"""
        if isinstance(module, nn.Linear):
            # Slightly different from the TF version which uses truncated_normal for initialization
            # cf https://github.com/pytorch/pytorch/pull/5617
            nn.init.normal_(module.weight, mean=0.0, std=self.config.initializer_range)
            if module.bias is not None:
                nn.init.zeros_(module.bias)
        elif isinstance(module, nn.Embedding):
            nn.init.normal_(module.weight, mean=0.0, std=self.config.initializer_range)
            if module.padding_idx is not None:
                module.weight[module.padding_idx] = 0
        elif isinstance(module, nn.LayerNorm):
            nn.init.zeros_(module.bias)
            nn.init.ones_(module.weight)
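
The initialization scheme in `_init_weights` can be sketched without any framework: linear and embedding weights are drawn from a normal distribution with standard deviation `initializer_range`, biases are zeroed, the padding row of an embedding is zeroed, and layer normalization starts as the identity. The helper names below are hypothetical, nested lists stand in for tensors, and `std=0.02` is an illustrative value.

```python
import random


def init_linear(out_features, in_features, std=0.02, rng=None):
    """Normal(0, std) weights and zero bias, as for nn.Linear above."""
    rng = rng or random.Random(0)
    weight = [[rng.gauss(0.0, std) for _ in range(in_features)] for _ in range(out_features)]
    bias = [0.0] * out_features
    return weight, bias


def init_embedding(num_embeddings, dim, std=0.02, padding_idx=None, rng=None):
    """Normal(0, std) rows, with the padding row reset to zeros."""
    rng = rng or random.Random(0)
    weight = [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(num_embeddings)]
    if padding_idx is not None:
        weight[padding_idx] = [0.0] * dim
    return weight


def init_layer_norm(dim):
    """Scale of ones and shift of zeros, so the layer starts as a plain normalization."""
    return [1.0] * dim, [0.0] * dim


embedding = init_embedding(5, 4, padding_idx=0)
print(embedding[0])  # the padding row is all zeros
```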

`mindnlp.transformers.models.big_bird.tokenization_big_bird.BigBirdTokenizer` ¶

Bases: PreTrainedTokenizer

Construct a BigBird tokenizer. Based on SentencePiece.

This tokenizer inherits from [PreTrainedTokenizer] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.

PARAMETER	DESCRIPTION
`vocab_file`	SentencePiece file (generally has a .spm extension) that contains the vocabulary necessary to instantiate a tokenizer. TYPE: `str`
`unk_token`	The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead. TYPE: `str`, optional, defaults to `"<unk>"` DEFAULT: `'<unk>'`
`bos_token`	The begin of sequence token. TYPE: `str`, optional, defaults to `"<s>"` DEFAULT: `'<s>'`
`eos_token`	The end of sequence token. TYPE: `str`, optional, defaults to `"</s>"` DEFAULT: `'</s>'`
`pad_token`	The token used for padding, for example when batching sequences of different lengths. TYPE: `str`, optional, defaults to `"<pad>"` DEFAULT: `'<pad>'`
`sep_token`	The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for sequence classification or for a text and a question for question answering. It is also used as the last token of a sequence built with special tokens. TYPE: `str`, optional, defaults to `"[SEP]"` DEFAULT: `'[SEP]'`
`mask_token`	The token used for masking values. This is the token used when training this model with masked language modeling. This is the token which the model will try to predict. TYPE: `str`, optional, defaults to `"[MASK]"` DEFAULT: `'[MASK]'`
`cls_token`	The classifier token which is used when doing sequence classification (classification of the whole sequence instead of per-token classification). It is the first token of the sequence when built with special tokens. TYPE: `str`, optional, defaults to `"[CLS]"` DEFAULT: `'[CLS]'`
`sp_model_kwargs`	Will be passed to the `SentencePieceProcessor.__init__()` method. The Python wrapper for SentencePiece can be used, among other things, to set: `enable_sampling`: Enable subword regularization. `nbest_size`: Sampling parameters for unigram. Invalid for BPE-Dropout. `nbest_size = {0,1}`: No sampling is performed. `nbest_size > 1`: samples from the nbest_size results. `nbest_size < 0`: assuming that nbest_size is infinite and samples from the all hypothesis (lattice) using forward-filtering-and-backward-sampling algorithm. `alpha`: Smoothing parameter for unigram sampling, and dropout probability of merge operations for BPE-dropout. TYPE: `dict`, optional DEFAULT: `None`

Source code in mindnlp\transformers\models\big_bird\tokenization_big_bird.py

class BigBirdTokenizer(PreTrainedTokenizer):
    """
    Construct a BigBird tokenizer. Based on [SentencePiece](https://github.com/google/sentencepiece).

    This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods. Users should refer to
    this superclass for more information regarding those methods.

    Args:
        vocab_file (`str`):
            [SentencePiece](https://github.com/google/sentencepiece) file (generally has a *.spm* extension) that
            contains the vocabulary necessary to instantiate a tokenizer.
        unk_token (`str`, *optional*, defaults to `"<unk>"`):
            The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
            token instead.
        bos_token (`str`, *optional*, defaults to `"<s>"`):
            The begin of sequence token.
        eos_token (`str`, *optional*, defaults to `"</s>"`):
            The end of sequence token.
        pad_token (`str`, *optional*, defaults to `"<pad>"`):
            The token used for padding, for example when batching sequences of different lengths.
        sep_token (`str`, *optional*, defaults to `"[SEP]"`):
            The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
            sequence classification or for a text and a question for question answering. It is also used as the last
            token of a sequence built with special tokens.
        mask_token (`str`, *optional*, defaults to `"[MASK]"`):
            The token used for masking values. This is the token used when training this model with masked language
            modeling. This is the token which the model will try to predict.
        cls_token (`str`, *optional*, defaults to `"[CLS]"`):
            The classifier token which is used when doing sequence classification (classification of the whole sequence
            instead of per-token classification). It is the first token of the sequence when built with special tokens.
        sp_model_kwargs (`dict`, *optional*):
            Will be passed to the `SentencePieceProcessor.__init__()` method. The [Python wrapper for
            SentencePiece](https://github.com/google/sentencepiece/tree/master/python) can be used, among other things,
            to set:

            - `enable_sampling`: Enable subword regularization.
            - `nbest_size`: Sampling parameters for unigram. Invalid for BPE-Dropout.

                - `nbest_size = {0,1}`: No sampling is performed.
                - `nbest_size > 1`: samples from the nbest_size results.
                - `nbest_size < 0`: assuming that nbest_size is infinite and samples from the all hypothesis (lattice)
                using forward-filtering-and-backward-sampling algorithm.
            - `alpha`: Smoothing parameter for unigram sampling, and dropout probability of merge operations for
            BPE-dropout.
    """
    vocab_files_names = VOCAB_FILES_NAMES
    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
    model_input_names = ["input_ids", "attention_mask"]
    prefix_tokens: List[int] = []

    def __init__(
        self,
        vocab_file,
        unk_token="<unk>",
        bos_token="<s>",
        eos_token="</s>",
        pad_token="<pad>",
        sep_token="[SEP]",
        mask_token="[MASK]",
        cls_token="[CLS]",
        sp_model_kwargs: Optional[Dict[str, Any]] = None,
        **kwargs,
    ) -> None:
        """
        Initializes an instance of the BigBirdTokenizer class.

        Args:
            self: The instance of the BigBirdTokenizer class.
            vocab_file (str): Path to the vocabulary file.
            unk_token (str, optional): The token representing unknown words. Defaults to '<unk>'.
            bos_token (str, optional): The token representing the beginning of a sentence. Defaults to '<s>'.
            eos_token (str, optional): The token representing the end of a sentence. Defaults to '</s>'.
            pad_token (str, optional): The token representing padding. Defaults to '<pad>'.
            sep_token (str, optional): The token representing sentence separation. Defaults to '[SEP]'.
            mask_token (str, optional): The token representing masked words. Defaults to '[MASK]'.
            cls_token (str, optional): The token representing classification. Defaults to '[CLS]'.
            sp_model_kwargs (Optional[Dict[str, Any]], optional): Additional arguments for the SentencePieceProcessor. Defaults to None.
            **kwargs: Additional keyword arguments.

        Returns:
            None.

        Raises:
            None.
        """
        bos_token = (
            AddedToken(bos_token, lstrip=False, rstrip=False)
            if isinstance(bos_token, str)
            else bos_token
        )
        eos_token = (
            AddedToken(eos_token, lstrip=False, rstrip=False)
            if isinstance(eos_token, str)
            else eos_token
        )
        unk_token = (
            AddedToken(unk_token, lstrip=False, rstrip=False)
            if isinstance(unk_token, str)
            else unk_token
        )
        pad_token = (
            AddedToken(pad_token, lstrip=False, rstrip=False)
            if isinstance(pad_token, str)
            else pad_token
        )
        cls_token = (
            AddedToken(cls_token, lstrip=False, rstrip=False)
            if isinstance(cls_token, str)
            else cls_token
        )
        sep_token = (
            AddedToken(sep_token, lstrip=False, rstrip=False)
            if isinstance(sep_token, str)
            else sep_token
        )

        # The mask token behaves like a normal word, i.e. it includes the space before it
        mask_token = (
            AddedToken(mask_token, lstrip=True, rstrip=False)
            if isinstance(mask_token, str)
            else mask_token
        )

        self.sp_model_kwargs = {} if sp_model_kwargs is None else sp_model_kwargs

        self.vocab_file = vocab_file

        self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
        self.sp_model.Load(vocab_file)

        super().__init__(
            bos_token=bos_token,
            eos_token=eos_token,
            unk_token=unk_token,
            pad_token=pad_token,
            sep_token=sep_token,
            mask_token=mask_token,
            cls_token=cls_token,
            sp_model_kwargs=self.sp_model_kwargs,
            **kwargs,
        )

    @property
    def vocab_size(self):
        """
        Method to retrieve the vocabulary size of the BigBirdTokenizer.

        Args:
            self (BigBirdTokenizer): The instance of the BigBirdTokenizer class.
                This parameter is required to access the tokenizer's properties.

        Returns:
            int: The vocabulary size, i.e. the number of pieces in the loaded SentencePiece model.

        Raises:
            None.
        """
        return self.sp_model.get_piece_size()

    def get_vocab(self):
        """
        This method returns the vocabulary for the BigBirdTokenizer.

        Args:
            self (BigBirdTokenizer): The instance of the BigBirdTokenizer class.

        Returns:
            dict: A dictionary containing the vocabulary, where keys are tokens and values are their corresponding ids.

        Raises:
            None
        """
        vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
        vocab.update(self.added_tokens_encoder)
        return vocab

    def __getstate__(self):
        """
        The '__getstate__' method in the 'BigBirdTokenizer' class is used to retrieve the current state of the object
        for serialization. This method takes one parameter, 'self', which refers to the instance of
        the 'BigBirdTokenizer' class.

        Args:
            self (BigBirdTokenizer): The instance of the 'BigBirdTokenizer' class.

        Returns:
            dict: A copy of the instance's `__dict__` with `sp_model` set to None so that the state can be pickled.

        Raises:
            None.
        """
        state = self.__dict__.copy()
        state["sp_model"] = None
        return state

    def __setstate__(self, d):
        """
        Sets the state of the BigBirdTokenizer object based on the provided dictionary.

        Args:
            self (BigBirdTokenizer): The instance of the BigBirdTokenizer class.
            d (dict): The dictionary containing the state information.

        Returns:
            None

        Raises:
            None

        This method sets the state of the BigBirdTokenizer object by assigning the dictionary 'd' to the '__dict__' attribute of the instance.
        If the instance does not have the 'sp_model_kwargs' attribute, it is initialized as an empty dictionary.
        The SentencePieceProcessor object 'sp_model' is then created and assigned to the 'sp_model' attribute of the instance.
        The 'sp_model_kwargs' dictionary is used to pass any additional keyword arguments to the SentencePieceProcessor initialization.
        Finally, the vocabulary file is loaded using the 'Load' method of the 'sp_model' object.
        """
        self.__dict__ = d

        # for backward compatibility
        if not hasattr(self, "sp_model_kwargs"):
            self.sp_model_kwargs = {}

        self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
        self.sp_model.Load(self.vocab_file)

    def _tokenize(self, text: str) -> List[str]:
        """Take as input a string and return a list of strings (tokens) for words/sub-words"""
        return self.sp_model.encode(text, out_type=str)

    def _convert_token_to_id(self, token):
        """Converts a token (str) to an id using the vocab."""
        return self.sp_model.piece_to_id(token)

    def _convert_id_to_token(self, index):
        """Converts an index (integer) to a token (str) using the vocab."""
        token = self.sp_model.IdToPiece(index)
        return token

    # Copied from transformers.models.albert.tokenization_albert.AlbertTokenizer.convert_tokens_to_string
    def convert_tokens_to_string(self, tokens):
        """Converts a sequence of tokens (strings) into a single string."""
        current_sub_tokens = []
        out_string = ""
        prev_is_special = False
        for token in tokens:
            # make sure that special tokens are not decoded using sentencepiece model
            if token in self.all_special_tokens:
                if not prev_is_special:
                    out_string += " "
                out_string += self.sp_model.decode(current_sub_tokens) + token
                prev_is_special = True
                current_sub_tokens = []
            else:
                current_sub_tokens.append(token)
                prev_is_special = False
        out_string += self.sp_model.decode(current_sub_tokens)
        return out_string.strip()

    def _decode(
        self,
        token_ids: List[int],
        skip_special_tokens: bool = False,
        clean_up_tokenization_spaces: bool = None,
        spaces_between_special_tokens: bool = True,
        **kwargs,
    ) -> str:
        """
        Decode the token IDs into a human-readable string.

        Args:
            self: The BigBirdTokenizer instance.
            token_ids (List[int]): A list of token IDs to be decoded into a string.
            skip_special_tokens (bool, optional): Whether to skip special tokens during decoding. Defaults to False.
            clean_up_tokenization_spaces (bool, optional): Whether to clean up tokenization spaces in the decoded text.
                Defaults to None.
            spaces_between_special_tokens (bool, optional):
                Whether to include spaces between special tokens in the decoded text. Defaults to True.

        Returns:
            str: The decoded string representation of the input token IDs.

        Raises:
            None.
        """
        self._decode_use_source_tokenizer = kwargs.pop("use_source_tokenizer", False)

        filtered_tokens = self.convert_ids_to_tokens(
            token_ids, skip_special_tokens=skip_special_tokens
        )

        # To avoid mixing byte-level and unicode for byte-level BPE
        # we need to build string separately for added tokens and byte-level tokens
        # cf. https://github.com/huggingface/transformers/issues/1133
        sub_texts = []
        current_sub_text = []
        for token in filtered_tokens:
            if skip_special_tokens and token in self.all_special_tokens:
                continue
            if token in self.added_tokens_encoder:
                if current_sub_text:
                    sub_texts.append(self.convert_tokens_to_string(current_sub_text))
                    current_sub_text = []
                sub_texts.append(token)
            else:
                current_sub_text.append(token)
        if current_sub_text:
            sub_texts.append(self.convert_tokens_to_string(current_sub_text))

        # Mimic the behavior of the Rust tokenizer:
        # No space before [MASK] and [SEP]
        if spaces_between_special_tokens:
            text = re.sub(r" (\[(MASK|SEP)\])", r"\1", " ".join(sub_texts))
        else:
            text = "".join(sub_texts)

        clean_up_tokenization_spaces = (
            clean_up_tokenization_spaces
            if clean_up_tokenization_spaces is not None
            else self.clean_up_tokenization_spaces
        )
        if clean_up_tokenization_spaces:
            clean_text = self.clean_up_tokenization(text)
            return clean_text
        return text

    def save_vocabulary(
        self, save_directory: str, filename_prefix: Optional[str] = None
    ) -> Tuple[str]:
        '''
        Save the vocabulary to a specified directory with an optional filename prefix.

        Args:
            self (BigBirdTokenizer): The instance of the BigBirdTokenizer class.
            save_directory (str): The directory where the vocabulary will be saved.
            filename_prefix (Optional[str]): An optional prefix to be added to the filename of the vocabulary. Defaults to None.

        Returns:
            Tuple[str]: A one-element tuple containing the path to the saved vocabulary file. If save_directory
                is not an existing directory, an error is logged and None is returned instead.

        Raises:
            OSError: If the vocabulary file cannot be copied or written to the specified location.
        '''
        if not os.path.isdir(save_directory):
            logger.error(f"Vocabulary path ({save_directory}) should be a directory")
            return
        out_vocab_file = os.path.join(
            save_directory,
            (filename_prefix + "-" if filename_prefix else "")
            + VOCAB_FILES_NAMES["vocab_file"],
        )

        if os.path.abspath(self.vocab_file) != os.path.abspath(
            out_vocab_file
        ) and os.path.isfile(self.vocab_file):
            copyfile(self.vocab_file, out_vocab_file)
        elif not os.path.isfile(self.vocab_file):
            with open(out_vocab_file, "wb") as fi:
                content_spiece_model = self.sp_model.serialized_model_proto()
                fi.write(content_spiece_model)

        return (out_vocab_file,)

    def build_inputs_with_special_tokens(
        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
    ) -> List[int]:
        """
        Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and
        adding special tokens. A Big Bird sequence has the following format:

        - single sequence: `[CLS] X [SEP]`
        - pair of sequences: `[CLS] A [SEP] B [SEP]`

        Args:
            token_ids_0 (`List[int]`):
                List of IDs to which the special tokens will be added.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.

        Returns:
            `List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
        """
        if token_ids_1 is None:
            return [self.cls_token_id] + token_ids_0 + [self.sep_token_id]
        cls = [self.cls_token_id]
        sep = [self.sep_token_id]
        return cls + token_ids_0 + sep + token_ids_1 + sep

    def get_special_tokens_mask(
        self,
        token_ids_0: List[int],
        token_ids_1: Optional[List[int]] = None,
        already_has_special_tokens: bool = False,
    ) -> List[int]:
        """
        Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
        special tokens using the tokenizer `prepare_for_model` method.

        Args:
            token_ids_0 (`List[int]`):
                List of IDs.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.
            already_has_special_tokens (`bool`, *optional*, defaults to `False`):
                Whether or not the token list is already formatted with special tokens for the model.

        Returns:
            `List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
        """
        if already_has_special_tokens:
            return super().get_special_tokens_mask(
                token_ids_0=token_ids_0,
                token_ids_1=token_ids_1,
                already_has_special_tokens=True,
            )

        if token_ids_1 is None:
            return [1] + ([0] * len(token_ids_0)) + [1]
        return [1] + ([0] * len(token_ids_0)) + [1] + ([0] * len(token_ids_1)) + [1]

    def create_token_type_ids_from_sequences(
        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
    ) -> List[int]:
        """
        Create a mask from the two sequences passed to be used in a sequence-pair classification task. A BERT sequence
        pair mask has the following format:
        ```0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 | first sequence | second sequence |```

        If `token_ids_1` is `None`, this method only returns the first portion of the mask (0s).

        Args:
            token_ids_0 (`List[int]`):
                List of IDs.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.

        Returns:
            `List[int]`: List of [token type IDs](../glossary#token-type-ids) according to the given sequence(s).
        """
        sep = [self.sep_token_id]
        cls = [self.cls_token_id]
        if token_ids_1 is None:
            return len(cls + token_ids_0 + sep) * [0]
        return len(cls + token_ids_0 + sep) * [0] + len(token_ids_1 + sep) * [1]
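The mask built by `get_special_tokens_mask` in the listing above can be illustrated without a tokenizer. The sketch below is a standalone re-implementation of the branch taken when `already_has_special_tokens` is `False`; the function name `special_tokens_mask` and the sample ids are hypothetical and chosen only for illustration.

```python
from typing import List, Optional


def special_tokens_mask(
    token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
    """Return 1 at positions where [CLS]/[SEP] would be inserted, 0 elsewhere."""
    # [CLS] A [SEP]
    mask = [1] + [0] * len(token_ids_0) + [1]
    if token_ids_1 is not None:
        # ... B [SEP]
        mask += [0] * len(token_ids_1) + [1]
    return mask


single_mask = special_tokens_mask([10, 11, 12])
pair_mask = special_tokens_mask([10, 11, 12], [20, 21])
print(single_mask)  # [1, 0, 0, 0, 1]
print(pair_mask)    # [1, 0, 0, 0, 1, 0, 0, 1]
```

The mask length always equals the length of the sequence that `build_inputs_with_special_tokens` would produce for the same inputs.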

`mindnlp.transformers.models.big_bird.tokenization_big_bird.BigBirdTokenizer.vocab_size` `property` ¶

Method to retrieve the vocabulary size of the BigBirdTokenizer.

PARAMETER	DESCRIPTION
`self`	The instance of the BigBirdTokenizer class. This parameter is required to access the tokenizer's properties. TYPE: `BigBirdTokenizer`

RETURNS	DESCRIPTION
`int`	The vocabulary size, i.e. the number of pieces in the loaded SentencePiece model.

`mindnlp.transformers.models.big_bird.tokenization_big_bird.BigBirdTokenizer.__getstate__()` ¶

The `__getstate__` method in the `BigBirdTokenizer` class is used to retrieve the current state of the object for serialization. This method takes one parameter, `self`, which refers to the instance of the `BigBirdTokenizer` class.

PARAMETER	DESCRIPTION
`self`	The instance of the 'BigBirdTokenizer' class. TYPE: `BigBirdTokenizer`

RETURNS	DESCRIPTION
`dict`	A copy of the instance's `__dict__` with `sp_model` set to None so that the state can be pickled.

Source code in mindnlp\transformers\models\big_bird\tokenization_big_bird.py

def __getstate__(self):
    """
    The '__getstate__' method in the 'BigBirdTokenizer' class is used to retrieve the current state of the object
    for serialization. This method takes one parameter, 'self', which refers to the instance of
    the 'BigBirdTokenizer' class.

    Args:
        self (BigBirdTokenizer): The instance of the 'BigBirdTokenizer' class.

    Returns:
        dict: A copy of the instance's `__dict__` with `sp_model` set to None so that the state can be pickled.

    Raises:
        None.
    """
    state = self.__dict__.copy()
    state["sp_model"] = None
    return state

`mindnlp.transformers.models.big_bird.tokenization_big_bird.BigBirdTokenizer.__init__(vocab_file, unk_token='<unk>', bos_token='<s>', eos_token='</s>', pad_token='<pad>', sep_token='[SEP]', mask_token='[MASK]', cls_token='[CLS]', sp_model_kwargs=None, **kwargs)` ¶

Initializes an instance of the BigBirdTokenizer class.

PARAMETER	DESCRIPTION
`self`	The instance of the BigBirdTokenizer class.
`vocab_file`	Path to the vocabulary file. TYPE: `str`
`unk_token`	The token representing unknown words. Defaults to '<unk>'. TYPE: `str` DEFAULT: `'<unk>'`
`bos_token`	The token representing the beginning of a sentence. Defaults to '<s>'. TYPE: `str` DEFAULT: `'<s>'`
`eos_token`	The token representing the end of a sentence. Defaults to '</s>'. TYPE: `str` DEFAULT: `'</s>'`
`pad_token`	The token representing padding. Defaults to '<pad>'. TYPE: `str` DEFAULT: `'<pad>'`
`sep_token`	The token representing sentence separation. Defaults to '[SEP]'. TYPE: `str` DEFAULT: `'[SEP]'`
`mask_token`	The token representing masked words. Defaults to '[MASK]'. TYPE: `str` DEFAULT: `'[MASK]'`
`cls_token`	The token representing classification. Defaults to '[CLS]'. TYPE: `str` DEFAULT: `'[CLS]'`
`sp_model_kwargs`	Additional arguments for the SentencePieceProcessor. Defaults to None. TYPE: `Optional[Dict[str, Any]]` DEFAULT: `None`
`**kwargs`	Additional keyword arguments. DEFAULT: `{}`

RETURNS	DESCRIPTION
`None`	None.

Source code in mindnlp\transformers\models\big_bird\tokenization_big_bird.py

def __init__(
    self,
    vocab_file,
    unk_token="<unk>",
    bos_token="<s>",
    eos_token="</s>",
    pad_token="<pad>",
    sep_token="[SEP]",
    mask_token="[MASK]",
    cls_token="[CLS]",
    sp_model_kwargs: Optional[Dict[str, Any]] = None,
    **kwargs,
) -> None:
    """
    Initializes an instance of the BigBirdTokenizer class.

    Args:
        self: The instance of the BigBirdTokenizer class.
        vocab_file (str): Path to the vocabulary file.
        unk_token (str, optional): The token representing unknown words. Defaults to '<unk>'.
        bos_token (str, optional): The token representing the beginning of a sentence. Defaults to '<s>'.
        eos_token (str, optional): The token representing the end of a sentence. Defaults to '</s>'.
        pad_token (str, optional): The token representing padding. Defaults to '<pad>'.
        sep_token (str, optional): The token representing sentence separation. Defaults to '[SEP]'.
        mask_token (str, optional): The token representing masked words. Defaults to '[MASK]'.
        cls_token (str, optional): The token representing classification. Defaults to '[CLS]'.
        sp_model_kwargs (Optional[Dict[str, Any]], optional): Additional arguments for the SentencePieceProcessor. Defaults to None.
        **kwargs: Additional keyword arguments.

    Returns:
        None.

    Raises:
        None.
    """
    bos_token = (
        AddedToken(bos_token, lstrip=False, rstrip=False)
        if isinstance(bos_token, str)
        else bos_token
    )
    eos_token = (
        AddedToken(eos_token, lstrip=False, rstrip=False)
        if isinstance(eos_token, str)
        else eos_token
    )
    unk_token = (
        AddedToken(unk_token, lstrip=False, rstrip=False)
        if isinstance(unk_token, str)
        else unk_token
    )
    pad_token = (
        AddedToken(pad_token, lstrip=False, rstrip=False)
        if isinstance(pad_token, str)
        else pad_token
    )
    cls_token = (
        AddedToken(cls_token, lstrip=False, rstrip=False)
        if isinstance(cls_token, str)
        else cls_token
    )
    sep_token = (
        AddedToken(sep_token, lstrip=False, rstrip=False)
        if isinstance(sep_token, str)
        else sep_token
    )

    # The mask token behaves like a normal word, i.e. it includes the space before it
    mask_token = (
        AddedToken(mask_token, lstrip=True, rstrip=False)
        if isinstance(mask_token, str)
        else mask_token
    )

    self.sp_model_kwargs = {} if sp_model_kwargs is None else sp_model_kwargs

    self.vocab_file = vocab_file

    self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
    self.sp_model.Load(vocab_file)

    super().__init__(
        bos_token=bos_token,
        eos_token=eos_token,
        unk_token=unk_token,
        pad_token=pad_token,
        sep_token=sep_token,
        mask_token=mask_token,
        cls_token=cls_token,
        sp_model_kwargs=self.sp_model_kwargs,
        **kwargs,
    )

`mindnlp.transformers.models.big_bird.tokenization_big_bird.BigBirdTokenizer.__setstate__(d)` ¶

Sets the state of the BigBirdTokenizer object based on the provided dictionary.

PARAMETER	DESCRIPTION
`self`	The instance of the BigBirdTokenizer class. TYPE: `BigBirdTokenizer`
`d`	The dictionary containing the state information. TYPE: `dict`

RETURNS	DESCRIPTION
	None

This method sets the state of the BigBirdTokenizer object by assigning the dictionary `d` to the `__dict__` attribute of the instance. If the instance does not have the `sp_model_kwargs` attribute, it is initialized as an empty dictionary. The SentencePieceProcessor object `sp_model` is then created and assigned to the `sp_model` attribute of the instance. The `sp_model_kwargs` dictionary is used to pass any additional keyword arguments to the SentencePieceProcessor initialization. Finally, the vocabulary file is loaded using the `Load` method of the `sp_model` object.

Source code in mindnlp\transformers\models\big_bird\tokenization_big_bird.py

def __setstate__(self, d):
    """
    Sets the state of the BigBirdTokenizer object based on the provided dictionary.

    Args:
        self (BigBirdTokenizer): The instance of the BigBirdTokenizer class.
        d (dict): The dictionary containing the state information.

    Returns:
        None

    Raises:
        None

    This method sets the state of the BigBirdTokenizer object by assigning the dictionary 'd' to the '__dict__' attribute of the instance.
    If the instance does not have the 'sp_model_kwargs' attribute, it is initialized as an empty dictionary.
    The SentencePieceProcessor object 'sp_model' is then created and assigned to the 'sp_model' attribute of the instance.
    The 'sp_model_kwargs' dictionary is used to pass any additional keyword arguments to the SentencePieceProcessor initialization.
    Finally, the vocabulary file is loaded using the 'Load' method of the 'sp_model' object.
    """
    self.__dict__ = d

    # for backward compatibility
    if not hasattr(self, "sp_model_kwargs"):
        self.sp_model_kwargs = {}

    self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
    self.sp_model.Load(self.vocab_file)
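The pair of hooks above follows a common pattern: drop a handle that cannot be serialized, then rebuild it from a stored file path when the object is restored. The following is a minimal sketch of that pattern only, not the library's code. `FakeProcessor` and `TinyTokenizer` are hypothetical stand-ins, and the hooks are called explicitly here to make visible the steps that `pickle` performs implicitly.

```python
import pickle


class FakeProcessor:
    """Hypothetical stand-in for a native SentencePiece handle."""

    def __init__(self, model_file):
        self.model_file = model_file


class TinyTokenizer:
    """Hypothetical class mirroring the __getstate__/__setstate__ pattern."""

    def __init__(self, vocab_file):
        self.vocab_file = vocab_file
        self.sp_model = FakeProcessor(vocab_file)

    def __getstate__(self):
        # Copy the attributes and drop the handle before serialization.
        state = self.__dict__.copy()
        state["sp_model"] = None
        return state

    def __setstate__(self, d):
        self.__dict__ = d
        # Older states may lack this attribute (backward compatibility).
        if not hasattr(self, "sp_model_kwargs"):
            self.sp_model_kwargs = {}
        # Rebuild the handle from the stored path.
        self.sp_model = FakeProcessor(self.vocab_file)


tok = TinyTokenizer("spiece.model")
state = tok.__getstate__()          # what pickle would serialize
payload = pickle.dumps(state)       # plain dict of str/None values

restored = TinyTokenizer.__new__(TinyTokenizer)  # bypass __init__, as pickle does
restored.__setstate__(pickle.loads(payload))
print(restored.sp_model.model_file)
```

Because `__getstate__` works on a shallow copy of `__dict__`, the original object keeps its live `sp_model` after the state has been taken.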

`mindnlp.transformers.models.big_bird.tokenization_big_bird.BigBirdTokenizer.build_inputs_with_special_tokens(token_ids_0, token_ids_1=None)` ¶

Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and adding special tokens. A Big Bird sequence has the following format:

single sequence: [CLS] X [SEP]
pair of sequences: [CLS] A [SEP] B [SEP]

PARAMETER	DESCRIPTION
`token_ids_0`	List of IDs to which the special tokens will be added. TYPE: `List[int]`
`token_ids_1`	Optional second list of IDs for sequence pairs. TYPE: `List[int]`, optional DEFAULT: `None`

RETURNS	DESCRIPTION
`List[int]`	List of input IDs with the appropriate special tokens.

Source code in mindnlp\transformers\models\big_bird\tokenization_big_bird.py

def build_inputs_with_special_tokens(
    self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
    """
    Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and
    adding special tokens. A Big Bird sequence has the following format:

    - single sequence: `[CLS] X [SEP]`
    - pair of sequences: `[CLS] A [SEP] B [SEP]`

    Args:
        token_ids_0 (`List[int]`):
            List of IDs to which the special tokens will be added.
        token_ids_1 (`List[int]`, *optional*):
            Optional second list of IDs for sequence pairs.

    Returns:
        `List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
    """
    if token_ids_1 is None:
        return [self.cls_token_id] + token_ids_0 + [self.sep_token_id]
    cls = [self.cls_token_id]
    sep = [self.sep_token_id]
    return cls + token_ids_0 + sep + token_ids_1 + sep
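To make the `[CLS] X [SEP]` and `[CLS] A [SEP] B [SEP]` layouts concrete, here is a small standalone sketch of the same concatenation. The ids `CLS_ID` and `SEP_ID` and the function name `build_inputs` are assumptions made for illustration; a real tokenizer takes these ids from its vocabulary.

```python
from typing import List, Optional

# Hypothetical special-token ids, for illustration only.
CLS_ID = 65
SEP_ID = 66


def build_inputs(
    token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
    """Wrap one or two id lists in the [CLS] ... [SEP] layout."""
    if token_ids_1 is None:
        return [CLS_ID] + token_ids_0 + [SEP_ID]
    return [CLS_ID] + token_ids_0 + [SEP_ID] + token_ids_1 + [SEP_ID]


single = build_inputs([10, 11, 12])
pair = build_inputs([10, 11, 12], [20, 21])
print(single)  # [65, 10, 11, 12, 66]
print(pair)    # [65, 10, 11, 12, 66, 20, 21, 66]
```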

`mindnlp.transformers.models.big_bird.tokenization_big_bird.BigBirdTokenizer.convert_tokens_to_string(tokens)` ¶

Converts a sequence of tokens (strings) into a single string.

Source code in mindnlp\transformers\models\big_bird\tokenization_big_bird.py

def convert_tokens_to_string(self, tokens):
    """Converts a sequence of tokens (strings) into a single string."""
    current_sub_tokens = []
    out_string = ""
    prev_is_special = False
    for token in tokens:
        # make sure that special tokens are not decoded using sentencepiece model
        if token in self.all_special_tokens:
            if not prev_is_special:
                out_string += " "
            out_string += self.sp_model.decode(current_sub_tokens) + token
            prev_is_special = True
            current_sub_tokens = []
        else:
            current_sub_tokens.append(token)
            prev_is_special = False
    out_string += self.sp_model.decode(current_sub_tokens)
    return out_string.strip()
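The loop above buffers ordinary pieces and flushes them through the SentencePiece decoder whenever a special token appears, so special tokens are never passed to the decoder. The sketch below reproduces that control flow with a hypothetical `fake_decode` that only approximates SentencePiece decoding (it joins pieces and turns the `▁` marker, U+2581, into a space). The `SPECIAL` set and all names are assumptions for illustration.

```python
from typing import List

# Hypothetical set of special tokens, for illustration only.
SPECIAL = {"[CLS]", "[SEP]", "[MASK]"}


def fake_decode(pieces: List[str]) -> str:
    """Rough stand-in for sp_model.decode: join pieces, map U+2581 to a space."""
    return "".join(pieces).replace("\u2581", " ").strip()


def tokens_to_string(tokens: List[str]) -> str:
    current: List[str] = []
    out = ""
    prev_is_special = False
    for token in tokens:
        if token in SPECIAL:
            # Flush buffered pieces, then append the special token verbatim.
            if not prev_is_special:
                out += " "
            out += fake_decode(current) + token
            prev_is_special = True
            current = []
        else:
            current.append(token)
            prev_is_special = False
    out += fake_decode(current)
    return out.strip()


text = tokens_to_string(["[CLS]", "\u2581hello", "\u2581wor", "ld", "[SEP]"])
print(text)  # [CLS] hello world[SEP]
```

Note that the separating space is emitted before the flushed text rather than before the special token, which is why `[SEP]` ends up directly attached to the preceding word in this sketch.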

`mindnlp.transformers.models.big_bird.tokenization_big_bird.BigBirdTokenizer.create_token_type_ids_from_sequences(token_ids_0, token_ids_1=None)` ¶

Create a mask from the two sequences passed to be used in a sequence-pair classification task. A BigBird sequence pair mask has the following format:

0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
| first sequence    | second sequence |

If token_ids_1 is None, this method only returns the first portion of the mask (0s).

PARAMETER	DESCRIPTION
`token_ids_0`	List of IDs. TYPE: `List[int]`
`token_ids_1`	Optional second list of IDs for sequence pairs. TYPE: `List[int]`, optional DEFAULT: `None`

RETURNS	DESCRIPTION
`List[int]`	`List[int]`: List of token type IDs according to the given sequence(s).

Source code in mindnlp\transformers\models\big_bird\tokenization_big_bird.py

def create_token_type_ids_from_sequences(
    self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
    """
    Create a mask from the two sequences passed to be used in a sequence-pair classification task. A BigBird sequence
    pair mask has the following format:
    ```0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 | first sequence | second sequence |```

    If `token_ids_1` is `None`, this method only returns the first portion of the mask (0s).

    Args:
        token_ids_0 (`List[int]`):
            List of IDs.
        token_ids_1 (`List[int]`, *optional*):
            Optional second list of IDs for sequence pairs.

    Returns:
        `List[int]`: List of [token type IDs](../glossary#token-type-ids) according to the given sequence(s).
    """
    sep = [self.sep_token_id]
    cls = [self.cls_token_id]
    if token_ids_1 is None:
        return len(cls + token_ids_0 + sep) * [0]
    return len(cls + token_ids_0 + sep) * [0] + len(token_ids_1 + sep) * [1]
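The segment layout can be checked with a small standalone sketch. The function below mirrors the arithmetic of the listing without needing a tokenizer instance. The name `token_type_ids` is illustrative and not part of the library.

```python
from typing import List, Optional


def token_type_ids(token_ids_0: List[int], token_ids_1: Optional[List[int]] = None) -> List[int]:
    """Segment 0 covers [CLS] A [SEP]; segment 1 covers B [SEP]."""
    first = [0] * (len(token_ids_0) + 2)  # +2 for [CLS] and the first [SEP]
    if token_ids_1 is None:
        return first
    return first + [1] * (len(token_ids_1) + 1)  # +1 for the closing [SEP]


single_types = token_type_ids([10, 11, 12])
pair_types = token_type_ids([10, 11, 12], [20, 21])
print(single_types)  # [0, 0, 0, 0, 0]
print(pair_types)    # [0, 0, 0, 0, 0, 1, 1, 1]
```

The result has the same length as the sequence with special tokens added, so the two lists can be fed to a model side by side.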

`mindnlp.transformers.models.big_bird.tokenization_big_bird.BigBirdTokenizer.get_special_tokens_mask(token_ids_0, token_ids_1=None, already_has_special_tokens=False)` ¶

Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding special tokens using the tokenizer prepare_for_model method.

PARAMETER	DESCRIPTION
`token_ids_0`	List of IDs. TYPE: `List[int]`
`token_ids_1`	Optional second list of IDs for sequence pairs. TYPE: `List[int]`, optional DEFAULT: `None`
`already_has_special_tokens`	Whether or not the token list is already formatted with special tokens for the model. TYPE: `bool`, optional, defaults to `False` DEFAULT: `False`

RETURNS	DESCRIPTION
`List[int]`	`List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.

Source code in mindnlp\transformers\models\big_bird\tokenization_big_bird.py

def get_special_tokens_mask(
    self,
    token_ids_0: List[int],
    token_ids_1: Optional[List[int]] = None,
    already_has_special_tokens: bool = False,
) -> List[int]:
    """
    Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
    special tokens using the tokenizer `prepare_for_model` method.

    Args:
        token_ids_0 (`List[int]`):
            List of IDs.
        token_ids_1 (`List[int]`, *optional*):
            Optional second list of IDs for sequence pairs.
        already_has_special_tokens (`bool`, *optional*, defaults to `False`):
            Whether or not the token list is already formatted with special tokens for the model.

    Returns:
        `List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
    """
    if already_has_special_tokens:
        return super().get_special_tokens_mask(
            token_ids_0=token_ids_0,
            token_ids_1=token_ids_1,
            already_has_special_tokens=True,
        )

    if token_ids_1 is None:
        return [1] + ([0] * len(token_ids_0)) + [1]
    return [1] + ([0] * len(token_ids_0)) + [1] + ([0] * len(token_ids_1)) + [1]
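One possible use of the mask is to strip special tokens from an already built sequence. The sketch below assumes hypothetical IDs 65 and 66 for `[CLS]` and `[SEP]`, and reimplements only the branch where no special tokens have been added yet.

```python
from typing import List, Optional


def special_tokens_mask(token_ids_0: List[int], token_ids_1: Optional[List[int]] = None) -> List[int]:
    """1 marks a position that will hold a special token, 0 a sequence token."""
    if token_ids_1 is None:
        return [1] + [0] * len(token_ids_0) + [1]
    return [1] + [0] * len(token_ids_0) + [1] + [0] * len(token_ids_1) + [1]


# Hypothetical built sequence: 65 = [CLS], 66 = [SEP].
input_ids = [65, 10, 11, 12, 66, 20, 21, 66]
mask = special_tokens_mask([10, 11, 12], [20, 21])

# Keep only the positions that the mask marks as ordinary tokens.
content_ids = [tok for tok, is_special in zip(input_ids, mask) if not is_special]
print(mask)         # [1, 0, 0, 0, 1, 0, 0, 1]
print(content_ids)  # [10, 11, 12, 20, 21]
```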

`mindnlp.transformers.models.big_bird.tokenization_big_bird.BigBirdTokenizer.get_vocab()` ¶

This method returns the vocabulary for the BigBirdTokenizer.

PARAMETER	DESCRIPTION
`self`	The instance of the BigBirdTokenizer class. TYPE: `BigBirdTokenizer`

RETURNS	DESCRIPTION
`dict`	A dictionary containing the vocabulary, where keys are tokens and values are their corresponding ids.

Source code in mindnlp\transformers\models\big_bird\tokenization_big_bird.py

def get_vocab(self):
    """
    This method returns the vocabulary for the BigBirdTokenizer.

    Args:
        self (BigBirdTokenizer): The instance of the BigBirdTokenizer class.

    Returns:
        dict: A dictionary containing the vocabulary, where keys are tokens and values are their corresponding ids.

    Raises:
        None
    """
    vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
    vocab.update(self.added_tokens_encoder)
    return vocab
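The merge of the base vocabulary with added tokens can be pictured with plain dictionaries. The piece list and the added token below are invented for illustration. A real tokenizer obtains them from the SentencePiece model and from `added_tokens_encoder`.

```python
# Hypothetical base vocabulary: the list index plays the role of the token ID.
base_pieces = ["<unk>", "<s>", "</s>", "\u2581hello"]
# Hypothetical token added after loading, with the next free ID.
added_tokens_encoder = {"[NEW]": 4}

# Token -> ID for the base pieces, then overlay the added tokens.
vocab = {piece: idx for idx, piece in enumerate(base_pieces)}
vocab.update(added_tokens_encoder)

print(len(vocab))      # 5
print(vocab["[NEW]"])  # 4
```

Because `update` runs last, an added token that shares a string with a base piece would take the added token's ID in the returned dictionary.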

`mindnlp.transformers.models.big_bird.tokenization_big_bird.BigBirdTokenizer.save_vocabulary(save_directory, filename_prefix=None)` ¶

Save the vocabulary to a specified directory with an optional filename prefix.

PARAMETER	DESCRIPTION
`self`	The instance of the BigBirdTokenizer class. TYPE: `BigBirdTokenizer`
`save_directory`	The directory where the vocabulary will be saved. TYPE: `str`
`filename_prefix`	An optional prefix to be added to the filename of the vocabulary. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`

RETURNS	DESCRIPTION
`Tuple[str]`	Tuple[str]: A tuple containing the path to the saved vocabulary file. If `save_directory` is not an existing directory, an error is logged and `None` is returned instead.

RAISES	DESCRIPTION
`OSError`	If the vocabulary file cannot be copied or written to the specified location.

Source code in mindnlp\transformers\models\big_bird\tokenization_big_bird.py

def save_vocabulary(
    self, save_directory: str, filename_prefix: Optional[str] = None
) -> Tuple[str]:
    '''
    Save the vocabulary to a specified directory with an optional filename prefix.

    Args:
        self (BigBirdTokenizer): The instance of the BigBirdTokenizer class.
        save_directory (str): The directory where the vocabulary will be saved.
        filename_prefix (Optional[str]): An optional prefix to be added to the filename of the vocabulary. Defaults to None.

    Returns:
        Tuple[str]: A tuple containing the path to the saved vocabulary file. If save_directory is not an
            existing directory, an error is logged and None is returned instead.

    Raises:
        OSError: If the vocabulary file cannot be copied or written to the specified location.
    '''
    if not os.path.isdir(save_directory):
        logger.error(f"Vocabulary path ({save_directory}) should be a directory")
        return
    out_vocab_file = os.path.join(
        save_directory,
        (filename_prefix + "-" if filename_prefix else "")
        + VOCAB_FILES_NAMES["vocab_file"],
    )

    if os.path.abspath(self.vocab_file) != os.path.abspath(
        out_vocab_file
    ) and os.path.isfile(self.vocab_file):
        copyfile(self.vocab_file, out_vocab_file)
    elif not os.path.isfile(self.vocab_file):
        with open(out_vocab_file, "wb") as fi:
            content_spiece_model = self.sp_model.serialized_model_proto()
            fi.write(content_spiece_model)

    return (out_vocab_file,)
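The output path is assembled from the directory, the optional prefix and a fixed file name. The sketch below reproduces only that step. The value `"spiece.model"` for `VOCAB_FILES_NAMES["vocab_file"]` is an assumption, since the constant is not shown in this section, and `vocab_output_path` is an illustrative helper rather than a library function.

```python
import os
from typing import Optional

# Assumption: stands in for VOCAB_FILES_NAMES["vocab_file"].
VOCAB_FILE_NAME = "spiece.model"


def vocab_output_path(save_directory: str, filename_prefix: Optional[str] = None) -> str:
    """Join the directory with '<prefix>-<name>' or just '<name>'."""
    prefix = filename_prefix + "-" if filename_prefix else ""
    return os.path.join(save_directory, prefix + VOCAB_FILE_NAME)


plain = vocab_output_path("out")
prefixed = vocab_output_path("out", "bigbird")
print(os.path.basename(plain))     # spiece.model
print(os.path.basename(prefixed))  # bigbird-spiece.model
```

Since the method in the listing logs an error and returns without a value when the directory does not exist, callers that unpack the result may want to check for `None` first.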

`mindnlp.transformers.models.big_bird.tokenization_big_bird_fast.BigBirdTokenizerFast` ¶

Bases: PreTrainedTokenizerFast

Construct a "fast" BigBird tokenizer (backed by HuggingFace's tokenizers library). Based on Unigram. This tokenizer inherits from [PreTrainedTokenizerFast] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.

PARAMETER	DESCRIPTION
`vocab_file`	SentencePiece file (generally has a .spm extension) that contains the vocabulary necessary to instantiate a tokenizer. TYPE: `str` DEFAULT: `None`
`bos_token`	The beginning of sequence token that was used during pretraining. Can be used as a sequence classifier token. When building a sequence using special tokens, this is not the token that is used for the beginning of sequence. The token used is the `cls_token`. TYPE: `str`, optional, defaults to `"<s>"` DEFAULT: `'<s>'`
`eos_token`	The end of sequence token. When building a sequence using special tokens, this is not the token that is used for the end of sequence. The token used is the `sep_token`. TYPE: `str`, optional, defaults to `"</s>"` DEFAULT: `'</s>'`
`unk_token`	The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead. TYPE: `str`, optional, defaults to `"<unk>"` DEFAULT: `'<unk>'`
`sep_token`	The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for sequence classification or for a text and a question for question answering. It is also used as the last token of a sequence built with special tokens. TYPE: `str`, optional, defaults to `"[SEP]"` DEFAULT: `'[SEP]'`
`pad_token`	The token used for padding, for example when batching sequences of different lengths. TYPE: `str`, optional, defaults to `"<pad>"` DEFAULT: `'<pad>'`
`cls_token`	The classifier token which is used when doing sequence classification (classification of the whole sequence instead of per-token classification). It is the first token of the sequence when built with special tokens. TYPE: `str`, optional, defaults to `"[CLS]"` DEFAULT: `'[CLS]'`
`mask_token`	The token used for masking values. This is the token used when training this model with masked language modeling. This is the token which the model will try to predict. TYPE: `str`, optional, defaults to `"[MASK]"` DEFAULT: `'[MASK]'`

Source code in mindnlp\transformers\models\big_bird\tokenization_big_bird_fast.py

class BigBirdTokenizerFast(PreTrainedTokenizerFast):
    """
    Construct a "fast" BigBird tokenizer (backed by HuggingFace's *tokenizers* library). Based on
    [Unigram](https://hf-mirror.com/docs/tokenizers/python/latest/components.html?highlight=unigram#models). This
    tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main methods. Users should refer to
    this superclass for more information regarding those methods.

    Args:
        vocab_file (`str`):
            [SentencePiece](https://github.com/google/sentencepiece) file (generally has a *.spm* extension) that
            contains the vocabulary necessary to instantiate a tokenizer.
        bos_token (`str`, *optional*, defaults to `"<s>"`):
            The beginning of sequence token that was used during pretraining. Can be used as a sequence classifier token.

            <Tip>

            When building a sequence using special tokens, this is not the token that is used for the beginning of
            sequence. The token used is the `cls_token`.

            </Tip>

        eos_token (`str`, *optional*, defaults to `"</s>"`):
            The end of sequence token. When building a sequence using special tokens, this is not the token
            that is used for the end of sequence. The token used is the `sep_token`.
        unk_token (`str`, *optional*, defaults to `"<unk>"`):
            The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
            token instead.
        sep_token (`str`, *optional*, defaults to `"[SEP]"`):
            The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
            sequence classification or for a text and a question for question answering. It is also used as the last
            token of a sequence built with special tokens.
        pad_token (`str`, *optional*, defaults to `"<pad>"`):
            The token used for padding, for example when batching sequences of different lengths.
        cls_token (`str`, *optional*, defaults to `"[CLS]"`):
            The classifier token which is used when doing sequence classification (classification of the whole sequence
            instead of per-token classification). It is the first token of the sequence when built with special tokens.
        mask_token (`str`, *optional*, defaults to `"[MASK]"`):
            The token used for masking values. This is the token used when training this model with masked language
            modeling. This is the token which the model will try to predict.
    """
    vocab_files_names = VOCAB_FILES_NAMES
    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
    slow_tokenizer_class = BigBirdTokenizer
    model_input_names = ["input_ids", "attention_mask"]
    prefix_tokens: List[int] = []

    def __init__(
        self,
        vocab_file=None,
        tokenizer_file=None,
        unk_token="<unk>",
        bos_token="<s>",
        eos_token="</s>",
        pad_token="<pad>",
        sep_token="[SEP]",
        mask_token="[MASK]",
        cls_token="[CLS]",
        **kwargs,
    ):
        """
        Initializes a new instance of the BigBirdTokenizerFast class.

        Args:
            self: The current instance of the class.
            vocab_file (str): The path to the vocabulary file. If None, the tokenizer will not have a vocabulary.
            tokenizer_file (str): The path to the serialized tokenizer file. If None, no serialized tokenizer file is loaded.
            unk_token (str): The unknown token to be used for out-of-vocabulary words. Default is '<unk>'.
            bos_token (str or AddedToken): The beginning of sentence token. Default is '<s>'.
            eos_token (str or AddedToken): The end of sentence token. Default is '</s>'.
            pad_token (str or AddedToken): The padding token. Default is '<pad>'.
            sep_token (str or AddedToken): The separator token. Default is '[SEP]'.
            mask_token (str or AddedToken): The mask token to be used during tokenization. Default is '[MASK]'.
            cls_token (str or AddedToken): The classification token. Default is '[CLS]'.

        Returns:
            None

        Raises:
            None
        """
        bos_token = (
            AddedToken(bos_token, lstrip=False, rstrip=False)
            if isinstance(bos_token, str)
            else bos_token
        )
        eos_token = (
            AddedToken(eos_token, lstrip=False, rstrip=False)
            if isinstance(eos_token, str)
            else eos_token
        )
        unk_token = (
            AddedToken(unk_token, lstrip=False, rstrip=False)
            if isinstance(unk_token, str)
            else unk_token
        )
        pad_token = (
            AddedToken(pad_token, lstrip=False, rstrip=False)
            if isinstance(pad_token, str)
            else pad_token
        )
        cls_token = (
            AddedToken(cls_token, lstrip=False, rstrip=False)
            if isinstance(cls_token, str)
            else cls_token
        )
        sep_token = (
            AddedToken(sep_token, lstrip=False, rstrip=False)
            if isinstance(sep_token, str)
            else sep_token
        )

        # The mask token behaves like a normal word, i.e. it includes the space before it
        mask_token = (
            AddedToken(mask_token, lstrip=True, rstrip=False)
            if isinstance(mask_token, str)
            else mask_token
        )

        super().__init__(
            vocab_file,
            tokenizer_file=tokenizer_file,
            bos_token=bos_token,
            eos_token=eos_token,
            unk_token=unk_token,
            sep_token=sep_token,
            pad_token=pad_token,
            cls_token=cls_token,
            mask_token=mask_token,
            **kwargs,
        )

        self.vocab_file = vocab_file

    @property
    def can_save_slow_tokenizer(self) -> bool:
        """
        Check if the slow tokenizer can be saved.

        Args:
            self (BigBirdTokenizerFast): The instance of the BigBirdTokenizerFast class.
                Represents the tokenizer object for which the check is being performed.

        Returns:
            bool: Returns True if the vocab file exists for the tokenizer, otherwise False.

        Raises:
            None
        """
        return os.path.isfile(self.vocab_file) if self.vocab_file else False

    def build_inputs_with_special_tokens(
        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
    ) -> List[int]:
        """
        Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and
        adding special tokens. A BigBird sequence has the following format:

        - single sequence: `[CLS] X [SEP]`
        - pair of sequences: `[CLS] A [SEP] B [SEP]`

        Args:
            token_ids_0 (`List[int]`):
                List of IDs to which the special tokens will be added
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.

        Returns:
            `List[int]`: list of [input IDs](../glossary#input-ids) with the appropriate special tokens.
        """
        sep = [self.sep_token_id]
        cls = [self.cls_token_id]
        if token_ids_1 is None:
            return cls + token_ids_0 + sep
        return cls + token_ids_0 + sep + token_ids_1 + sep

    def get_special_tokens_mask(
        self,
        token_ids_0: List[int],
        token_ids_1: Optional[List[int]] = None,
        already_has_special_tokens: bool = False,
    ) -> List[int]:
        """
        Retrieves sequence ids from a token list that has no special tokens added. This method is called when adding
        special tokens using the tokenizer `prepare_for_model` method.

        Args:
            token_ids_0 (`List[int]`):
                List of ids.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.
            already_has_special_tokens (`bool`, *optional*, defaults to `False`):
                Set to True if the token list is already formatted with special tokens for the model

        Returns:
            `List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
        """
        if already_has_special_tokens:
            if token_ids_1 is not None:
                raise ValueError(
                    "You should not supply a second sequence if the provided sequence of "
                    "ids is already formatted with special tokens for the model."
                )
            return [
                1 if x in [self.sep_token_id, self.cls_token_id] else 0
                for x in token_ids_0
            ]

        if token_ids_1 is None:
            return [1] + ([0] * len(token_ids_0)) + [1]
        return [1] + ([0] * len(token_ids_0)) + [1] + ([0] * len(token_ids_1)) + [1]

    def create_token_type_ids_from_sequences(
        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
    ) -> List[int]:
        """
        Creates a mask from the two sequences passed to be used in a sequence-pair classification task. A BigBird
        sequence pair mask has the following format:

        ```
        0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
        | first sequence    | second sequence |
        ```

        If `token_ids_1` is `None`, only the first portion of the mask (0s) is returned.

        Args:
            token_ids_0 (`List[int]`):
                List of ids.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.

        Returns:
            `List[int]`: List of [token type IDs](../glossary#token-type-ids) according to the given sequence(s).
        """
        sep = [self.sep_token_id]
        cls = [self.cls_token_id]

        if token_ids_1 is None:
            return len(cls + token_ids_0 + sep) * [0]
        return len(cls + token_ids_0 + sep) * [0] + len(token_ids_1 + sep) * [1]

    def save_vocabulary(
        self, save_directory: str, filename_prefix: Optional[str] = None
    ) -> Tuple[str]:
        """
        Saves the vocabulary for a slow tokenizer.

        Args:
            self (BigBirdTokenizerFast): An instance of the BigBirdTokenizerFast class.
            save_directory (str): The directory where the vocabulary will be saved.
            filename_prefix (Optional[str], optional): A prefix to be added to the filename of the saved vocabulary.
                Defaults to None.

        Returns:
            Tuple[str]: A tuple containing the path to the saved vocabulary file. If save_directory is not an
                existing directory, an error is logged and None is returned instead.

        Raises:
            ValueError: If the fast tokenizer does not have the necessary information to save the vocabulary for a slow tokenizer.
        """
        if not self.can_save_slow_tokenizer:
            raise ValueError(
                "Your fast tokenizer does not have the necessary information to save the vocabulary for a slow "
                "tokenizer."
            )

        if not os.path.isdir(save_directory):
            logger.error(f"Vocabulary path ({save_directory}) should be a directory")
            return
        out_vocab_file = os.path.join(
            save_directory,
            (filename_prefix + "-" if filename_prefix else "")
            + VOCAB_FILES_NAMES["vocab_file"],
        )

        if os.path.abspath(self.vocab_file) != os.path.abspath(out_vocab_file):
            copyfile(self.vocab_file, out_vocab_file)

        return (out_vocab_file,)

`mindnlp.transformers.models.big_bird.tokenization_big_bird_fast.BigBirdTokenizerFast.can_save_slow_tokenizer: bool` `property` ¶

Check if the slow tokenizer can be saved.

PARAMETER	DESCRIPTION
`self`	The instance of the BigBirdTokenizerFast class. Represents the tokenizer object for which the check is being performed. TYPE: `BigBirdTokenizerFast`

RETURNS	DESCRIPTION
`bool`	Returns True if the vocab file exists for the tokenizer, otherwise False. TYPE: `bool`
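The check amounts to "a vocabulary path was given and it points to an existing file". A minimal standalone sketch follows, written as a plain function instead of a property. The file names are hypothetical.

```python
import os
import tempfile
from typing import Optional


def can_save_slow_tokenizer(vocab_file: Optional[str]) -> bool:
    """True only when a path is set and the file exists on disk."""
    return os.path.isfile(vocab_file) if vocab_file else False


with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "spiece.model")  # hypothetical file name
    with open(existing, "wb") as fh:
        fh.write(b"")  # an empty placeholder file is enough for the check
    results = (
        can_save_slow_tokenizer(None),
        can_save_slow_tokenizer(os.path.join(tmp, "missing.model")),
        can_save_slow_tokenizer(existing),
    )

print(results)  # (False, False, True)
```

A fast tokenizer created from a `tokenizer_file` alone has no `vocab_file`, so it falls into the first case.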

`mindnlp.transformers.models.big_bird.tokenization_big_bird_fast.BigBirdTokenizerFast.__init__(vocab_file=None, tokenizer_file=None, unk_token='<unk>', bos_token='<s>', eos_token='</s>', pad_token='<pad>', sep_token='[SEP]', mask_token='[MASK]', cls_token='[CLS]', **kwargs)` ¶

Initializes a new instance of the BigBirdTokenizerFast class.

PARAMETER	DESCRIPTION
`self`	The current instance of the class.
`vocab_file`	The path to the vocabulary file. If None, the tokenizer will not have a vocabulary. TYPE: `str` DEFAULT: `None`
`tokenizer_file`	The path to the serialized tokenizer file. If None, no serialized tokenizer file is loaded. TYPE: `str` DEFAULT: `None`
`unk_token`	The unknown token to be used for out-of-vocabulary words. Default is '<unk>'. TYPE: `str` DEFAULT: `'<unk>'`
`bos_token`	The beginning of sentence token. Default is '<s>'. TYPE: `str or AddedToken` DEFAULT: `'<s>'`
`eos_token`	The end of sentence token. Default is '</s>'. TYPE: `str or AddedToken` DEFAULT: `'</s>'`
`pad_token`	The padding token. Default is '<pad>'. TYPE: `str or AddedToken` DEFAULT: `'<pad>'`
`sep_token`	The separator token. Default is '[SEP]'. TYPE: `str or AddedToken` DEFAULT: `'[SEP]'`
`mask_token`	The mask token to be used during tokenization. Default is '[MASK]'. TYPE: `str or AddedToken` DEFAULT: `'[MASK]'`
`cls_token`	The classification token. Default is '[CLS]'. TYPE: `str or AddedToken` DEFAULT: `'[CLS]'`

RETURNS	DESCRIPTION
	None

Source code in mindnlp\transformers\models\big_bird\tokenization_big_bird_fast.py

def __init__(
    self,
    vocab_file=None,
    tokenizer_file=None,
    unk_token="<unk>",
    bos_token="<s>",
    eos_token="</s>",
    pad_token="<pad>",
    sep_token="[SEP]",
    mask_token="[MASK]",
    cls_token="[CLS]",
    **kwargs,
):
    """
    Initializes a new instance of the BigBirdTokenizerFast class.

    Args:
        self: The current instance of the class.
        vocab_file (str): The path to the vocabulary file. If None, the tokenizer will not have a vocabulary.
        tokenizer_file (str): The path to the serialized tokenizer file. If None, no serialized tokenizer file is loaded.
        unk_token (str): The unknown token to be used for out-of-vocabulary words. Default is '<unk>'.
        bos_token (str or AddedToken): The beginning of sentence token. Default is '<s>'.
        eos_token (str or AddedToken): The end of sentence token. Default is '</s>'.
        pad_token (str or AddedToken): The padding token. Default is '<pad>'.
        sep_token (str or AddedToken): The separator token. Default is '[SEP]'.
        mask_token (str or AddedToken): The mask token to be used during tokenization. Default is '[MASK]'.
        cls_token (str or AddedToken): The classification token. Default is '[CLS]'.

    Returns:
        None

    Raises:
        None
    """
    bos_token = (
        AddedToken(bos_token, lstrip=False, rstrip=False)
        if isinstance(bos_token, str)
        else bos_token
    )
    eos_token = (
        AddedToken(eos_token, lstrip=False, rstrip=False)
        if isinstance(eos_token, str)
        else eos_token
    )
    unk_token = (
        AddedToken(unk_token, lstrip=False, rstrip=False)
        if isinstance(unk_token, str)
        else unk_token
    )
    pad_token = (
        AddedToken(pad_token, lstrip=False, rstrip=False)
        if isinstance(pad_token, str)
        else pad_token
    )
    cls_token = (
        AddedToken(cls_token, lstrip=False, rstrip=False)
        if isinstance(cls_token, str)
        else cls_token
    )
    sep_token = (
        AddedToken(sep_token, lstrip=False, rstrip=False)
        if isinstance(sep_token, str)
        else sep_token
    )

    # The mask token behaves like a normal word, i.e. it includes the space before it
    mask_token = (
        AddedToken(mask_token, lstrip=True, rstrip=False)
        if isinstance(mask_token, str)
        else mask_token
    )

    super().__init__(
        vocab_file,
        tokenizer_file=tokenizer_file,
        bos_token=bos_token,
        eos_token=eos_token,
        unk_token=unk_token,
        sep_token=sep_token,
        pad_token=pad_token,
        cls_token=cls_token,
        mask_token=mask_token,
        **kwargs,
    )

    self.vocab_file = vocab_file

`mindnlp.transformers.models.big_bird.tokenization_big_bird_fast.BigBirdTokenizerFast.build_inputs_with_special_tokens(token_ids_0, token_ids_1=None)` ¶

Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and adding special tokens. A BigBird sequence has the following format:

- single sequence: `[CLS] X [SEP]`
- pair of sequences: `[CLS] A [SEP] B [SEP]`

PARAMETER	DESCRIPTION
`token_ids_0`	List of IDs to which the special tokens will be added TYPE: `List[int]`
`token_ids_1`	Optional second list of IDs for sequence pairs. TYPE: `List[int]`, optional DEFAULT: `None`

RETURNS	DESCRIPTION
`List[int]`	`List[int]`: list of input IDs with the appropriate special tokens.

Source code in mindnlp\transformers\models\big_bird\tokenization_big_bird_fast.py

def build_inputs_with_special_tokens(
    self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
    """
    Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and
    adding special tokens. A BigBird sequence has the following format:

    - single sequence: `[CLS] X [SEP]`
    - pair of sequences: `[CLS] A [SEP] B [SEP]`

    Args:
        token_ids_0 (`List[int]`):
            List of IDs to which the special tokens will be added
        token_ids_1 (`List[int]`, *optional*):
            Optional second list of IDs for sequence pairs.

    Returns:
        `List[int]`: list of [input IDs](../glossary#input-ids) with the appropriate special tokens.
    """
    sep = [self.sep_token_id]
    cls = [self.cls_token_id]
    if token_ids_1 is None:
        return cls + token_ids_0 + sep
    return cls + token_ids_0 + sep + token_ids_1 + sep

`mindnlp.transformers.models.big_bird.tokenization_big_bird_fast.BigBirdTokenizerFast.create_token_type_ids_from_sequences(token_ids_0, token_ids_1=None)` ¶

Creates a mask from the two sequences passed to be used in a sequence-pair classification task. A BigBird sequence pair mask has the following format:

0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
| first sequence    | second sequence |

If `token_ids_1` is `None`, only the first portion of the mask (0s) is returned.

PARAMETER	DESCRIPTION
`token_ids_0`	List of ids. TYPE: `List[int]`
`token_ids_1`	Optional second list of IDs for sequence pairs. TYPE: `List[int]`, optional DEFAULT: `None`

RETURNS	DESCRIPTION
`List[int]`	`List[int]`: List of token type IDs according to the given sequence(s).

Source code in mindnlp\transformers\models\big_bird\tokenization_big_bird_fast.py

def create_token_type_ids_from_sequences(
    self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
    """
    Creates a mask from the two sequences passed to be used in a sequence-pair classification task. A BigBird
    sequence pair mask has the following format:

    ```
    0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
    | first sequence    | second sequence |
    ```

    If `token_ids_1` is `None`, only the first portion of the mask (0s) is returned.

    Args:
        token_ids_0 (`List[int]`):
            List of ids.
        token_ids_1 (`List[int]`, *optional*):
            Optional second list of IDs for sequence pairs.

    Returns:
        `List[int]`: List of [token type IDs](../glossary#token-type-ids) according to the given sequence(s).
    """
    sep = [self.sep_token_id]
    cls = [self.cls_token_id]

    if token_ids_1 is None:
        return len(cls + token_ids_0 + sep) * [0]
    return len(cls + token_ids_0 + sep) * [0] + len(token_ids_1 + sep) * [1]

`mindnlp.transformers.models.big_bird.tokenization_big_bird_fast.BigBirdTokenizerFast.get_special_tokens_mask(token_ids_0, token_ids_1=None, already_has_special_tokens=False)` ¶

Retrieves sequence ids from a token list that has no special tokens added. This method is called when adding special tokens using the tokenizer prepare_for_model method.

PARAMETER	DESCRIPTION
`token_ids_0`	List of ids. TYPE: `List[int]`
`token_ids_1`	Optional second list of IDs for sequence pairs. TYPE: `List[int]`, optional DEFAULT: `None`
`already_has_special_tokens`	Set to True if the token list is already formatted with special tokens for the model TYPE: `bool`, optional, defaults to `False` DEFAULT: `False`

RETURNS	DESCRIPTION
`List[int]`	`List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.

Source code in mindnlp\transformers\models\big_bird\tokenization_big_bird_fast.py

def get_special_tokens_mask(
    self,
    token_ids_0: List[int],
    token_ids_1: Optional[List[int]] = None,
    already_has_special_tokens: bool = False,
) -> List[int]:
    """
    Retrieves sequence ids from a token list that has no special tokens added. This method is called when adding
    special tokens using the tokenizer `prepare_for_model` method.

    Args:
        token_ids_0 (`List[int]`):
            List of ids.
        token_ids_1 (`List[int]`, *optional*):
            Optional second list of IDs for sequence pairs.
        already_has_special_tokens (`bool`, *optional*, defaults to `False`):
            Set to True if the token list is already formatted with special tokens for the model

    Returns:
        `List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
    """
    if already_has_special_tokens:
        if token_ids_1 is not None:
            raise ValueError(
                "You should not supply a second sequence if the provided sequence of "
                "ids is already formatted with special tokens for the model."
            )
        return [
            1 if x in [self.sep_token_id, self.cls_token_id] else 0
            for x in token_ids_0
        ]

    if token_ids_1 is None:
        return [1] + ([0] * len(token_ids_0)) + [1]
    return [1] + ([0] * len(token_ids_0)) + [1] + ([0] * len(token_ids_1)) + [1]
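The `already_has_special_tokens=True` branch of the fast tokenizer works by membership test instead of by position. The sketch below isolates that branch. The IDs 65 and 66 for `[CLS]` and `[SEP]` are hypothetical, and the function name is illustrative.

```python
from typing import List, Optional

# Hypothetical special-token IDs, for illustration only.
CLS_ID, SEP_ID = 65, 66


def mask_for_formatted_ids(token_ids_0: List[int], token_ids_1: Optional[List[int]] = None) -> List[int]:
    """Mark [CLS]/[SEP] positions in a sequence that already contains them."""
    if token_ids_1 is not None:
        # A formatted sequence already holds both segments, so a second list is rejected.
        raise ValueError("Do not pass a second sequence with already formatted ids.")
    return [1 if x in (SEP_ID, CLS_ID) else 0 for x in token_ids_0]


formatted_mask = mask_for_formatted_ids([65, 10, 11, 66, 20, 66])
print(formatted_mask)  # [1, 0, 0, 1, 0, 1]

try:
    mask_for_formatted_ids([65, 10, 66], [20, 66])
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```

In this branch only the `[CLS]` and `[SEP]` IDs are flagged, so other special tokens such as padding are reported as 0.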

`mindnlp.transformers.models.big_bird.tokenization_big_bird_fast.BigBirdTokenizerFast.save_vocabulary(save_directory, filename_prefix=None)` ¶

Saves the vocabulary for a slow tokenizer by copying the tokenizer's vocabulary file into `save_directory`.

PARAMETER	DESCRIPTION
`self`	An instance of the BigBirdTokenizerFast class. TYPE: `BigBirdTokenizerFast`
`save_directory`	The directory where the vocabulary will be saved. TYPE: `str`
`filename_prefix`	A prefix to be added to the filename of the saved vocabulary. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`

RETURNS	DESCRIPTION
`Tuple[str]`	A one-element tuple containing the path to the saved vocabulary file, or `None` if `save_directory` is not an existing directory (an error is logged instead of an exception being raised).

RAISES	DESCRIPTION
`ValueError`	If the fast tokenizer does not have the necessary information to save the vocabulary for a slow tokenizer.

Source code in mindnlp\transformers\models\big_bird\tokenization_big_bird_fast.py

def save_vocabulary(
    self, save_directory: str, filename_prefix: Optional[str] = None
) -> Tuple[str]:
    """
    Saves the vocabulary for a slow tokenizer.

    Args:
        self (BigBirdTokenizerFast): An instance of the BigBirdTokenizerFast class.
        save_directory (str): The directory where the vocabulary will be saved.
        filename_prefix (Optional[str], optional): A prefix to be added to the filename of the saved vocabulary.
            Defaults to None.

    Returns:
        Tuple[str]: A one-element tuple containing the path to the saved vocabulary file. If
            save_directory is not an existing directory, an error is logged and None is returned.

    Raises:
        ValueError: If the fast tokenizer does not have the necessary information to save the vocabulary for a slow tokenizer.
    """
    if not self.can_save_slow_tokenizer:
        raise ValueError(
            "Your fast tokenizer does not have the necessary information to save the vocabulary for a slow "
            "tokenizer."
        )

    if not os.path.isdir(save_directory):
        logger.error(f"Vocabulary path ({save_directory}) should be a directory")
        return
    out_vocab_file = os.path.join(
        save_directory,
        (filename_prefix + "-" if filename_prefix else "")
        + VOCAB_FILES_NAMES["vocab_file"],
    )

    if os.path.abspath(self.vocab_file) != os.path.abspath(out_vocab_file):
        copyfile(self.vocab_file, out_vocab_file)

    return (out_vocab_file,)
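
The output path is built from `save_directory`, the optional `filename_prefix`, and the vocabulary file name. The sketch below isolates only that step; the helper `vocab_output_path` is hypothetical, and the default file name `spiece.model` is an assumption standing in for `VOCAB_FILES_NAMES["vocab_file"]`, whose value is not shown in this section.

```python
import os
from typing import Optional

# Assumed stand-in for VOCAB_FILES_NAMES["vocab_file"]; the real value is not shown here.
ASSUMED_VOCAB_FILE_NAME = "spiece.model"


def vocab_output_path(
    save_directory: str,
    filename_prefix: Optional[str] = None,
    vocab_file_name: str = ASSUMED_VOCAB_FILE_NAME,
) -> str:
    """Build the destination path the same way the listing does (no file I/O)."""
    # A non-empty prefix is joined to the file name with a hyphen.
    prefix = filename_prefix + "-" if filename_prefix else ""
    return os.path.join(save_directory, prefix + vocab_file_name)


plain_path = vocab_output_path("out")
prefixed_path = vocab_output_path("out", "run1")
print(plain_path)
print(prefixed_path)
```

Because the listing compares absolute paths before copying, saving into the directory that already holds the vocabulary file with no prefix skips the copy and still returns the path.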
