cpmbee

`mindnlp.transformers.models.cpmbee.configuration_cpmbee` ¶

CpmBee model configuration

`mindnlp.transformers.models.cpmbee.configuration_cpmbee.CpmBeeConfig` ¶

Bases: PretrainedConfig

This is the configuration class to store the configuration of a [CpmBeeModel]. It is used to instantiate a CPMBee model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the CPMBee openbmb/cpm-bee-10b architecture.

Configuration objects inherit from [PretrainedConfig] and can be used to control the model outputs. Read the documentation from [PretrainedConfig] for more information.

PARAMETER	DESCRIPTION
`vocab_size`	Vocabulary size of the CPMBee model. Defines the number of different tokens that can be represented by the `input` passed when calling [`CpmBeeModel`]. TYPE: `int`, optional, defaults to 30720 DEFAULT: `30720`
`hidden_size`	Dimension of the encoder layers. TYPE: `int`, optional, defaults to 4096 DEFAULT: `4096`
`num_attention_heads`	Number of attention heads in the Transformer encoder. TYPE: `int`, optional, defaults to 64 DEFAULT: `64`
`dim_head`	Dimension of attention heads for each attention layer in the Transformer encoder. TYPE: `int`, optional, defaults to 64 DEFAULT: `64`
`dim_ff`	Dimension of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder. TYPE: `int`, optional, defaults to 10240 DEFAULT: `10240`
`num_hidden_layers`	Number of layers of the Transformer encoder. TYPE: `int`, optional, defaults to 32 DEFAULT: `32`
`dropout_p`	The dropout probability for all fully connected layers in the embeddings and encoder. TYPE: `float`, optional, defaults to 0.0 DEFAULT: `0.0`
`position_bias_num_buckets`	The number of position_bias buckets. TYPE: `int`, optional, defaults to 256 DEFAULT: `256`
`position_bias_num_segment_buckets`	The number of segment buckets. TYPE: `int`, optional, defaults to 32 DEFAULT: `32`
`position_bias_max_distance`	The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048). TYPE: `int`, optional, defaults to 2048 DEFAULT: `2048`
`eps`	The epsilon used by the layer normalization layers. TYPE: `float`, optional, defaults to 1e-6 DEFAULT: `1e-06`
`init_std`	Initialize parameters with std = init_std. TYPE: `float`, optional, defaults to 1.0 DEFAULT: `1.0`
`use_cache`	Whether to use cache. TYPE: `bool`, optional, defaults to `True` DEFAULT: `True`
`distance_scale`	Scale the rotary embedding. TYPE: `float` or `int`, optional, defaults to 16 DEFAULT: `16`
`mask_modules`	Decides which feedforward block or attention block is pruned. TYPE: `list` or `tuple`, optional, defaults to None DEFAULT: `None`
`half`	Decides the model parameters are half-precision or not. TYPE: `bool`, optional, defaults to `False` DEFAULT: `False`

Example

>>> from mindnlp.transformers import CpmBeeModel, CpmBeeConfig
...
>>> # Initializing a CPMBee cpm-bee-10b style configuration
>>> configuration = CpmBeeConfig()
...
>>> # Initializing a model from the cpm-bee-10b style configuration
>>> model = CpmBeeModel(configuration)
...
>>> # Accessing the model configuration
>>> configuration = model.config
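
The defaults in the `__init__` signature can be sanity-checked without loading the library. The sketch below uses a hypothetical stand-in dataclass, `CpmBeeDefaults` (not part of mindnlp), that copies a few of those defaults and checks how the attention dimensions relate to `hidden_size`. It is an illustration under that assumption, not the library's implementation.

```python
from dataclasses import dataclass


@dataclass
class CpmBeeDefaults:
    """Hypothetical stand-in copying a few CpmBeeConfig signature defaults."""

    hidden_size: int = 4096
    num_attention_heads: int = 64
    dim_head: int = 64
    dim_ff: int = 10240
    num_hidden_layers: int = 32


def attention_width(cfg: CpmBeeDefaults) -> int:
    """Total width of all attention heads: head count times per-head dimension."""
    return cfg.num_attention_heads * cfg.dim_head


defaults = CpmBeeDefaults()
# With the signature defaults, 64 heads * 64 dims per head matches hidden_size.
print(attention_width(defaults) == defaults.hidden_size)

# Overriding a single field, as one would with a keyword argument to the config.
small = CpmBeeDefaults(num_attention_heads=8)
print(attention_width(small))
```

Whether the model requires the head width to equal `hidden_size` is not stated in this section; the check only shows that the defaults happen to line up.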

Source code in mindnlp\transformers\models\cpmbee\configuration_cpmbee.py

class CpmBeeConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`CpmBeeModel`]. It is used to instantiate a
    CPMBee model according to the specified arguments, defining the model architecture. Instantiating a configuration
    with the defaults will yield a similar configuration to that of the CPMBee
    [openbmb/cpm-bee-10b](https://hf-mirror.com/openbmb/cpm-bee-10b) architecture.

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.

    Args:
        vocab_size (`int`, *optional*, defaults to 30720):
            Vocabulary size of the CPMBee model. Defines the number of different tokens that can be represented by the
            `input` passed when calling [`CpmBeeModel`].
        hidden_size (`int`, *optional*, defaults to 4096):
            Dimension of the encoder layers.
        num_attention_heads (`int`, *optional*, defaults to 64):
            Number of attention heads in the Transformer encoder.
        dim_head (`int`, *optional*, defaults to 64):
            Dimension of attention heads for each attention layer in the Transformer encoder.
        dim_ff (`int`, *optional*, defaults to 10240):
            Dimension of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
        num_hidden_layers (`int`, *optional*, defaults to 32):
            Number of layers of the Transformer encoder.
        dropout_p (`float`, *optional*, defaults to 0.0):
            The dropout probability for all fully connected layers in the embeddings and encoder.
        position_bias_num_buckets (`int`, *optional*, defaults to 256):
            The number of position_bias buckets.
        position_bias_num_segment_buckets (`int`, *optional*, defaults to 32):
            The number of segment buckets.
        position_bias_max_distance (`int`, *optional*, defaults to 2048):
            The maximum sequence length that this model might ever be used with. Typically set this to something large
            just in case (e.g., 512 or 1024 or 2048).
        eps (`float`, *optional*, defaults to 1e-6):
            The epsilon used by the layer normalization layers.
        init_std (`float`, *optional*, defaults to 1.0):
            Initialize parameters with std = init_std.
        use_cache (`bool`, *optional*, defaults to `True`):
            Whether to use cache.
        distance_scale (`float` or `int`, *optional*, defaults to 16):
            Scale the rotary embedding.
        mask_modules (`list` or `tuple`, *optional*, defaults to None):
            Decides which feedforward block or attention block is pruned.
        half (`bool`, *optional*, defaults to `False`):
            Decides the model parameters are half-precision or not.

    Example:
        ```python
        >>> from mindnlp.transformers import CpmBeeModel, CpmBeeConfig
        ...
        >>> # Initializing a CPMBee cpm-bee-10b style configuration
        >>> configuration = CpmBeeConfig()
        ...
        >>> # Initializing a model from the cpm-bee-10b style configuration
        >>> model = CpmBeeModel(configuration)
        ...
        >>> # Accessing the model configuration
        >>> configuration = model.config
        ```
    """
    model_type = "cpmbee"

    def __init__(
        self,
        vocab_size: int = 30720,
        hidden_size: int = 4096,
        num_attention_heads: int = 64,
        dim_head: int = 64,
        dim_ff: int = 10240,
        num_hidden_layers: int = 32,
        dropout_p: float = 0.0,
        position_bias_num_buckets: int = 256,
        position_bias_num_segment_buckets: int = 32,
        position_bias_max_distance: int = 2048,
        eps: float = 1e-6,
        init_std: float = 1.0,
        use_cache: bool = True,
        distance_scale: Union[int, float] = 16,
        mask_modules: Optional[Union[List, Tuple]] = None,
        half: bool = False,
        **kwargs,
    ):
        """
        __init__

        Initializes a CpmBeeConfig instance.

        Args:
            vocab_size (int): The size of the vocabulary. Defaults to 30720.
            hidden_size (int): The size of the hidden layers. Defaults to 4096.
            num_attention_heads (int): The number of attention heads. Defaults to 64.
            dim_head (int): The dimension of each attention head. Defaults to 64.
            dim_ff (int): The dimension of the feed forward network. Defaults to 10240.
            num_hidden_layers (int): The number of hidden layers. Defaults to 32.
            dropout_p (float): The dropout probability. Defaults to 0.0.
            position_bias_num_buckets (int): The number of buckets for position bias. Defaults to 256.
            position_bias_num_segment_buckets (int): The number of segment buckets for position bias. Defaults to 32.
            position_bias_max_distance (int): The maximum distance for position bias. Defaults to 2048.
            eps (float): A small value to avoid division by zero. Defaults to 1e-06.
            init_std (float): The standard deviation for weight initialization. Defaults to 1.0.
            use_cache (bool): Flag to indicate whether to use cache. Defaults to True.
            distance_scale (Union[int, float]): The scale factor for distance. Defaults to 16.
            mask_modules (Optional[Union[List, Tuple]]): List or Tuple of modules to be masked. Defaults to None.
            half (bool): Flag to indicate whether to use half precision. Defaults to False.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(**kwargs)
        self.position_bias_num_segment_buckets = position_bias_num_segment_buckets
        self.hidden_size = hidden_size
        self.num_attention_heads = num_attention_heads
        self.dim_head = dim_head
        self.dim_ff = dim_ff
        self.num_hidden_layers = num_hidden_layers
        self.position_bias_num_buckets = position_bias_num_buckets
        self.position_bias_max_distance = position_bias_max_distance
        self.dropout_p = dropout_p
        self.eps = eps
        self.use_cache = use_cache
        self.vocab_size = vocab_size
        self.init_std = init_std
        self.distance_scale = distance_scale
        self.half = half
        self.mask_modules = mask_modules

`mindnlp.transformers.models.cpmbee.configuration_cpmbee.CpmBeeConfig.__init__(vocab_size=30720, hidden_size=4096, num_attention_heads=64, dim_head=64, dim_ff=10240, num_hidden_layers=32, dropout_p=0.0, position_bias_num_buckets=256, position_bias_num_segment_buckets=32, position_bias_max_distance=2048, eps=1e-06, init_std=1.0, use_cache=True, distance_scale=16, mask_modules=None, half=False, **kwargs)` ¶

`__init__`

Initializes a CpmBeeConfig instance.

PARAMETER	DESCRIPTION
`vocab_size`	The size of the vocabulary. Defaults to 30720. TYPE: `int` DEFAULT: `30720`
`hidden_size`	The size of the hidden layers. Defaults to 4096. TYPE: `int` DEFAULT: `4096`
`num_attention_heads`	The number of attention heads. Defaults to 64. TYPE: `int` DEFAULT: `64`
`dim_head`	The dimension of each attention head. Defaults to 64. TYPE: `int` DEFAULT: `64`
`dim_ff`	The dimension of the feed forward network. Defaults to 10240. TYPE: `int` DEFAULT: `10240`
`num_hidden_layers`	The number of hidden layers. Defaults to 32. TYPE: `int` DEFAULT: `32`
`dropout_p`	The dropout probability. Defaults to 0.0. TYPE: `float` DEFAULT: `0.0`
`position_bias_num_buckets`	The number of buckets for position bias. Defaults to 256. TYPE: `int` DEFAULT: `256`
`position_bias_num_segment_buckets`	The number of segment buckets for position bias. Defaults to 32. TYPE: `int` DEFAULT: `32`
`position_bias_max_distance`	The maximum distance for position bias. Defaults to 2048. TYPE: `int` DEFAULT: `2048`
`eps`	A small value to avoid division by zero. Defaults to 1e-06. TYPE: `float` DEFAULT: `1e-06`
`init_std`	The standard deviation for weight initialization. Defaults to 1.0. TYPE: `float` DEFAULT: `1.0`
`use_cache`	Flag to indicate whether to use cache. Defaults to True. TYPE: `bool` DEFAULT: `True`
`distance_scale`	The scale factor for distance. Defaults to 16. TYPE: `Union[int, float]` DEFAULT: `16`
`mask_modules`	List or Tuple of modules to be masked. Defaults to None. TYPE: `Optional[Union[List, Tuple]]` DEFAULT: `None`
`half`	Flag to indicate whether to use half precision. Defaults to False. TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
	None.

Source code in mindnlp\transformers\models\cpmbee\configuration_cpmbee.py

def __init__(
    self,
    vocab_size: int = 30720,
    hidden_size: int = 4096,
    num_attention_heads: int = 64,
    dim_head: int = 64,
    dim_ff: int = 10240,
    num_hidden_layers: int = 32,
    dropout_p: float = 0.0,
    position_bias_num_buckets: int = 256,
    position_bias_num_segment_buckets: int = 32,
    position_bias_max_distance: int = 2048,
    eps: float = 1e-6,
    init_std: float = 1.0,
    use_cache: bool = True,
    distance_scale: Union[int, float] = 16,
    mask_modules: Optional[Union[List, Tuple]] = None,
    half: bool = False,
    **kwargs,
):
    """
    __init__

    Initializes a CpmBeeConfig instance.

    Args:
        vocab_size (int): The size of the vocabulary. Defaults to 30720.
        hidden_size (int): The size of the hidden layers. Defaults to 4096.
        num_attention_heads (int): The number of attention heads. Defaults to 64.
        dim_head (int): The dimension of each attention head. Defaults to 64.
        dim_ff (int): The dimension of the feed forward network. Defaults to 10240.
        num_hidden_layers (int): The number of hidden layers. Defaults to 32.
        dropout_p (float): The dropout probability. Defaults to 0.0.
        position_bias_num_buckets (int): The number of buckets for position bias. Defaults to 256.
        position_bias_num_segment_buckets (int): The number of segment buckets for position bias. Defaults to 32.
        position_bias_max_distance (int): The maximum distance for position bias. Defaults to 2048.
        eps (float): A small value to avoid division by zero. Defaults to 1e-06.
        init_std (float): The standard deviation for weight initialization. Defaults to 1.0.
        use_cache (bool): Flag to indicate whether to use cache. Defaults to True.
        distance_scale (Union[int, float]): The scale factor for distance. Defaults to 16.
        mask_modules (Optional[Union[List, Tuple]]): List or Tuple of modules to be masked. Defaults to None.
        half (bool): Flag to indicate whether to use half precision. Defaults to False.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(**kwargs)
    self.position_bias_num_segment_buckets = position_bias_num_segment_buckets
    self.hidden_size = hidden_size
    self.num_attention_heads = num_attention_heads
    self.dim_head = dim_head
    self.dim_ff = dim_ff
    self.num_hidden_layers = num_hidden_layers
    self.position_bias_num_buckets = position_bias_num_buckets
    self.position_bias_max_distance = position_bias_max_distance
    self.dropout_p = dropout_p
    self.eps = eps
    self.use_cache = use_cache
    self.vocab_size = vocab_size
    self.init_std = init_std
    self.distance_scale = distance_scale
    self.half = half
    self.mask_modules = mask_modules

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee` ¶

Tokenization classes for CpmBee.

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer` ¶

Bases: PreTrainedTokenizer

Construct a CPMBee tokenizer.

PARAMETER	DESCRIPTION
`vocab_file`	Path to the vocabulary file. TYPE: `str`
`bos_token`	The beginning of sequence token. TYPE: `str`, optional, defaults to `"<s>"` DEFAULT: `'<s>'`
`eos_token`	The end of sequence token. TYPE: `str`, optional, defaults to `"</s>"` DEFAULT: `'</s>'`
`line_token`	The line token. TYPE: `str`, optional, defaults to `"\n"` DEFAULT: `'\n'`
`space_token`	The space token. TYPE: `str`, optional, defaults to `" "` DEFAULT: `' '`
`unk_token`	The unknown token. TYPE: `str`, optional, defaults to `"<unk>"` DEFAULT: `'<unk>'`
`mask_token`	The mask token. TYPE: `str`, optional, defaults to `"<mask>"` DEFAULT: `'<mask>'`
`pad_token`	The token used for padding. TYPE: `str`, optional, defaults to `"<pad>"` DEFAULT: `'<pad>'`
`padding_side`	The padding side. CPM-Bee will use left padding by default. TYPE: `str`, optional, defaults to `"left"` DEFAULT: `'left'`
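
The constructor in the source listing for this class reads the vocabulary one token per line, remaps the `</_>` and `</n>` entries to a literal space and newline, and the `get_piece` and `_tokenize` methods then match the longest known prefix while merging consecutive unknown pieces. A minimal, self-contained sketch of that flow follows. The helper names `load_vocab`, `longest_piece`, and `greedy_tokenize` are hypothetical, and the tiny vocabulary is made up for illustration; this is not the library's code.

```python
import io
from typing import Dict, List, TextIO


def load_vocab(reader: TextIO) -> Dict[str, int]:
    """Read one token per line; the line index (skipping blanks) is the token id."""
    encoder: Dict[str, int] = {}
    for line in reader:
        token = line.rstrip("\n")
        if not token:
            continue
        encoder[token] = len(encoder)
    # The file spells space and newline as "</_>" and "</n>"; remap to the real characters.
    encoder[" "] = encoder.pop("</_>")
    encoder["\n"] = encoder.pop("</n>")
    return encoder


def longest_piece(text: str, encoder: Dict[str, int]) -> str:
    """Return the longest prefix of `text` found in the vocabulary, else its first character."""
    for end in range(len(text), 0, -1):
        if text[:end] in encoder:
            return text[:end]
    return text[0]


def greedy_tokenize(text: str, encoder: Dict[str, int]) -> List[str]:
    """Greedy longest-match tokenization; adjacent unknown characters are merged."""
    tokens: List[str] = []
    unknown = ""
    start = 0
    while start < len(text):
        piece = longest_piece(text[start:], encoder)
        if piece in encoder:
            if unknown:  # flush the pending unknown run before the known piece
                tokens.append(unknown)
                unknown = ""
            tokens.append(piece)
        else:
            unknown += piece
        start += len(piece)
    if unknown:
        tokens.append(unknown)
    return tokens


# Made-up vocabulary, one token per line, as in a vocab file.
vocab_text = "<pad>\n</_>\n</n>\nhe\nhello\nwor\n"
encoder = load_vocab(io.StringIO(vocab_text))
print(greedy_tokenize("hello world\n", encoder))
```

In this toy vocabulary `hello` wins over `he` because the longest prefix is tried first, and the characters `l` and `d`, which are not in the vocabulary, are merged into a single unknown token `ld`.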

Source code in mindnlp\transformers\models\cpmbee\tokenization_cpmbee.py

class CpmBeeTokenizer(PreTrainedTokenizer):
    r"""
    Construct a CPMBee tokenizer.

    Args:
        vocab_file (`str`):
            Path to the vocabulary file.
        bos_token (`str`, *optional*, defaults to `"<s>"`):
            The beginning of sequence token.
        eos_token (`str`, *optional*, defaults to `"</s>"`):
            The end of sequence token.
        line_token (`str`, *optional*, defaults to `"\n"`):
            The line token.
        space_token (`str`, *optional*, defaults to `" "`):
            The space token.
        unk_token (`str`, *optional*, defaults to `"<unk>"`):
            The unknown token.
        mask_token (`str`, *optional*, defaults to `"<mask>"`):
            The mask token.
        pad_token (`str`, *optional*, defaults to `"<pad>"`):
            The token used for padding.
        padding_side (`str`, *optional*, defaults to `"left"`):
            The padding side. CPM-Bee will use left padding by default.
    """
    vocab_files_names = VOCAB_FILES_NAMES
    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
    model_input_names: List[str] = [
        "input_ids",
        "attention_mask",
        "input_id_sub",
        "position",
        "context",
        "sample_ids",
        "num_segments",
        "segment",
        "segment_rel_offset",
        "segment_rel",
    ]
    add_prefix_space = False

    def __init__(
        self,
        vocab_file,
        bos_token="<s>",
        eos_token="</s>",
        line_token="\n",
        space_token=" ",
        unk_token="<unk>",
        mask_token="<mask>",
        pad_token="<pad>",
        padding_side="left",
        **kwargs,
    ):
        """
        Initialize a CpmBeeTokenizer object.

        Args:
            vocab_file (str): The path to the file containing the vocabulary.
            bos_token (str, optional): The beginning of sentence token.
            eos_token (str, optional): The end of sentence token.
            line_token (str, optional): The token used to represent a new line.
            space_token (str, optional): The token used to represent a space.
            unk_token (str, optional): The token used to represent unknown words.
            mask_token (str, optional): The token used for masking.
            pad_token (str, optional): The token used for padding.
            padding_side (str, optional): The side to apply padding.
            **kwargs: Additional keyword arguments.

        Returns:
            None.

        Raises:
            FileNotFoundError: If the vocab_file does not exist.
            TypeError: If any of the arguments are of incorrect type.
        """
        self.encoder: Dict[str, int] = {}
        super().__init__(
            bos_token=bos_token,
            eos_token=eos_token,
            line_token=line_token,
            space_token=space_token,
            unk_token=unk_token,
            mask_token=mask_token,
            pad_token=pad_token,
            padding_side=padding_side,
            **kwargs,
        )

        with open(vocab_file, "r", encoding="utf-8") as reader:
            for token in reader.readlines():
                token = token.rstrip("\n")
                if len(token) == 0:
                    continue
                self.encoder[token] = len(self.encoder)

        self.encoder[" "] = self.encoder["</_>"]
        self.encoder["\n"] = self.encoder["</n>"]
        del self.encoder["</_>"]
        del self.encoder["</n>"]

        self.decoder = {v: k for k, v in self.encoder.items()}

        self._max_word_len = max(len(x) for x in self.encoder.keys())
        self.cpmbee_special_tokens = {k: v for k, v in self.encoder.items() if k.startswith("<") and k.endswith(">")}

        self.ext_table: Dict[int, str] = {}
        self.ext_table_rev: Dict[str, int] = {}

        self.token_id_table: Dict[str, Dict[int, int]] = {}
        self.ext_special_tokens = []

        self.ext_args_for_model = [
            "input_id_subs",
            "input_pos",
            "context",
            "segment_ids",
            "segment_rel_offset",
            "segment_rel",
            "sample_ids",
            "num_segments",
            "predict_segments",
            "answer_placeholders",
            "ext_table",
            "token_id_table",
        ]

    @property
    def bod_token_id(self):
        """
        Returns the token ID for the beginning of document (BOD) token.

        Args:
            self: An instance of the CpmBeeTokenizer class.

        Returns:
            int: The token ID corresponding to the BOD token in the encoder dictionary.

        Raises:
            None.
        """
        return self.encoder[self.bod_token]

    @property
    def eod_token_id(self):
        """
        Method to retrieve the token ID corresponding to the end-of-document token in the CpmBeeTokenizer class.

        Args:
            self: An instance of the CpmBeeTokenizer class.

        Returns:
            int: The token ID of the end-of-document token in the tokenizer's encoder.

        Raises:
            None.
        """
        return self.encoder[self.eod_token]

    @property
    def newline_id(self):
        """
        Returns the ID of the newline token in the CpmBeeTokenizer.

        Args:
            self (CpmBeeTokenizer): An instance of the CpmBeeTokenizer class.

        Returns:
            int: The ID of the newline token (`line_token`) in the encoder.

        Raises:
            None.
        """
        return self.encoder[self.line_token]

    @property
    def vocab_size(self) -> int:
        """
        Returns the size of the vocabulary used by the CpmBeeTokenizer instance.

        Args:
            self:
                The CpmBeeTokenizer instance.

                - This parameter is of type 'CpmBeeTokenizer'.
                - It represents the instance of the CpmBeeTokenizer class on which the method is called.

        Returns:
            int:
                An integer representing the size of the vocabulary.

                - The returned value represents the total number of unique tokens in the vocabulary.

        Raises:
            None.

        Example:
            ```python
            >>> tokenizer = CpmBeeTokenizer(vocab_file="vocab.txt")  # illustrative path
            >>> tokenizer.vocab_size  # a property, so no call parentheses
            5000
            ```
        """
        return len(self.encoder)

    def __len__(self):
        """
        Size of the full vocabulary with the added tokens.
        """
        return self.vocab_size + len(self.added_tokens_encoder)

    def get_vocab(self):
        """
        Get the vocabulary of the CpmBeeTokenizer instance.

        Args:
            self (CpmBeeTokenizer): The instance of the CpmBeeTokenizer class.
                This parameter represents the current instance of the tokenizer.

        Returns:
            dict: A dictionary containing the combined encoder and added tokens encoder.
                The keys represent tokens, and the values represent their corresponding IDs.

        Raises:
            None.
        """
        return dict(self.encoder, **self.added_tokens_encoder)

    def get_piece(self, text: str) -> str:
        """
        Match with maximum length.
        """
        len_text = len(text)
        for i in range(len(text)):
            sub = text[: len_text - i]
            if (sub in self.encoder) or (sub in self.added_tokens_encoder):
                return sub
        return text[0]

    def tokenize(self, text: TextInput, **kwargs) -> List[str]:
        r"""
        Override the `tokenize` to meet the needs of CPMBee:

        1. Mark the special token with `<` and `>`. The `<>` will be ignored.
        2. Split sentences by the marked special tokens.
        3. Record the marked special token by `ext_table` and `ext_table_rev`.
        4. Tokenize the sentence without special tokens.
        """
        for_cpmbee = kwargs.get("for_cpmbee", False)
        all_special_tokens_extended = {
            str(t): t for t in self.all_special_tokens_extended if isinstance(t, AddedToken)
        }

        sentence_split = [""]
        is_special_token = False
        for i, c in enumerate(text):
            if is_special_token:
                if c == "<":
                    tail = sentence_split.pop(-1)
                    sentence_split[-1] += tail
                    sentence_split.append(c)
                    is_special_token = False
                elif c == ">":
                    # end of special token
                    sentence_split[-1] += c
                    if sentence_split[-1] == "<>":
                        continue
                    is_special_token = False
                    sentence_split.append("")
                else:
                    sentence_split[-1] += c
            else:
                if c == "<":
                    is_special_token = True
                    sentence_split.append(c)
                else:
                    sentence_split[-1] += c
        if is_special_token:
            tail = sentence_split.pop(-1)
            sentence_split[-1] += tail

        output_tokens = []
        for i, part in enumerate(sentence_split):
            if (i & 1) == 1:
                # special token
                output_tokens.append(part)
                if for_cpmbee and (part not in self.encoder) and (part not in self.ext_table_rev):
                    self.ext_table_rev[part] = len(self.ext_table_rev) + self.vocab_size
                    self.ext_table[self.ext_table_rev[part]] = part
            else:
                output_tokens.extend(self._tokenize(part, for_cpmbee=for_cpmbee))

        # drop spaces
        for i, token in enumerate(output_tokens):
            if token in self.added_tokens_encoder:
                token = all_special_tokens_extended.get(token, None)
                left = output_tokens[i - 1] if i > 0 else None
                right = output_tokens[i + 1] if i < len(output_tokens) - 1 else None
                if isinstance(token, AddedToken):
                    if token.rstrip and right:
                        # A bit counter-intuitive but we strip the left of the string
                        # since tok_extended.rstrip means the special token is eating all white spaces on its right
                        output_tokens[i + 1] = right.lstrip()
                    # Strip white spaces on the left
                    if token.lstrip and left:
                        output_tokens[i - 1] = left.rstrip()  # Opposite here
                else:
                    if right:
                        output_tokens[i + 1] = right.lstrip()
                    if left:
                        output_tokens[i - 1] = left.rstrip()

        skipped_tokens = []
        for token in output_tokens:
            if not token:
                continue
            skipped_tokens.append(token)

        return skipped_tokens

    def _tokenize(self, text, **kwargs):
        """
        Converts a string into a sequence of tokens (strings), using the tokenizer. Split in words for word-based
        vocabulary.

        Do NOT take care of added tokens. Record the unk tokens and special tokens in `ext_table` and `ext_table_rev`.
        """
        for_cpmbee = kwargs.get("for_cpmbee", False)
        output_tokens = []

        part_st = 0
        last_unk = None
        while part_st < len(text):
            piece = self.get_piece(text[part_st:])
            if piece in self.encoder or piece in self.added_tokens_encoder:
                if last_unk is None:
                    output_tokens.append(piece)
                else:
                    if for_cpmbee and (last_unk not in self.ext_table_rev):
                        self.ext_table_rev[last_unk] = len(self.ext_table_rev) + self.vocab_size
                        self.ext_table[self.ext_table_rev[last_unk]] = last_unk
                    output_tokens.append(last_unk)
                    output_tokens.append(piece)
                    last_unk = None
            else:
                if last_unk is None:
                    last_unk = piece
                else:
                    last_unk += piece
            part_st += len(piece)
        if last_unk is not None:
            # part end with UNK
            if for_cpmbee and (last_unk not in self.ext_table_rev):
                self.ext_table_rev[last_unk] = len(self.ext_table_rev) + self.vocab_size
                self.ext_table[self.ext_table_rev[last_unk]] = last_unk
            output_tokens.append(last_unk)

        return output_tokens

    def check(self, token):
        """
        Checks if a token is present in the encoder.

        Args:
            self (CpmBeeTokenizer): An instance of the CpmBeeTokenizer class.
            token (Any): The token to be checked in the encoder.

        Returns:
            bool: True if the token is in the encoder, False otherwise.

        Raises:
            None.
        """
        return token in self.encoder

    def convert_tokens_to_string(self, tokens: List[str]) -> str:
        """
        Converts a list of tokens into a single string.

        Args:
            self (CpmBeeTokenizer): An instance of the CpmBeeTokenizer class.
            tokens (List[str]): A list of tokens to be converted into a string.

        Returns:
            str: A string representation of the tokens.

        Raises:
            None.

        The tokens are concatenated with ''.join(tokens); the input list is not modified.
        """
        return "".join(tokens)

    def _convert_token_to_id(self, token: str):
        """Converts a token (str) into an id using the vocab and ext_table."""
        if token in self.encoder:
            return self.encoder.get(token)
        elif token in self.ext_table_rev:
            return self.ext_table_rev[token]
        elif token in self.added_tokens_encoder:
            return self.added_tokens_encoder[token]
        else:
            return self.unk_token_id

    def _convert_id_to_token(self, index):
        """Converts an index (integer) into a token (str) using the vocab and ext_table."""
        if index in self.ext_table:
            return self.ext_table[index]
        elif index in self.added_tokens_decoder:
            return self.added_tokens_decoder[index]
        else:
            if index >= 0:
                return self.decoder[index]

    def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
        """
        Save the vocabulary to a file.

        Args:
            self (CpmBeeTokenizer): The instance of the CpmBeeTokenizer class.
            save_directory (str): The directory where the vocabulary file will be saved.
            filename_prefix (Optional[str]): An optional prefix to prepend to the filename. Default is None.

        Returns:
            Tuple[str]: A tuple containing the path to the saved vocabulary file.

        Raises:
            OSError: If the vocabulary file cannot be written.
            KeyError: If the newline or space token is missing from the encoder dictionary.
        """
        if os.path.isdir(save_directory):
            vocab_file = os.path.join(
                save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"]
            )
        else:
            vocab_file = (filename_prefix + "-" if filename_prefix else "") + save_directory
        index = 0
        # Write from a copy so the live encoder keeps its space and newline keys after saving.
        encoder = dict(self.encoder)
        encoder["</n>"] = encoder.pop("\n")
        encoder["</_>"] = encoder.pop(" ")
        with open(vocab_file, "w", encoding="utf-8") as writer:
            for token, token_index in sorted(encoder.items(), key=lambda x: x[1]):
                if index != token_index:
                    logger.warning(
                        f"Saving vocabulary to {vocab_file}: vocabulary indices are not consecutive."
                        " Please check that the vocabulary is not corrupted!"
                    )
                    index = token_index
                writer.write(token + "\n")
                index += 1
        return (vocab_file,)

    def __call__(self, text, *args, **kwargs):
        r"""
        CPMBee `call` method will use `_tokenize_cpmbee` when the input type is dict.
        """
        if isinstance(text, dict):
            return self._batch_tokenize_cpmbee([text], *args, **kwargs)
        elif isinstance(text, (list, tuple)):
            if isinstance(text[0], dict):
                return self._batch_tokenize_cpmbee(text, *args, **kwargs)
            else:
                return super().__call__(text, *args, **kwargs)
        else:
            return super().__call__(text, *args, **kwargs)

    # Tokenization
    def _tokenize_cpmbee(self, data: TextInput, *args, **kwargs) -> List[str]:
        """
        A tokenize method to process dict data. Exclusive for CPMBee.
        """
        if isinstance(data, str):
            data = json.loads(data)
        if not isinstance(data, Dict):
            raise TypeError(
                "CpmBeeTokenizer input data should be dict or str in dict format, but got {}".format(type(data))
            )

        # 1. prepare answer placeholder
        answer_placeholders = []

        def _put_placeholder(data: Any, path: List[str] = []):
            if isinstance(data, dict):
                ret = {}
                for k, v in data.items():
                    ret[k] = _put_placeholder(v, path + [k])
                return ret
            else:
                answer_placeholders.append(path)
                return "<ans_{}>".format(len(answer_placeholders))

        data["<ans>"] = _put_placeholder(data["<ans>"])

        (
            input_ids,
            input_id_subs,
            context,
            segment_ids,
            segment_rel,
            n_segments,
            table_states,
        ) = self.convert_data_to_id(data, shuffle_answer=False, max_depth=8)

        # <ans> mapping from sub to id
        sub_ans_map: Dict[int, int] = {}
        for fake_id, token_sub in table_states["token_id_table"]["<ans>"].items():
            token = table_states["ext_table"][fake_id]
            if token.startswith("<ans_") and token.endswith(">"):
                ans_id = int(token[5:-1])
                sub_ans_map[token_sub] = ans_id

        tmp_input_ids = []
        tmp_input_sub = []
        tmp_input_seg = []

        # get predict segments
        predict_segments: List[Tuple[int, int]] = []
        for i in range(input_ids.shape[0]):
            if context[i] == 0:
                if input_ids[i] == self.encoder["<ans>"]:
                    # is ans
                    # (segment_id, ans_id)
                    predict_segments.append((segment_ids[i], sub_ans_map[input_id_subs[i]]))
            else:
                tmp_input_ids.append(input_ids[i])
                tmp_input_sub.append(input_id_subs[i])
                tmp_input_seg.append(segment_ids[i])

        if len(predict_segments) == 0:
            raise ValueError("No answer to predict")

        input_ids = np.array(tmp_input_ids, dtype=np.int32)  # all context
        input_id_subs = np.array(tmp_input_sub, dtype=np.int32)  # [0, 0, 0, 0, 1, 0, 0, 2, 0, ...]
        context = np.full_like(tmp_input_ids, 1, dtype=np.int8)  # [1, 1, 1, ...]
        segment_ids = np.array(tmp_input_seg, dtype=np.int32)  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 2, ...]
        sample_ids = np.zeros(input_ids.shape, dtype=np.int32)  # [0, 0, 0, 0, ...]
        segment_rel_offset = np.zeros(input_ids.shape, dtype=np.int32)  # [0, 0, 0, ...]
        num_segments = np.full(input_ids.shape, n_segments, dtype=np.int32)  # [n_seg, n_seg, n_seg, ...]
        input_pos = np.arange(input_ids.shape[0], dtype=np.int32)  # [0, 1, 2, 3, 4, ...]

        return (
            self.prepare_for_model(
                input_ids.tolist(),
                input_id_subs=input_id_subs.tolist(),
                input_pos=input_pos.tolist(),
                context=context.tolist(),
                segment_ids=segment_ids.tolist(),
                segment_rel_offset=segment_rel_offset.tolist(),
                segment_rel=segment_rel.tolist(),
                sample_ids=sample_ids.tolist(),
                num_segments=num_segments.tolist(),
                **kwargs,
            ),
            predict_segments,
            answer_placeholders,
            table_states["ext_table"],
            table_states["token_id_table"],
        )

    def _batch_tokenize_cpmbee(self, data_lst, *args, **kwargs):
        """
        Batched version of `_tokenize_cpmbee`.
        """
        return_tensors = kwargs.get("return_tensors", None)
        batch_outputs = {}
        segment_rel_pack = []
        other_info = []

        batch_ext_table_map: Dict[Tuple[int, int], int] = {}
        batch_ext_table_ids: List[int] = []
        batch_ext_table_sub: List[int] = []

        for data in data_lst:
            self.ext_table = {}
            self.ext_table_rev = {}
            self.token_id_table = {}
            (outputs, predict_segments, answer_placeholders, ext_table, token_id_table) = self._tokenize_cpmbee(
                data,
                truncation=None,
                padding=PaddingStrategy.DO_NOT_PAD.value,
                max_length=None,
                pad_to_multiple_of=None,
                return_attention_mask=False,
                return_tensors=None,
            )
            rev_ext_table = {}
            for token, mp in token_id_table.items():
                if token == "<ans>":
                    continue
                token_id = self.encoder[token]
                for fake_id, token_sub in mp.items():
                    if token_sub > 0:
                        if (token_id, token_sub) not in batch_ext_table_map:
                            batch_ext_table_map[(token_id, token_sub)] = len(batch_ext_table_ids) + self.vocab_size
                            batch_ext_table_ids.append(token_id)
                            batch_ext_table_sub.append(token_sub)
                        rev_ext_table[batch_ext_table_map[(token_id, token_sub)]] = ext_table[fake_id]
                    else:
                        rev_ext_table[token_id] = ext_table[fake_id]

            segment_rel_pack.append(np.array(outputs.pop("segment_rel")))
            other_info.append(
                {
                    "predict_segments": predict_segments,
                    "answer_placeholders": answer_placeholders,
                    "ext_table": rev_ext_table,
                }
            )

            for key, value in outputs.items():
                if key not in batch_outputs:
                    batch_outputs[key] = []
                batch_outputs[key].append(value)

        max_length = max(len(item) for item in batch_outputs[self.model_input_names[0]])
        batch_size = len(batch_outputs[self.model_input_names[0]])
        for i in range(batch_size):
            inputs = {k: v[i] for k, v in batch_outputs.items()}

            for k, v in inputs.items():
                required_input = v

                needs_to_be_padded = len(required_input) != max_length

                if needs_to_be_padded:
                    difference = max_length - len(required_input)
                    batch_outputs[k][i] = [self.pad_token_id] * difference + required_input

        max_num_rels = 0
        for rel in segment_rel_pack:
            max_num_rels = max(max_num_rels, rel.shape[0])
        padded_rels = np.zeros((len(segment_rel_pack), max_num_rels), dtype=np.int32)
        for i, rel in enumerate(segment_rel_pack):
            padded_rels[i, : rel.shape[0]] = rel
        batch_outputs["segment_rel"] = padded_rels
        batch_outputs["batch_ext_table_ids"] = np.array(batch_ext_table_ids, dtype=np.int32)
        batch_outputs["batch_ext_table_sub"] = np.array(batch_ext_table_sub, dtype=np.int32)
        batch_outputs = BatchEncoding(batch_outputs, tensor_type=return_tensors)
        batch_outputs["other_info"] = other_info

        return batch_outputs

    def convert_data_to_id(
        self,
        data: Any,
        prev_ext_states: Optional[_PrevExtTableStates] = None,
        shuffle_answer: bool = True,
        max_depth: int = 8,
    ):
        """
        Parse a dict into token ids. Exclusive to CPMBee. It will:

        1. parse the dict into segments and build segment_rel, which is used to calculate position_bias.
        2. tokenize every segment.
        """
        root: _DictTree = {
            "value": "<root>",
            "children": [],
            "depth": 0,
            "segment_id": 0,
            "need_predict": False,
        }

        segments = [root]

        def _build_dict_tree(data: CPMBeeInputType, depth: int, need_predict: bool) -> List[_DictTree]:
            if isinstance(data, dict):
                ret_list: List[_DictTree] = []
                curr_items = list(data.items())
                if need_predict and shuffle_answer:
                    access_idx = np.arange(len(curr_items))
                    np.random.shuffle(access_idx)
                    curr_items = [curr_items[idx] for idx in access_idx]
                for k, v in curr_items:
                    child_info: _DictTree = {
                        "value": k,
                        "children": [],
                        "depth": depth,
                        "segment_id": len(segments),
                        "need_predict": False,  # only leaves are contexts
                    }
                    segments.append(child_info)
                    child_info["children"] = _build_dict_tree(
                        v, depth + 1, need_predict or (depth == 1 and k == "<ans>")
                    )  # elements in <root>.<ans>

                    ret_list.append(child_info)
                return ret_list
            else:
                assert isinstance(data, str), "Invalid data {}".format(data)
                ret: _DictTree = {
                    "value": data,
                    "children": [],
                    "depth": depth,
                    "segment_id": len(segments),
                    "need_predict": need_predict,
                }
                segments.append(ret)
                return [ret]

        root["children"] = _build_dict_tree(data, 1, False)

        num_segments = len(segments)
        segment_rel = np.zeros((num_segments * num_segments,), dtype=np.int32)

        def _build_segment_rel(node: _DictTree) -> List[Tuple[int, int]]:
            ret: List[Tuple[int, int]] = [(node["segment_id"], node["depth"])]
            for child in node["children"]:
                sub = _build_segment_rel(child)
                for seg_id_1, depth_1 in sub:
                    for seg_id_2, depth_2 in ret:
                        n_up = min(depth_1 - node["depth"], max_depth - 1)
                        n_down = min(depth_2 - node["depth"], max_depth - 1)
                        segment_rel[seg_id_1 * num_segments + seg_id_2] = rel_to_bucket(
                            n_up, n_down, max_depth=max_depth
                        )
                        segment_rel[seg_id_2 * num_segments + seg_id_1] = rel_to_bucket(
                            n_down, n_up, max_depth=max_depth
                        )
                ret.extend(sub)
            return ret

        _build_segment_rel(root)

        input_ids: List[int] = []
        input_id_subs: List[int] = []
        segment_bound: List[Tuple[int, int]] = []

        if prev_ext_states is not None:
            self.ext_table = prev_ext_states["ext_table"]
            self.token_id_table = prev_ext_states["token_id_table"]

        for seg in segments:
            # tokenize
            tokens = self.convert_tokens_to_ids(self.tokenize(seg["value"], for_cpmbee=True))

            token_id_subs = []
            reid_token_ids = []
            for idx in tokens:
                if idx in self.ext_table:
                    # unk or special token
                    token = self.ext_table[idx]
                    if token.startswith("<") and token.endswith(">"):
                        # special token
                        if "_" in token:
                            token_name = token[1:-1].split("_", maxsplit=1)[0]
                        else:
                            token_name = token[1:-1]
                        token_name = "<{}>".format(token_name)
                    else:
                        token_name = "<unk>"

                    if token_name not in self.token_id_table:
                        self.token_id_table[token_name] = {}
                    if idx not in self.token_id_table[token_name]:
                        self.token_id_table[token_name][idx] = len(self.token_id_table[token_name])
                    if token_name not in self.encoder:
                        raise ValueError("Invalid token {}".format(token))
                    reid_token_ids.append(self.encoder[token_name])
                    token_id_subs.append(self.token_id_table[token_name][idx])
                else:
                    reid_token_ids.append(idx)
                    token_id_subs.append(0)
            tokens = [self.bos_token_id] + reid_token_ids
            token_id_subs = [0] + token_id_subs
            # an appended eos_id marks a segment that does not need prediction
            if not seg["need_predict"]:  # eos
                tokens = tokens + [self.eos_token_id]
                token_id_subs = token_id_subs + [0]
            else:
                # no eos
                pass
            begin = len(input_ids)
            input_ids.extend(tokens)
            input_id_subs.extend(token_id_subs)
            end = len(input_ids)
            segment_bound.append((begin, end))

        ids = np.array(input_ids, dtype=np.int32)
        id_subs = np.array(input_id_subs, dtype=np.int32)
        segs = np.zeros((ids.shape[0],), dtype=np.int32)  # segment index per token, following segment_bound
        context = np.zeros((ids.shape[0],), dtype=np.int8)
        for i, (begin, end) in enumerate(segment_bound):
            if not segments[i]["need_predict"]:
                context[begin:end] = 1
            segs[begin:end] = i

        curr_ext_table_states: _PrevExtTableStates = {
            "ext_table": self.ext_table,
            "token_id_table": self.token_id_table,
        }
        return ids, id_subs, context, segs, segment_rel, num_segments, curr_ext_table_states

    def prepare_for_model(
        self,
        ids: List[int],
        pair_ids: Optional[List[int]] = None,
        add_special_tokens: bool = True,
        padding: Union[bool, str, PaddingStrategy] = False,
        truncation: Union[bool, str, TruncationStrategy] = None,
        max_length: Optional[int] = None,
        stride: int = 0,
        pad_to_multiple_of: Optional[int] = None,
        return_tensors: Optional[Union[str, TensorType]] = None,
        return_token_type_ids: Optional[bool] = None,
        return_attention_mask: Optional[bool] = None,
        return_overflowing_tokens: bool = False,
        return_special_tokens_mask: bool = False,
        return_length: bool = False,
        verbose: bool = True,
        prepend_batch_axis: bool = False,
        **kwargs,
    ) -> BatchEncoding:
        """
        Prepares a sequence of input id, or a pair of sequences of inputs ids so that it can be used by the model. It
        adds special tokens, truncates sequences if overflowing while taking into account the special tokens and
        manages a moving window (with user defined stride) for overflowing tokens. Please Note, for *pair_ids*
        different than `None` and *truncation_strategy = longest_first* or `True`, it is not possible to return
        overflowing tokens. Such a combination of arguments will raise an error.

        Args:
            ids (`List[int]`):
                Tokenized input ids of the first sequence. Can be obtained from a string by chaining the `tokenize` and
                `convert_tokens_to_ids` methods.
            pair_ids (`List[int]`, *optional*):
                Tokenized input ids of the second sequence. Can be obtained from a string by chaining the `tokenize`
                and `convert_tokens_to_ids` methods.
        """
        # Backward compatibility for 'truncation_strategy', 'pad_to_max_length'
        padding_strategy, truncation_strategy, max_length, kwargs = self._get_padding_truncation_strategies(
            padding=padding,
            truncation=truncation,
            max_length=max_length,
            pad_to_multiple_of=pad_to_multiple_of,
            verbose=verbose,
            **kwargs,
        )

        pair = bool(pair_ids is not None)
        len_ids = len(ids)
        len_pair_ids = len(pair_ids) if pair else 0

        if return_token_type_ids and not add_special_tokens:
            raise ValueError(
                "Asking to return token_type_ids while setting add_special_tokens to False "
                "results in an undefined behavior. Please set add_special_tokens to True or "
                "set return_token_type_ids to None."
            )

        if (
            return_overflowing_tokens
            and truncation_strategy == TruncationStrategy.LONGEST_FIRST
            and pair_ids is not None
        ):
            raise ValueError(
                "Not possible to return overflowing tokens for pair of sequences with the "
                "`longest_first`. Please select another truncation strategy than `longest_first`, "
                "for instance `only_second` or `only_first`."
            )

        # Load from model defaults
        if return_token_type_ids is None:
            return_token_type_ids = "token_type_ids" in self.model_input_names
        if return_attention_mask is None:
            return_attention_mask = "attention_mask" in self.model_input_names

        encoded_inputs = {}

        # Compute the total size of the returned encodings
        total_len = len_ids + len_pair_ids + (self.num_special_tokens_to_add(pair=pair) if add_special_tokens else 0)

        # Truncation: Handle max sequence length
        overflowing_tokens = []
        if truncation_strategy != TruncationStrategy.DO_NOT_TRUNCATE and max_length and total_len > max_length:
            ids, pair_ids, overflowing_tokens = self.truncate_sequences(
                ids,
                pair_ids=pair_ids,
                num_tokens_to_remove=total_len - max_length,
                truncation_strategy=truncation_strategy,
                stride=stride,
            )

        if return_overflowing_tokens:
            encoded_inputs["overflowing_tokens"] = overflowing_tokens
            encoded_inputs["num_truncated_tokens"] = total_len - max_length

        # Add special tokens
        if add_special_tokens:
            sequence = self.build_inputs_with_special_tokens(ids, pair_ids)
            token_type_ids = self.create_token_type_ids_from_sequences(ids, pair_ids)
        else:
            sequence = ids + pair_ids if pair else ids
            token_type_ids = [0] * len(ids) + ([0] * len(pair_ids) if pair else [])

        # Build output dictionary
        encoded_inputs["input_ids"] = sequence
        if return_token_type_ids:
            encoded_inputs["token_type_ids"] = token_type_ids
        if return_special_tokens_mask:
            if add_special_tokens:
                encoded_inputs["special_tokens_mask"] = self.get_special_tokens_mask(ids, pair_ids)
            else:
                encoded_inputs["special_tokens_mask"] = [0] * len(sequence)

        # Check lengths
        self._eventual_warn_about_too_long_sequence(encoded_inputs["input_ids"], max_length, verbose)

        # Padding
        if padding_strategy != PaddingStrategy.DO_NOT_PAD or return_attention_mask:
            encoded_inputs = self.pad(
                encoded_inputs,
                max_length=max_length,
                padding=padding_strategy.value,
                pad_to_multiple_of=pad_to_multiple_of,
                return_attention_mask=return_attention_mask,
            )

        if return_length:
            encoded_inputs["length"] = len(encoded_inputs["input_ids"])

        # for CPMBee, encode all the model arguments
        for arg in self.ext_args_for_model:
            v = kwargs.get(arg, None)
            if v is not None:
                encoded_inputs[arg] = v

        batch_outputs = BatchEncoding(
            encoded_inputs, tensor_type=return_tensors, prepend_batch_axis=prepend_batch_axis
        )

        return batch_outputs

    def prepare_for_finetune(
        self,
        data_list: List[Dict],
        max_length: int = 2048
    ):
        """
        Prepares the input data for fine-tuning.

        Args:
            self (CpmBeeTokenizer): The instance of the CpmBeeTokenizer class.
            data_list (List[Dict]): A list of dictionaries containing the input data.
            max_length (int, optional): The maximum length of the input data. Defaults to 2048.

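        Returns:
            BatchEncoding: The padded batch, with keys including `input_ids`, `input_id_sub`, `context`,
                `segment`, `segment_rel`, `labels`, `ext_table_ids` and `ext_table_sub` (tensor_type "ms").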

        Raises:
            None.
        """
        _inputs: List[NDArray[np.int32]] = []
        _inputs_sub: List[NDArray[np.int32]] = []
        _context: List[NDArray[np.int8]] = []
        _sample_ids: List[NDArray[np.int32]] = []
        _segments: List[NDArray[np.int32]] = []
        _num_segments: List[NDArray[np.int32]] = []
        _segment_rel_offset: List[NDArray[np.int32]] = []
        _segment_rel: List[NDArray[np.int32]] = []
        _spans: List[List[int]] = []
        _raw_data: List[List[Any]] = []

        raw_data = {}
        for data in data_list:
            (
                input_ids,
                input_id_subs,
                context,
                segment_ids,
                segment_rel,
                n_segments,
                _
            ) = self.convert_data_to_id(data)

            input_ids = input_ids[: max_length]
            input_id_subs = input_id_subs[: max_length]  # keep aligned with the truncated input_ids
            context = context[: max_length]
            segment_ids = segment_ids[: max_length]
            raw_data["input"] = data
            raw_data["samples"] = []

            sample_ids = np.zeros(input_ids.shape, dtype=np.int32)
            segment_rel_offset = np.zeros(input_ids.shape, dtype=np.int32)
            num_segments = np.full(input_ids.shape, n_segments, dtype=np.int32)

            _inputs.append(input_ids)
            _inputs_sub.append(input_id_subs)
            _context.append(context)
            _sample_ids.append(sample_ids)
            _segments.append(segment_ids)
            _num_segments.append(num_segments)
            _segment_rel_offset.append(segment_rel_offset)
            _segment_rel.append(segment_rel)
            _spans.append([input_ids.shape[0]])
            _raw_data.append([raw_data])

        batch_size = len(_inputs)
        inputs = np.zeros((batch_size, max_length), dtype=np.int32)
        inputs_sub = np.zeros((batch_size, max_length), dtype=np.int32)
        context = np.zeros((batch_size, max_length), dtype=np.int8)
        sample_ids = np.zeros((batch_size, max_length), dtype=np.int32)
        segments = np.zeros((batch_size, max_length), dtype=np.int32)
        num_segments = np.zeros((batch_size, max_length), dtype=np.int32)
        segment_rel_offset = np.zeros((batch_size, max_length), dtype=np.int32)
        tgt = np.full((batch_size, max_length), -100, dtype=np.int32)

        max_rel = 0
        for i in range(batch_size):
            max_rel = max(max_rel, _segment_rel[i].shape[0])
        segment_rel = np.zeros((batch_size, max_rel), dtype=np.int32)
        spans = np.zeros((batch_size, max_length), dtype=np.int32)
        length = np.zeros((batch_size,), dtype=np.int32)

        batch_ext_table_map: Dict[Tuple[int, int], int] = {}
        batch_ext_table_ids: List[int] = []
        batch_ext_table_sub: List[int] = []
        raw_data_list: List[Any] = []

        for i in range(batch_size):
            instance_length = _inputs[i].shape[0]
            rel_size = _segment_rel[i].shape[0]
            inputs[i, :instance_length] = _inputs[i]
            inputs_sub[i, :instance_length] = _inputs_sub[i]
            context[i, :instance_length] = _context[i]
            sample_ids[i, :instance_length] = _sample_ids[i]
            segments[i, :instance_length] = _segments[i]
            num_segments[i, :instance_length] = _num_segments[i]
            segment_rel_offset[i, :instance_length] = _segment_rel_offset[i]
            segment_rel[i, :rel_size] = _segment_rel[i]

            span_begin = 0
            for span_id, span_end in enumerate(_spans[i]):
                spans[i, span_begin:span_end] = span_id
                span_begin = span_end
            length[i] = instance_length
            raw_data_list.extend(_raw_data[i])

            for j in range(instance_length):
                idx, idx_sub = _inputs[i][j], _inputs_sub[i][j]
                tgt_idx = idx
                if idx_sub > 0:
                    # need to be in ext table
                    if (idx, idx_sub) not in batch_ext_table_map:
                        batch_ext_table_map[(idx, idx_sub)] = len(batch_ext_table_map)
                        batch_ext_table_ids.append(idx)
                        batch_ext_table_sub.append(idx_sub)
                    tgt_idx = batch_ext_table_map[(idx, idx_sub)] + self.vocab_size
                if j > 1 and context[i, j - 1] == 0:
                    if idx != self.bos_token_id:
                        tgt[i, j - 1] = tgt_idx
                    else:
                        tgt[i, j - 1] = self.eos_token_id
            if context[i, instance_length - 1] == 0:
                tgt[i, instance_length - 1] = self.eos_token_id

        if len(batch_ext_table_map) == 0:
            # placeholder
            batch_ext_table_ids.append(0)
            batch_ext_table_sub.append(1)

        return BatchEncoding({
            "input_ids": inputs,
            "input_id_sub": inputs_sub,
            "length": length,
            "context": context > 0,
            "sample_ids": sample_ids,
            "num_segments": num_segments,
            "segment": segments,
            "segment_rel_offset": segment_rel_offset,
            "segment_rel": segment_rel,
            "span": spans,
            "labels": tgt,
            "ext_table_ids": np.array(batch_ext_table_ids, dtype=np.int32),
            "ext_table_sub": np.array(batch_ext_table_sub, dtype=np.int32)
        }, tensor_type="ms")
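
The answer-placeholder step at the start of `_tokenize_cpmbee` may be easier to follow in isolation. The sketch below is a standalone approximation rather than the library code: `put_placeholders` is a hypothetical name and the sample dict is invented. It mirrors how every leaf under `<ans>` is replaced by a 1-based `<ans_N>` marker while the key path to that leaf is recorded.

```python
from typing import Any, List, Tuple


def put_placeholders(answer: Any) -> Tuple[Any, List[List[str]]]:
    """Replace every leaf of `answer` with a numbered `<ans_N>` marker."""
    paths: List[List[str]] = []

    def _walk(node: Any, path: List[str]) -> Any:
        if isinstance(node, dict):
            return {key: _walk(value, path + [key]) for key, value in node.items()}
        # Leaf: remember where it lives and hand back a 1-based placeholder.
        paths.append(path)
        return "<ans_{}>".format(len(paths))

    return _walk(answer, []), paths


# Invented example: one flat answer field and one nested answer dict.
template, paths = put_placeholders({"title": "", "tags": {"first": "", "second": ""}})
print(template)
print(paths)
```

Entry `N - 1` of the recorded path list corresponds to marker `<ans_N>`, which is presumably how a caller can write a generated answer back into the right slot of the original dict.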

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.bod_token_id` `property` ¶

Returns the token ID for the beginning of document (BOD) token.

PARAMETER	DESCRIPTION
`self`	An instance of the CpmBeeTokenizer class.

RETURNS	DESCRIPTION
`int`	The token ID of the BOD token in the encoder dictionary.

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.eod_token_id` `property` ¶

Method to retrieve the token ID corresponding to the end-of-document token in the CpmBeeTokenizer class.

PARAMETER	DESCRIPTION
`self`	An instance of the CpmBeeTokenizer class.

RETURNS	DESCRIPTION
`int`	The token ID of the end-of-document token in the tokenizer's encoder.

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.newline_id` `property` ¶

Returns the ID of the newline token in the CpmBeeTokenizer.

PARAMETER	DESCRIPTION
`self`	An instance of the CpmBeeTokenizer class. TYPE: `CpmBeeTokenizer`

RETURNS	DESCRIPTION
`int`	The ID of the newline token.

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.vocab_size: int` `property` ¶

Returns the size of the vocabulary used by the CpmBeeTokenizer instance.

PARAMETER	DESCRIPTION
`self`	The CpmBeeTokenizer instance. This parameter is of type 'CpmBeeTokenizer'. It represents the instance of the CpmBeeTokenizer class on which the method is called.

RETURNS	DESCRIPTION
`int`	An integer representing the size of the vocabulary. The returned value represents the total number of unique tokens in the vocabulary. TYPE: `int`

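Example

>>> tokenizer = CpmBeeTokenizer("vocab.txt")  # placeholder path to a vocabulary file
>>> isinstance(tokenizer.vocab_size, int)  # a property, so it is read without parentheses
True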

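`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.__call__(text, *args, **kwargs)` ¶

The CPMBee `__call__` method uses `_batch_tokenize_cpmbee` (and through it `_tokenize_cpmbee`) when the input is a dict, or a list or tuple whose first element is a dict. Any other input is passed to the parent tokenizer.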

Source code in mindnlp\transformers\models\cpmbee\tokenization_cpmbee.py

def __call__(self, text, *args, **kwargs):
    r"""
    CPMBee `call` method will use `_tokenize_cpmbee` when the input type is dict.
    """
    if isinstance(text, dict):
        return self._batch_tokenize_cpmbee([text], *args, **kwargs)
    elif isinstance(text, (list, tuple)):
        if isinstance(text[0], dict):
            return self._batch_tokenize_cpmbee(text, *args, **kwargs)
        else:
            return super().__call__(text, *args, **kwargs)
    else:
        return super().__call__(text, *args, **kwargs)
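
The routing rule can be checked without a tokenizer instance. The following is a minimal sketch of the type checks only: `uses_cpmbee_path` is a hypothetical helper name, and the guard for an empty sequence is an addition made for this sketch (the listing above indexes `text[0]` directly).

```python
from typing import Any


def uses_cpmbee_path(text: Any) -> bool:
    """Return True when `text` would be routed to the dict-based CPMBee path."""
    if isinstance(text, dict):
        return True
    if isinstance(text, (list, tuple)):
        # Only the first element is inspected. Treating an empty sequence as
        # plain text is an assumption of this sketch.
        return len(text) > 0 and isinstance(text[0], dict)
    return False


print(uses_cpmbee_path({"input": "some text", "<ans>": ""}))
print(uses_cpmbee_path(["plain", "strings"]))
```

Because only the first element decides the route, a list that mixes dicts and strings would be sent down whichever path its first element selects.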

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.__init__(vocab_file, bos_token='<s>', eos_token='</s>', line_token='\n', space_token=' ', unk_token='<unk>', mask_token='<mask>', pad_token='<pad>', padding_side='left', **kwargs)` ¶

Initialize a CpmBeeTokenizer object.

PARAMETER	DESCRIPTION
`vocab_file`	The path to the file containing the vocabulary. TYPE: `str`
`bos_token`	The beginning of sentence token. TYPE: `str` DEFAULT: `'<s>'`
`eos_token`	The end of sentence token. TYPE: `str` DEFAULT: `'</s>'`
`line_token`	The token used to represent a new line. TYPE: `str` DEFAULT: `'\n'`
`space_token`	The token used to represent a space. TYPE: `str` DEFAULT: `' '`
`unk_token`	The token used to represent unknown words. TYPE: `str` DEFAULT: `'<unk>'`
`mask_token`	The token used for masking. TYPE: `str` DEFAULT: `'<mask>'`
`pad_token`	The token used for padding. TYPE: `str` DEFAULT: `'<pad>'`
`padding_side`	The side to apply padding. TYPE: `str` DEFAULT: `'left'`
`**kwargs`	Additional keyword arguments. DEFAULT: `{}`

RETURNS	DESCRIPTION
	None.

RAISES	DESCRIPTION
`FileNotFoundError`	If the vocab_file does not exist.
`TypeError`	If any of the arguments are of incorrect type.
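
The `padding_side='left'` default matches what the batched dict path does by hand: `_batch_tokenize_cpmbee` prepends `pad_token_id` to every shorter sequence until it reaches the longest length in the batch. Below is a minimal sketch of that step, assuming plain Python lists. The name `left_pad` and the pad ID `0` are illustrative and not taken from the library.

```python
from typing import List


def left_pad(batch: List[List[int]], pad_id: int) -> List[List[int]]:
    """Pad every sequence on the left up to the longest length in the batch."""
    max_length = max(len(seq) for seq in batch)
    return [[pad_id] * (max_length - len(seq)) + seq for seq in batch]


# Illustrative IDs only; 0 stands in for the real pad_token_id.
padded = left_pad([[5, 6, 7], [8]], pad_id=0)
print(padded)  # [[5, 6, 7], [0, 0, 8]]
```

Left padding keeps the final real token of every sequence in the last column, which is convenient when generation continues from the end of the input.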

Source code in mindnlp\transformers\models\cpmbee\tokenization_cpmbee.py

def __init__(
    self,
    vocab_file,
    bos_token="<s>",
    eos_token="</s>",
    line_token="\n",
    space_token=" ",
    unk_token="<unk>",
    mask_token="<mask>",
    pad_token="<pad>",
    padding_side="left",
    **kwargs,
):
    """
    Initialize a CpmBeeTokenizer object.

    Args:
        vocab_file (str): The path to the file containing the vocabulary.
        bos_token (str, optional): The beginning of sentence token.
        eos_token (str, optional): The end of sentence token.
        line_token (str, optional): The token used to represent a new line.
        space_token (str, optional): The token used to represent a space.
        unk_token (str, optional): The token used to represent unknown words.
        mask_token (str, optional): The token used for masking.
        pad_token (str, optional): The token used for padding.
        padding_side (str, optional): The side to apply padding.
        **kwargs: Additional keyword arguments.

    Returns:
        None.

    Raises:
        FileNotFoundError: If the vocab_file does not exist.
        TypeError: If any of the arguments are of incorrect type.
    """
    self.encoder: Dict[str, int] = {}
    super().__init__(
        bos_token=bos_token,
        eos_token=eos_token,
        line_token=line_token,
        space_token=space_token,
        unk_token=unk_token,
        mask_token=mask_token,
        pad_token=pad_token,
        padding_side=padding_side,
        **kwargs,
    )

    with open(vocab_file, "r", encoding="utf-8") as reader:
        for token in reader.readlines():
            token = token.rstrip("\n")
            if len(token) == 0:
                continue
            self.encoder[token] = len(self.encoder)

    self.encoder[" "] = self.encoder["</_>"]
    self.encoder["\n"] = self.encoder["</n>"]
    del self.encoder["</_>"]
    del self.encoder["</n>"]

    self.decoder = {v: k for k, v in self.encoder.items()}

    self._max_word_len = max(len(x) for x in self.encoder.keys())
    self.cpmbee_special_tokens = {k: v for k, v in self.encoder.items() if k.startswith("<") and k.endswith(">")}

    self.ext_table: Dict[int, str] = {}
    self.ext_table_rev: Dict[str, int] = {}

    self.token_id_table: Dict[str, Dict[int, int]] = {}
    self.ext_special_tokens = []

    self.ext_args_for_model = [
        "input_id_subs",
        "input_pos",
        "context",
        "segment_ids",
        "segment_rel_offset",
        "segment_rel",
        "sample_ids",
        "num_segments",
        "predict_segments",
        "answer_placeholders",
        "ext_table",
        "token_id_table",
    ]
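
The vocabulary-loading step of the constructor can be illustrated in isolation. The following is a minimal sketch, not the library's API: `load_vocab` and the five-entry `toy_vocab` are hypothetical names introduced only for this example. It assumes, as in the listing above, that ids follow line order, that empty lines are skipped, and that the file stores space and newline as `</_>` and `</n>`.

```python
import io
from typing import Dict, TextIO


def load_vocab(reader: TextIO) -> Dict[str, int]:
    """Assign ids by line order, skip empty lines, then remap the placeholders."""
    encoder: Dict[str, int] = {}
    for line in reader:
        token = line.rstrip("\n")
        if not token:
            continue
        encoder[token] = len(encoder)
    # The file stores space and newline as "</_>" and "</n>"; swap in the real characters.
    encoder[" "] = encoder.pop("</_>")
    encoder["\n"] = encoder.pop("</n>")
    return encoder


# Hypothetical five-entry vocabulary, used only for illustration.
toy_vocab = io.StringIO("<pad>\n<unk>\n</_>\n</n>\nhello\n")
encoder = load_vocab(toy_vocab)
decoder = {idx: tok for tok, idx in encoder.items()}
```

Because the placeholders are replaced in place, the space and newline characters keep the ids that `</_>` and `</n>` had in the file.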

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.__len__()` ¶

Size of the full vocabulary with the added tokens.

Source code in mindnlp\transformers\models\cpmbee\tokenization_cpmbee.py

def __len__(self):
    """
    Size of the full vocabulary with the added tokens.
    """
    return self.vocab_size + len(self.added_tokens_encoder)

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.check(token)` ¶

Checks if a token is present in the encoder.

PARAMETER	DESCRIPTION
`self`	An instance of the CpmBeeTokenizer class. TYPE: `CpmBeeTokenizer`
`token`	The token to be checked in the encoder. TYPE: `Any`

RETURNS	DESCRIPTION
`bool`	True if the token is present in the encoder, False otherwise.

Source code in mindnlp\transformers\models\cpmbee\tokenization_cpmbee.py

def check(self, token):
    """
    Checks if a token is present in the encoder.

    Args:
        self (CpmBeeTokenizer): An instance of the CpmBeeTokenizer class.
        token (Any): The token to be checked in the encoder.

    Returns:
        bool: True if the token is present in the encoder, False otherwise.

    Raises:
        None.
    """
    return token in self.encoder

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.convert_data_to_id(data, prev_ext_states=None, shuffle_answer=True, max_depth=8)` ¶

Parse a dict to data ids. Exclusive for CPMBee. It will:

1. Parse the dict into segments and compute segment_rel, which is used to calculate position_bias.
2. Tokenize every segment.

Source code in mindnlp\transformers\models\cpmbee\tokenization_cpmbee.py

def convert_data_to_id(
    self,
    data: Any,
    prev_ext_states: Optional[_PrevExtTableStates] = None,
    shuffle_answer: bool = True,
    max_depth: int = 8,
):
    """
    Parse a dict to data ids. Exclusive for CPMBee. It will

    1. parse the dict into segments and compute segment_rel, which is used to calculate position_bias.
    2. tokenize every segment.
    """
    root: _DictTree = {
        "value": "<root>",
        "children": [],
        "depth": 0,
        "segment_id": 0,
        "need_predict": False,
    }

    segments = [root]

    def _build_dict_tree(data: CPMBeeInputType, depth: int, need_predict: bool) -> List[_DictTree]:
        if isinstance(data, dict):
            ret_list: List[_DictTree] = []
            curr_items = list(data.items())
            if need_predict and shuffle_answer:
                access_idx = np.arange(len(curr_items))
                np.random.shuffle(access_idx)
                curr_items = [curr_items[idx] for idx in access_idx]
            for k, v in curr_items:
                child_info: _DictTree = {
                    "value": k,
                    "children": [],
                    "depth": depth,
                    "segment_id": len(segments),
                    "need_predict": False,  # only leaves are contexts
                }
                segments.append(child_info)
                child_info["children"] = _build_dict_tree(
                    v, depth + 1, need_predict or (depth == 1 and k == "<ans>")
                )  # elements in <root>.<ans>

                ret_list.append(child_info)
            return ret_list
        else:
            assert isinstance(data, str), "Invalid data {}".format(data)
            ret: _DictTree = {
                "value": data,
                "children": [],
                "depth": depth,
                "segment_id": len(segments),
                "need_predict": need_predict,
            }
            segments.append(ret)
            return [ret]

    root["children"] = _build_dict_tree(data, 1, False)

    num_segments = len(segments)
    segment_rel = np.zeros((num_segments * num_segments,), dtype=np.int32)

    def _build_segment_rel(node: _DictTree) -> List[Tuple[int, int]]:
        ret: List[Tuple[int, int]] = [(node["segment_id"], node["depth"])]
        for child in node["children"]:
            sub = _build_segment_rel(child)
            for seg_id_1, depth_1 in sub:
                for seg_id_2, depth_2 in ret:
                    n_up = min(depth_1 - node["depth"], max_depth - 1)
                    n_down = min(depth_2 - node["depth"], max_depth - 1)
                    segment_rel[seg_id_1 * num_segments + seg_id_2] = rel_to_bucket(
                        n_up, n_down, max_depth=max_depth
                    )
                    segment_rel[seg_id_2 * num_segments + seg_id_1] = rel_to_bucket(
                        n_down, n_up, max_depth=max_depth
                    )
            ret.extend(sub)
        return ret

    _build_segment_rel(root)

    input_ids: List[int] = []
    input_id_subs: List[int] = []
    segment_bound: List[Tuple[int, int]] = []

    if prev_ext_states is not None:
        self.ext_table = prev_ext_states["ext_table"]
        self.token_id_table = prev_ext_states["token_id_table"]

    for seg in segments:
        # tokenize
        tokens = self.convert_tokens_to_ids(self.tokenize(seg["value"], for_cpmbee=True))

        token_id_subs = []
        reid_token_ids = []
        for idx in tokens:
            if idx in self.ext_table:
                # unk or special token
                token = self.ext_table[idx]
                if token.startswith("<") and token.endswith(">"):
                    # special token
                    if "_" in token:
                        token_name = token[1:-1].split("_", maxsplit=1)[0]
                    else:
                        token_name = token[1:-1]
                    token_name = "<{}>".format(token_name)
                else:
                    token_name = "<unk>"

                if token_name not in self.token_id_table:
                    self.token_id_table[token_name] = {}
                if idx not in self.token_id_table[token_name]:
                    self.token_id_table[token_name][idx] = len(self.token_id_table[token_name])
                if token_name not in self.encoder:
                    raise ValueError("Invalid token {}".format(token))
                reid_token_ids.append(self.encoder[token_name])
                token_id_subs.append(self.token_id_table[token_name][idx])
            else:
                reid_token_ids.append(idx)
                token_id_subs.append(0)
        tokens = [self.bos_token_id] + reid_token_ids
        token_id_subs = [0] + token_id_subs
        # a trailing EOS token marks a segment that does not need prediction
        if not seg["need_predict"]:  # eos
            tokens = tokens + [self.eos_token_id]
            token_id_subs = token_id_subs + [0]
        else:
            # no eos
            pass
        begin = len(input_ids)
        input_ids.extend(tokens)
        input_id_subs.extend(token_id_subs)
        end = len(input_ids)
        segment_bound.append((begin, end))

    ids = np.array(input_ids, dtype=np.int32)
    id_subs = np.array(input_id_subs, dtype=np.int32)
    segs = np.zeros((ids.shape[0],), dtype=np.int32)  # segment index of each position, filled from segment_bound
    context = np.zeros((ids.shape[0],), dtype=np.int8)
    for i, (begin, end) in enumerate(segment_bound):
        if not segments[i]["need_predict"]:
            context[begin:end] = 1
        segs[begin:end] = i

    curr_ext_table_states: _PrevExtTableStates = {
        "ext_table": self.ext_table,
        "token_id_table": self.token_id_table,
    }
    return ids, id_subs, context, segs, segment_rel, num_segments, curr_ext_table_states
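
The first step, turning a nested dict into a flat list of segments, can be sketched on its own. This is a simplified illustration rather than the library's implementation: `flatten_segments` is a hypothetical helper, it omits the answer shuffling, segment ids, and `segment_rel` computation, and it raises `TypeError` where the listing above uses an assertion.

```python
from typing import Any, Dict, List


def flatten_segments(data: Any) -> List[Dict[str, Any]]:
    """Flatten a nested dict of strings into segments, depth-first."""
    segments: List[Dict[str, Any]] = [
        {"value": "<root>", "depth": 0, "need_predict": False}
    ]

    def visit(node: Any, depth: int, need_predict: bool) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                # Keys are always context; only leaf strings can be predicted.
                segments.append({"value": key, "depth": depth, "need_predict": False})
                # Everything below the top-level "<ans>" key is marked for prediction.
                visit(value, depth + 1, need_predict or (depth == 1 and key == "<ans>"))
        elif isinstance(node, str):
            segments.append({"value": node, "depth": depth, "need_predict": need_predict})
        else:
            raise TypeError(f"Invalid data {node!r}")

    visit(data, 1, False)
    return segments


# Hypothetical input: one context field and one answer field to fill in.
example = {"input": "hello", "<ans>": {"reply": ""}}
segments = flatten_segments(example)
values = [seg["value"] for seg in segments]
```

For this input the segments are, in order, `<root>`, `input`, `hello`, `<ans>`, `reply`, and the empty answer string; only the last one has `need_predict` set.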

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.convert_tokens_to_string(tokens)` ¶

Converts a list of tokens into a single string.

PARAMETER	DESCRIPTION
`self`	An instance of the CpmBeeTokenizer class. TYPE: `CpmBeeTokenizer`
`tokens`	A list of tokens to be converted into a string. TYPE: `List[str]`

RETURNS	DESCRIPTION
`str`	A string representation of the tokens. TYPE: `str`

The tokens are concatenated with ''.join(tokens), so no separator is inserted between them. The original list of tokens is not modified, and the method raises no exceptions of its own.

Source code in mindnlp\transformers\models\cpmbee\tokenization_cpmbee.py

def convert_tokens_to_string(self, tokens: List[str]) -> str:
    """
    Converts a list of tokens into a single string.

    Args:
        self (CpmBeeTokenizer): An instance of the CpmBeeTokenizer class.
        tokens (List[str]): A list of tokens to be converted into a string.

    Returns:
        str: A string representation of the tokens.

    Raises:
        None.

    This method takes in two parameters, self and tokens. The self parameter is an instance of the CpmBeeTokenizer
    class and is used to access the class's attributes and methods. The tokens parameter is a
    list of strings representing individual tokens.

    The function returns a string that is obtained by concatenating all the tokens together using the ''.join() method.
    This method does not modify the original list of tokens.

    No exceptions are raised by this method.
    """
    return "".join(tokens)

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.get_piece(text)` ¶

Match with maximum length.

Source code in mindnlp\transformers\models\cpmbee\tokenization_cpmbee.py

def get_piece(self, text: str) -> str:
    """
    Match with maximum length.
    """
    len_text = len(text)
    for i in range(len(text)):
        sub = text[: len_text - i]
        if (sub in self.encoder) or (sub in self.added_tokens_encoder):
            return sub
    return text[0]
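
The greedy longest-prefix match can be tried on a small vocabulary. This is a minimal sketch under stated assumptions: `longest_prefix` and `toy_vocab` are hypothetical names, and a plain set stands in for the tokenizer's `encoder` and `added_tokens_encoder`.

```python
from typing import Set


def longest_prefix(text: str, vocab: Set[str]) -> str:
    """Return the longest prefix of `text` found in `vocab`, else its first character."""
    for end in range(len(text), 0, -1):
        if text[:end] in vocab:
            return text[:end]
    # No prefix is known: fall back to a single character.
    return text[0]


# Hypothetical vocabulary, used only for illustration.
toy_vocab = {"un", "unbe", "believ", "able"}
piece = longest_prefix("unbelievable", toy_vocab)  # "unbe" wins over the shorter "un"
fallback = longest_prefix("xyz", toy_vocab)        # nothing matches, so "x"
```

As in the listing above, an empty `text` is not handled and would raise `IndexError` at the fallback.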

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.get_vocab()` ¶

Get the vocabulary of the CpmBeeTokenizer instance.

PARAMETER	DESCRIPTION
`self`	The instance of the CpmBeeTokenizer class. This parameter represents the current instance of the tokenizer. TYPE: `CpmBeeTokenizer`

RETURNS	DESCRIPTION
`dict`	A dictionary containing the combined encoder and added tokens encoder. The keys represent tokens, and the values represent their corresponding IDs.

Source code in mindnlp\transformers\models\cpmbee\tokenization_cpmbee.py

def get_vocab(self):
    """
    Get the vocabulary of the CpmBeeTokenizer instance.

    Args:
        self (CpmBeeTokenizer): The instance of the CpmBeeTokenizer class.
            This parameter represents the current instance of the tokenizer.

    Returns:
        dict: A dictionary containing the combined encoder and added tokens encoder.
            The keys represent tokens, and the values represent their corresponding IDs.

    Raises:
        None.
    """
    return dict(self.encoder, **self.added_tokens_encoder)

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.prepare_for_finetune(data_list, max_length=2048)` ¶

Prepares the input data for fine-tuning.

PARAMETER	DESCRIPTION
`self`	The instance of the CpmBeeTokenizer class. TYPE: `CpmBeeTokenizer`
`data_list`	A list of dictionaries containing the input data. TYPE: `List[Dict]`
`max_length`	The maximum length of the input data. Defaults to 2048. TYPE: `int` DEFAULT: `2048`

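RETURNS	DESCRIPTION
`BatchEncoding`	The padded batch as MindSpore tensors, with the keys input_ids, input_id_sub, length, context, sample_ids, num_segments, segment, segment_rel_offset, segment_rel, span, labels, ext_table_ids and ext_table_sub.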

Source code in mindnlp\transformers\models\cpmbee\tokenization_cpmbee.py

def prepare_for_finetune(
    self,
    data_list: List[Dict],
    max_length: int = 2048
):
    """
    Prepares the input data for fine-tuning.

    Args:
        self (CpmBeeTokenizer): The instance of the CpmBeeTokenizer class.
        data_list (List[Dict]): A list of dictionaries containing the input data.
        max_length (int, optional): The maximum length of the input data. Defaults to 2048.

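    Returns:
        BatchEncoding: The padded batch as MindSpore tensors, including `input_ids`, `labels`,
            `ext_table_ids` and `ext_table_sub`.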

    Raises:
        None.
    """
    _inputs: List[NDArray[np.int32]] = []
    _inputs_sub: List[NDArray[np.int32]] = []
    _context: List[NDArray[np.int8]] = []
    _sample_ids: List[NDArray[np.int32]] = []
    _segments: List[NDArray[np.int32]] = []
    _num_segments: List[NDArray[np.int32]] = []
    _segment_rel_offset: List[NDArray[np.int32]] = []
    _segment_rel: List[NDArray[np.int32]] = []
    _spans: List[List[int]] = []
    _raw_data: List[List[Any]] = []

    raw_data = {}
    for data in data_list:
        (
            input_ids,
            input_id_subs,
            context,
            segment_ids,
            segment_rel,
            n_segments,
            _
        ) = self.convert_data_to_id(data)

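        input_ids = input_ids[: max_length]
        # truncate the sub-ids as well so they stay aligned with input_ids
        input_id_subs = input_id_subs[: max_length]
        context = context[: max_length]
        segment_ids = segment_ids[: max_length]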
        raw_data["input"] = data
        raw_data["samples"] = []

        sample_ids = np.zeros(input_ids.shape, dtype=np.int32)
        segment_rel_offset = np.zeros(input_ids.shape, dtype=np.int32)
        num_segments = np.full(input_ids.shape, n_segments, dtype=np.int32)

        _inputs.append(input_ids)
        _inputs_sub.append(input_id_subs)
        _context.append(context)
        _sample_ids.append(sample_ids)
        _segments.append(segment_ids)
        _num_segments.append(num_segments)
        _segment_rel_offset.append(segment_rel_offset)
        _segment_rel.append(segment_rel)
        _spans.append([input_ids.shape[0]])
        _raw_data.append([raw_data])

    batch_size = len(_inputs)
    inputs = np.zeros((batch_size, max_length), dtype=np.int32)
    inputs_sub = np.zeros((batch_size, max_length), dtype=np.int32)
    context = np.zeros((batch_size, max_length), dtype=np.int8)
    sample_ids = np.zeros((batch_size, max_length), dtype=np.int32)
    segments = np.zeros((batch_size, max_length), dtype=np.int32)
    num_segments = np.zeros((batch_size, max_length), dtype=np.int32)
    segment_rel_offset = np.zeros((batch_size, max_length), dtype=np.int32)
    tgt = np.full((batch_size, max_length), -100, dtype=np.int32)

    max_rel = 0
    for i in range(batch_size):
        max_rel = max(max_rel, _segment_rel[i].shape[0])
    segment_rel = np.zeros((batch_size, max_rel), dtype=np.int32)
    spans = np.zeros((batch_size, max_length), dtype=np.int32)
    length = np.zeros((batch_size,), dtype=np.int32)

    batch_ext_table_map: Dict[Tuple[int, int], int] = {}
    batch_ext_table_ids: List[int] = []
    batch_ext_table_sub: List[int] = []
    raw_data_list: List[Any] = []

    for i in range(batch_size):
        instance_length = _inputs[i].shape[0]
        rel_size = _segment_rel[i].shape[0]
        inputs[i, :instance_length] = _inputs[i]
        inputs_sub[i, :instance_length] = _inputs_sub[i]
        context[i, :instance_length] = _context[i]
        sample_ids[i, :instance_length] = _sample_ids[i]
        segments[i, :instance_length] = _segments[i]
        num_segments[i, :instance_length] = _num_segments[i]
        segment_rel_offset[i, :instance_length] = _segment_rel_offset[i]
        segment_rel[i, :rel_size] = _segment_rel[i]

        span_begin = 0
        for span_id, span_end in enumerate(_spans[i]):
            spans[i, span_begin:span_end] = span_id
            span_begin = span_end
        length[i] = instance_length
        raw_data_list.extend(_raw_data[i])

        for j in range(instance_length):
            idx, idx_sub = _inputs[i][j], _inputs_sub[i][j]
            tgt_idx = idx
            if idx_sub > 0:
                # need to be in ext table
                if (idx, idx_sub) not in batch_ext_table_map:
                    batch_ext_table_map[(idx, idx_sub)] = len(batch_ext_table_map)
                    batch_ext_table_ids.append(idx)
                    batch_ext_table_sub.append(idx_sub)
                tgt_idx = batch_ext_table_map[(idx, idx_sub)] + self.vocab_size
            if j > 1 and context[i, j - 1] == 0:
                if idx != self.bos_token_id:
                    tgt[i, j - 1] = tgt_idx
                else:
                    tgt[i, j - 1] = self.eos_token_id
        if context[i, instance_length - 1] == 0:
            tgt[i, instance_length - 1] = self.eos_token_id

    if len(batch_ext_table_map) == 0:
        # placeholder
        batch_ext_table_ids.append(0)
        batch_ext_table_sub.append(1)

    return BatchEncoding({
        "input_ids": inputs,
        "input_id_sub": inputs_sub,
        "length": length,
        "context": context > 0,
        "sample_ids": sample_ids,
        "num_segments": num_segments,
        "segment": segments,
        "segment_rel_offset": segment_rel_offset,
        "segment_rel": segment_rel,
        "span": spans,
        "labels": tgt,
        "ext_table_ids": np.array(batch_ext_table_ids, dtype=np.int32),
        "ext_table_sub": np.array(batch_ext_table_sub, dtype=np.int32)
    }, tensor_type="ms")
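
The way extended tokens receive target ids beyond the base vocabulary can be shown separately. The sketch below is an illustration only: `remap_targets` is a hypothetical helper, and it leaves out the one-position shift, the context masking, and the `-100` padding that the listing above applies to `labels`.

```python
from typing import Dict, List, Tuple


def remap_targets(
    ids: List[int], subs: List[int], vocab_size: int
) -> Tuple[List[int], Dict[Tuple[int, int], int]]:
    """Map each (id, sub-id) pair with sub-id > 0 to a slot after the base vocabulary."""
    ext_map: Dict[Tuple[int, int], int] = {}
    targets: List[int] = []
    for idx, sub in zip(ids, subs):
        if sub > 0:
            key = (idx, sub)
            if key not in ext_map:
                # First occurrence: allocate the next free slot in the batch table.
                ext_map[key] = len(ext_map)
            targets.append(vocab_size + ext_map[key])
        else:
            targets.append(idx)
    return targets, ext_map


# Hypothetical ids: token 1 appears with sub-ids 1, 2 and 1 again.
targets, ext_map = remap_targets([5, 1, 1, 1], [0, 1, 2, 1], vocab_size=10)
```

Repeated pairs reuse their slot, so the batch-level table grows only with the number of distinct `(id, sub-id)` pairs.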

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.prepare_for_model(ids, pair_ids=None, add_special_tokens=True, padding=False, truncation=None, max_length=None, stride=0, pad_to_multiple_of=None, return_tensors=None, return_token_type_ids=None, return_attention_mask=None, return_overflowing_tokens=False, return_special_tokens_mask=False, return_length=False, verbose=True, prepend_batch_axis=False, **kwargs)` ¶

Prepares a sequence of input ids, or a pair of sequences of input ids, so that it can be used by the model. It adds special tokens, truncates sequences that overflow (taking the special tokens into account), and manages a moving window (with a user-defined stride) for overflowing tokens. Note that when pair_ids is not None and truncation_strategy is longest_first or True, overflowing tokens cannot be returned; requesting them with that combination of arguments raises a ValueError.

PARAMETER	DESCRIPTION
`ids`	Tokenized input ids of the first sequence. Can be obtained from a string by chaining the `tokenize` and `convert_tokens_to_ids` methods. TYPE: `List[int]`
`pair_ids`	Tokenized input ids of the second sequence. Can be obtained from a string by chaining the `tokenize` and `convert_tokens_to_ids` methods. TYPE: `List[int]`, optional DEFAULT: `None`

Source code in mindnlp\transformers\models\cpmbee\tokenization_cpmbee.py

def prepare_for_model(
    self,
    ids: List[int],
    pair_ids: Optional[List[int]] = None,
    add_special_tokens: bool = True,
    padding: Union[bool, str, PaddingStrategy] = False,
    truncation: Union[bool, str, TruncationStrategy] = None,
    max_length: Optional[int] = None,
    stride: int = 0,
    pad_to_multiple_of: Optional[int] = None,
    return_tensors: Optional[Union[str, TensorType]] = None,
    return_token_type_ids: Optional[bool] = None,
    return_attention_mask: Optional[bool] = None,
    return_overflowing_tokens: bool = False,
    return_special_tokens_mask: bool = False,
    return_length: bool = False,
    verbose: bool = True,
    prepend_batch_axis: bool = False,
    **kwargs,
) -> BatchEncoding:
    """
    Prepares a sequence of input id, or a pair of sequences of inputs ids so that it can be used by the model. It
    adds special tokens, truncates sequences if overflowing while taking into account the special tokens and
    manages a moving window (with user defined stride) for overflowing tokens. Please Note, for *pair_ids*
    different than `None` and *truncation_strategy = longest_first* or `True`, it is not possible to return
    overflowing tokens. Such a combination of arguments will raise an error.

    Args:
        ids (`List[int]`):
            Tokenized input ids of the first sequence. Can be obtained from a string by chaining the `tokenize` and
            `convert_tokens_to_ids` methods.
        pair_ids (`List[int]`, *optional*):
            Tokenized input ids of the second sequence. Can be obtained from a string by chaining the `tokenize`
            and `convert_tokens_to_ids` methods.
    """
    # Backward compatibility for 'truncation_strategy', 'pad_to_max_length'
    padding_strategy, truncation_strategy, max_length, kwargs = self._get_padding_truncation_strategies(
        padding=padding,
        truncation=truncation,
        max_length=max_length,
        pad_to_multiple_of=pad_to_multiple_of,
        verbose=verbose,
        **kwargs,
    )

    pair = bool(pair_ids is not None)
    len_ids = len(ids)
    len_pair_ids = len(pair_ids) if pair else 0

    if return_token_type_ids and not add_special_tokens:
        raise ValueError(
            "Asking to return token_type_ids while setting add_special_tokens to False "
            "results in an undefined behavior. Please set add_special_tokens to True or "
            "set return_token_type_ids to None."
        )

    if (
        return_overflowing_tokens
        and truncation_strategy == TruncationStrategy.LONGEST_FIRST
        and pair_ids is not None
    ):
        raise ValueError(
            "Not possible to return overflowing tokens for pair of sequences with the "
            "`longest_first`. Please select another truncation strategy than `longest_first`, "
            "for instance `only_second` or `only_first`."
        )

    # Load from model defaults
    if return_token_type_ids is None:
        return_token_type_ids = "token_type_ids" in self.model_input_names
    if return_attention_mask is None:
        return_attention_mask = "attention_mask" in self.model_input_names

    encoded_inputs = {}

    # Compute the total size of the returned encodings
    total_len = len_ids + len_pair_ids + (self.num_special_tokens_to_add(pair=pair) if add_special_tokens else 0)

    # Truncation: Handle max sequence length
    overflowing_tokens = []
    if truncation_strategy != TruncationStrategy.DO_NOT_TRUNCATE and max_length and total_len > max_length:
        ids, pair_ids, overflowing_tokens = self.truncate_sequences(
            ids,
            pair_ids=pair_ids,
            num_tokens_to_remove=total_len - max_length,
            truncation_strategy=truncation_strategy,
            stride=stride,
        )

    if return_overflowing_tokens:
        encoded_inputs["overflowing_tokens"] = overflowing_tokens
        encoded_inputs["num_truncated_tokens"] = total_len - max_length

    # Add special tokens
    if add_special_tokens:
        sequence = self.build_inputs_with_special_tokens(ids, pair_ids)
        token_type_ids = self.create_token_type_ids_from_sequences(ids, pair_ids)
    else:
        sequence = ids + pair_ids if pair else ids
        token_type_ids = [0] * len(ids) + ([0] * len(pair_ids) if pair else [])

    # Build output dictionary
    encoded_inputs["input_ids"] = sequence
    if return_token_type_ids:
        encoded_inputs["token_type_ids"] = token_type_ids
    if return_special_tokens_mask:
        if add_special_tokens:
            encoded_inputs["special_tokens_mask"] = self.get_special_tokens_mask(ids, pair_ids)
        else:
            encoded_inputs["special_tokens_mask"] = [0] * len(sequence)

    # Check lengths
    self._eventual_warn_about_too_long_sequence(encoded_inputs["input_ids"], max_length, verbose)

    # Padding
    if padding_strategy != PaddingStrategy.DO_NOT_PAD or return_attention_mask:
        encoded_inputs = self.pad(
            encoded_inputs,
            max_length=max_length,
            padding=padding_strategy.value,
            pad_to_multiple_of=pad_to_multiple_of,
            return_attention_mask=return_attention_mask,
        )

    if return_length:
        encoded_inputs["length"] = len(encoded_inputs["input_ids"])

    # for CPMBee, encode all the model arguments
    for arg in self.ext_args_for_model:
        v = kwargs.get(arg, None)
        if v is not None:
            encoded_inputs[arg] = v

    batch_outputs = BatchEncoding(
        encoded_inputs, tensor_type=return_tensors, prepend_batch_axis=prepend_batch_axis
    )

    return batch_outputs

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.save_vocabulary(save_directory, filename_prefix=None)` ¶

Save the vocabulary to a file.

PARAMETER	DESCRIPTION
`self`	The instance of the CpmBeeTokenizer class. TYPE: `CpmBeeTokenizer`
`save_directory`	The directory where the vocabulary file will be saved. If the path is not an existing directory, it is used as the vocabulary file path itself. TYPE: `str`
`filename_prefix`	An optional prefix to prepend to the filename. Default is None. TYPE: `Optional[str]` DEFAULT: `None`

RETURNS	DESCRIPTION
`Tuple[str]`	A one-element tuple containing the path to the saved vocabulary file.

RAISES	DESCRIPTION
`IOError`	If there is an issue with reading or writing the vocabulary file.
`ValueError`	If the provided save_directory is not a valid directory.
`KeyError`	If the encoder lacks the newline or space entries that are mapped back to `</n>` and `</_>` before writing.

Source code in mindnlp\transformers\models\cpmbee\tokenization_cpmbee.py

def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
    """
    Save the vocabulary to a file.

    Args:
        self (CpmBeeTokenizer): The instance of the CpmBeeTokenizer class.
        save_directory (str): The directory where the vocabulary file will be saved.
        filename_prefix (Optional[str]): An optional prefix to prepend to the filename. Default is None.

    Returns:
        Tuple[str]: A tuple containing the path to the saved vocabulary file.

    Raises:
        IOError: If there is an issue with reading or writing the vocabulary file.
        ValueError: If the provided save_directory is not a valid directory.
        KeyError: If any of the keys used for encoding tokens are not found in the encoder dictionary.
    """
    if os.path.isdir(save_directory):
        vocab_file = os.path.join(
            save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"]
        )
    else:
        vocab_file = (filename_prefix + "-" if filename_prefix else "") + save_directory
    index = 0
    self.encoder["</n>"] = self.encoder["\n"]
    del self.encoder["\n"]
    self.encoder["</_>"] = self.encoder[" "]
    del self.encoder[" "]
    with open(vocab_file, "w", encoding="utf-8") as writer:
        for token, token_index in sorted(self.encoder.items(), key=lambda x: x[1]):
            if index != token_index:
                logger.warning(
                    f"Saving vocabulary to {vocab_file}: vocabulary indices are not consecutive."
                    " Please check that the vocabulary is not corrupted!"
                )
                index = token_index
            writer.write(token + "\n")
            index += 1
    return (vocab_file,)

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.tokenize(text, **kwargs)` ¶

Override the tokenize to meet the needs of CPMBee:

1. Mark special tokens with < and >. An empty <> is ignored.
2. Split the sentence at the marked special tokens.
3. Record the marked special tokens in ext_table and ext_table_rev.
4. Tokenize the remaining text, which contains no special tokens.

Source code in mindnlp\transformers\models\cpmbee\tokenization_cpmbee.py

def tokenize(self, text: TextInput, **kwargs) -> List[str]:
    r"""
    Override the `tokenize` to meet the needs of CPMBee:

    1. Mark the special token with `<` and `>`. The `<>` will be ignored.
    2. Split sentences by the marked special tokens.
    3. Record the marked special token by `ext_table` and `ext_table_rev`.
    4. Tokenize the sentence without special tokens.
    """
    for_cpmbee = kwargs.get("for_cpmbee", False)
    all_special_tokens_extended = {
        str(t): t for t in self.all_special_tokens_extended if isinstance(t, AddedToken)
    }

    sentence_split = [""]
    is_special_token = False
    for i, c in enumerate(text):
        if is_special_token:
            if c == "<":
                tail = sentence_split.pop(-1)
                sentence_split[-1] += tail
                sentence_split.append(c)
                is_special_token = False
            elif c == ">":
                # end of special token
                sentence_split[-1] += c
                if sentence_split[-1] == "<>":
                    continue
                is_special_token = False
                sentence_split.append("")
            else:
                sentence_split[-1] += c
        else:
            if c == "<":
                is_special_token = True
                sentence_split.append(c)
            else:
                sentence_split[-1] += c
    if is_special_token:
        tail = sentence_split.pop(-1)
        sentence_split[-1] += tail

    output_tokens = []
    for i, part in enumerate(sentence_split):
        if (i & 1) == 1:
            # special token
            output_tokens.append(part)
            if for_cpmbee and (part not in self.encoder) and (part not in self.ext_table_rev):
                self.ext_table_rev[part] = len(self.ext_table_rev) + self.vocab_size
                self.ext_table[self.ext_table_rev[part]] = part
        else:
            output_tokens.extend(self._tokenize(part, for_cpmbee=for_cpmbee))

    # drop spaces
    for i, token in enumerate(output_tokens):
        if token in self.added_tokens_encoder:
            token = all_special_tokens_extended.get(token, None)
            left = output_tokens[i - 1] if i > 0 else None
            right = output_tokens[i + 1] if i < len(output_tokens) - 1 else None
            if isinstance(token, AddedToken):
                if token.rstrip and right:
                    # A bit counter-intuitive but we strip the left of the string
                    # since tok_extended.rstrip means the special token is eating all white spaces on its right
                    output_tokens[i + 1] = right.lstrip()
                # Strip white spaces on the left
                if token.lstrip and left:
                    output_tokens[i - 1] = left.rstrip()  # Opposite here
            else:
                if right:
                    output_tokens[i + 1] = right.lstrip()
                if left:
                    output_tokens[i - 1] = left.rstrip()

    skipped_tokens = []
    for token in output_tokens:
        if not token:
            continue
        skipped_tokens.append(token)

    return skipped_tokens
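
The splitting step (steps 1 and 2) can be run in isolation to see how markers and plain text alternate. The sketch below mirrors the character loop from the listing above as a standalone function; `split_on_markers` is a hypothetical name, and the later steps (recording in `ext_table`, sub-word tokenization, whitespace stripping) are left out.

```python
from typing import List


def split_on_markers(text: str) -> List[str]:
    """Split text so that odd indices hold `<...>` markers and even indices hold plain text."""
    parts = [""]
    in_marker = False
    for ch in text:
        if in_marker:
            if ch == "<":
                # A second "<" before any ">": merge the pending piece back into the
                # preceding text and start a new piece at this "<".
                tail = parts.pop()
                parts[-1] += tail
                parts.append(ch)
                in_marker = False
            elif ch == ">":
                parts[-1] += ch
                if parts[-1] == "<>":
                    continue  # an empty marker is not treated as a special token
                in_marker = False
                parts.append("")
            else:
                parts[-1] += ch
        elif ch == "<":
            in_marker = True
            parts.append(ch)
        else:
            parts[-1] += ch
    if in_marker:
        # An unterminated "<..." is ordinary text.
        tail = parts.pop()
        parts[-1] += tail
    return parts


parts = split_on_markers("Hi <mask_1> there")
unterminated = split_on_markers("a<b")
```

In the first call the marker `<mask_1>` lands at index 1, which is why the listing treats odd indices as special tokens; in the second call the unterminated `<b` is folded back into plain text.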

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.rel_to_bucket(n_up, n_down, max_depth=8)` ¶

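Maps the relative position of two segments in the dict tree to a segment-relation bucket id, based on the number of levels up to and down from their common ancestor.

PARAMETER	DESCRIPTION
`n_up`	The number of levels from the first segment up to the common ancestor. TYPE: `int`
`n_down`	The number of levels from the common ancestor down to the second segment. TYPE: `int`
`max_depth`	The maximum depth, used as the base when combining n_up and n_down. Defaults to 8. TYPE: `int` DEFAULT: `8`

RETURNS	DESCRIPTION
`int`	The bucket id: 0 when n_up and n_down are both 0, otherwise n_up * max_depth + n_down + 1 (bucket 1 is reserved for in-context samples).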

Source code in mindnlp\transformers\models\cpmbee\tokenization_cpmbee.py

def rel_to_bucket(n_up: int, n_down: int, max_depth: int = 8):
    """
    Calculates the relative position of an item in a bucket based on the number of items above and below it.

    Args:
        n_up (int): The number of items above the item.
        n_down (int): The number of items below the item.
        max_depth (int, optional): The maximum depth of the bucket. Defaults to 8.

    Returns:
        int: The bucket index: 0 when n_up * max_depth + n_down is 0, otherwise n_up * max_depth + n_down + 1.

    Raises:
        None.

    """
    ret = n_up * max_depth + n_down
    if ret == 0:
        return ret
    else:
        # bucket 1 is reserved for in-context samples
        return ret + 1

`mindnlp.transformers.models.cpmbee.modeling_cpmbee` ¶

MindSpore CpmBee model.

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeAttention` ¶

Bases: Module

This class represents the attention mechanism used in the CpmBee model. It inherits from the nn.Module class.

ATTRIBUTE	DESCRIPTION
`dim_model`	The hidden size of the model. TYPE: `int`
`num_heads`	The number of attention heads. TYPE: `int`
`dim_head`	The dimension of each attention head. TYPE: `int`
`project_q`	Linear layer for projecting the query. TYPE: `CpmBeeLinear`
`project_k`	Linear layer for projecting the key. TYPE: `CpmBeeLinear`
`project_v`	Linear layer for projecting the value. TYPE: `CpmBeeLinear`
`attention_out`	Linear layer for the output of the attention mechanism. TYPE: `CpmBeeLinear`
`softmax`	Softmax function for computing attention weights. TYPE: `Softmax`
`dropout`	Dropout layer for regularization (optional). TYPE: `Dropout or None`

METHOD	DESCRIPTION
`__init__`	Initializes the CpmBeeAttention class.
`forward`	Constructs the attention mechanism.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

class CpmBeeAttention(nn.Module):

    """
    This class represents the attention mechanism used in the CpmBee model. It inherits from the nn.Module class.

    Attributes:
        dim_model (int): The hidden size of the model.
        num_heads (int): The number of attention heads.
        dim_head (int): The dimension of each attention head.
        project_q (CpmBeeLinear): Linear layer for projecting the query.
        project_k (CpmBeeLinear): Linear layer for projecting the key.
        project_v (CpmBeeLinear): Linear layer for projecting the value.
        attention_out (CpmBeeLinear): Linear layer for the output of the attention mechanism.
        softmax (nn.Softmax): Softmax function for computing attention weights.
        dropout (nn.Dropout or None): Dropout layer for regularization (optional).

    Methods:
        __init__:
            Initializes the CpmBeeAttention class.

        forward:
            Constructs the attention mechanism.
    """
    def __init__(self, config: CpmBeeConfig):
        """
        Initializes an instance of the CpmBeeAttention class.

        Args:
            self: The instance of the class.
            config (CpmBeeConfig):
                The configuration object containing the following attributes:

                - hidden_size (int): The dimension of the model.
                - num_attention_heads (int): The number of attention heads.
                - dim_head (int): The dimension of each attention head.
                - ms_dtype: The data type used for the linear layers.
                - dropout_p (float, optional): The probability of an element to be zeroed during dropout.
                If not provided, no dropout is applied.

        Returns:
            None

        Raises:
            None
        """
        super().__init__()
        self.dim_model = config.hidden_size
        self.num_heads = config.num_attention_heads
        self.dim_head = config.dim_head

        self.project_q = CpmBeeLinear(self.dim_model, self.num_heads * self.dim_head, dtype=config.ms_dtype)
        self.project_k = CpmBeeLinear(self.dim_model, self.num_heads * self.dim_head, dtype=config.ms_dtype)
        self.project_v = CpmBeeLinear(self.dim_model, self.num_heads * self.dim_head, dtype=config.ms_dtype)

        self.attention_out = CpmBeeLinear(self.num_heads * self.dim_head, self.dim_model, dtype=config.ms_dtype)

        self.softmax = nn.Softmax(dim=-1)

        if config.dropout_p is not None:
            self.dropout = nn.Dropout(p=config.dropout_p)
        else:
            self.dropout = None

    def forward(
        self,
        hidden_q: mindspore.Tensor,
        hidden_kv: mindspore.Tensor,
        attention_mask: mindspore.Tensor,
        position_bias: mindspore.Tensor,
        output_attentions: Optional[bool] = False,
        past_key_values: Optional[Tuple[mindspore.Tensor, mindspore.Tensor]] = None,
        use_cache: Optional[bool] = None,
    ):
        """
        Args:
            hidden_q (`mindspore.Tensor`):
                Input to the transformer (self-attention) block, from which the queries are projected.
                It can be the raw embedding of a batch of sequences.
            hidden_kv (`mindspore.Tensor` of shape `(batch, len_k, dim_model)`):
                Hidden states from which the keys and values are projected.
            attention_mask (`mindspore.Tensor` of shape `(batch, len_seq, len_seq)`):
                Mask that keeps invalid positions out of the attention computation; positions where it is
                `False` are masked.
            position_bias (`mindspore.Tensor` of shape `(batch, len_seq, len_seq)`):
                Positional bias added to the attention scores before masking.
            output_attentions (`bool`, *optional*):
                Whether or not to return the attentions tensors of all attention layers.
            past_key_values (`Tuple[mindspore.Tensor, mindspore.Tensor]`, *optional*):
                Cached past key and value projection states.
            use_cache (`bool`, *optional*):
                If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
                (see `past_key_values`).
        """
        batch_size = hidden_q.shape[0]
        len_q = hidden_q.shape[1]
        len_k = hidden_kv.shape[1]

        query = self.project_q(hidden_q)
        key = self.project_k(hidden_kv)
        value = self.project_v(hidden_kv)

        query = query.view(batch_size, len_q, self.num_heads, self.dim_head).permute(0, 2, 1, 3)
        key = key.view(batch_size, len_k, self.num_heads, self.dim_head).permute(0, 2, 1, 3)
        value = value.view(batch_size, len_k, self.num_heads, self.dim_head).permute(0, 2, 1, 3)

        if past_key_values is not None:
            key = ops.cat([past_key_values[0], key], dim=-2)
            value = ops.cat([past_key_values[1], value], dim=-2)
            len_k = key.shape[-2]

        # (batch_size, num_heads, len_q, dim_head) @ (batch_size, num_heads, dim_head, len_k) -> (batch_size, num_heads, len_q, len_k)
        score = ops.matmul(query, key.swapaxes(-1, -2)) / math.sqrt(self.dim_head)
        score = score + position_bias

        score = ops.masked_fill(
            score,
            attention_mask.view(batch_size, 1, len_q, len_k) == mindspore.tensor(False),
            float("-inf"),
        )
        score = self.softmax(score)

        score = ops.masked_fill(
            score,
            attention_mask.view(batch_size, 1, len_q, len_k) == mindspore.tensor(False),
            0.,
        )
        if output_attentions:
            attn_weights = score
        else:
            attn_weights = None

        if self.dropout is not None:
            score = self.dropout(score)

        # (batch_size, num_heads, len_q, len_k) @ (batch_size, num_heads, len_k, dim_head) -> (batch_size, num_heads, len_q, dim_head)
        score = ops.matmul(score, value)

        score = score.view(batch_size, self.num_heads, len_q, self.dim_head).permute(0, 2, 1, 3)
        score = score.view(batch_size, len_q, self.num_heads * self.dim_head)

        score = self.attention_out(score)

        past_key_values = None
        if use_cache:
            past_key_values = (key, value)

        return score, attn_weights, past_key_values

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeAttention.__init__(config)` ¶

Initializes an instance of the CpmBeeAttention class.

PARAMETER	DESCRIPTION
`self`	The instance of the class.
`config`	The configuration object containing the following attributes: `hidden_size` (int): the dimension of the model; `num_attention_heads` (int): the number of attention heads; `dim_head` (int): the dimension of each attention head; `ms_dtype`: the data type used for the linear layers; `dropout_p` (float, optional): the probability of an element to be zeroed during dropout. If not provided, no dropout is applied. TYPE: `CpmBeeConfig`

RETURNS	DESCRIPTION
	None

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def __init__(self, config: CpmBeeConfig):
    """
    Initializes an instance of the CpmBeeAttention class.

    Args:
        self: The instance of the class.
        config (CpmBeeConfig):
            The configuration object containing the following attributes:

            - hidden_size (int): The dimension of the model.
            - num_attention_heads (int): The number of attention heads.
            - dim_head (int): The dimension of each attention head.
            - ms_dtype: The data type used for the linear layers.
            - dropout_p (float, optional): The probability of an element to be zeroed during dropout.
            If not provided, no dropout is applied.

    Returns:
        None

    Raises:
        None
    """
    super().__init__()
    self.dim_model = config.hidden_size
    self.num_heads = config.num_attention_heads
    self.dim_head = config.dim_head

    self.project_q = CpmBeeLinear(self.dim_model, self.num_heads * self.dim_head, dtype=config.ms_dtype)
    self.project_k = CpmBeeLinear(self.dim_model, self.num_heads * self.dim_head, dtype=config.ms_dtype)
    self.project_v = CpmBeeLinear(self.dim_model, self.num_heads * self.dim_head, dtype=config.ms_dtype)

    self.attention_out = CpmBeeLinear(self.num_heads * self.dim_head, self.dim_model, dtype=config.ms_dtype)

    self.softmax = nn.Softmax(dim=-1)

    if config.dropout_p is not None:
        self.dropout = nn.Dropout(p=config.dropout_p)
    else:
        self.dropout = None

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeAttention.forward(hidden_q, hidden_kv, attention_mask, position_bias, output_attentions=False, past_key_values=None, use_cache=None)` ¶

PARAMETER	DESCRIPTION
`hidden_q`	Input to the transformer (self-attention) block, from which the queries are projected. It can be the raw embedding of a batch of sequences. TYPE: `mindspore.Tensor`
`hidden_kv`	Hidden states from which the keys and values are projected. TYPE: `mindspore.Tensor` of shape `(batch, len_k, dim_model)`
`attention_mask`	Mask that keeps invalid positions out of the attention computation; positions where it is `False` are masked. TYPE: `mindspore.Tensor` of shape `(batch, len_seq, len_seq)`
`position_bias`	Positional bias added to the attention scores before masking. TYPE: `mindspore.Tensor` of shape `(batch, len_seq, len_seq)`
`output_attentions`	Whether or not to return the attentions tensors of all attention layers. TYPE: `bool`, optional DEFAULT: `False`
`past_key_values`	Cached past key and value projection states. TYPE: `Tuple[mindspore.Tensor, mindspore.Tensor]`, optional DEFAULT: `None`
`use_cache`	If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see `past_key_values`). TYPE: `bool`, optional DEFAULT: `None`
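
The score computation in `forward` follows a fixed order: scaled dot product, additive position bias, masking with `-inf`, softmax, then a second masking that sets masked weights to zero. The sketch below reproduces that order for a single head and a single query position in pure Python. `masked_attention_row` and its argument names are hypothetical, and the library itself operates on batched multi-head MindSpore tensors. The explicit branch for a fully masked row is an assumption about why the second masking step is useful: a softmax over a row that contains only `-inf` is undefined.

```python
import math


def masked_attention_row(query, keys, values, bias, mask):
    """Single-head attention for one query position.

    query: list of d floats
    keys, values: one list of floats per key position
    bias: additive position bias, one float per key position
    mask: one bool per key position, True where attention is allowed
    """
    d = len(query)
    # Scaled dot product plus the additive position bias.
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) + b
        for key, b in zip(keys, bias)
    ]
    # Masked positions get -inf so that they contribute exp(-inf) = 0.
    scores = [s if m else float("-inf") for s, m in zip(scores, mask)]

    if not any(mask):
        # Fully masked row: use all-zero weights instead of an undefined softmax.
        weights = [0.0] * len(keys)
    else:
        top = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]

    # Weighted sum of the value vectors.
    out = [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return weights, out


weights, out = masked_attention_row(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    values=[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    bias=[0.0, 0.0, 0.0],
    mask=[True, True, False],
)
print(weights, out)
```

In this call the two unmasked keys are identical, so they share the weight equally and the masked key receives none.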

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def forward(
    self,
    hidden_q: mindspore.Tensor,
    hidden_kv: mindspore.Tensor,
    attention_mask: mindspore.Tensor,
    position_bias: mindspore.Tensor,
    output_attentions: Optional[bool] = False,
    past_key_values: Optional[Tuple[mindspore.Tensor, mindspore.Tensor]] = None,
    use_cache: Optional[bool] = None,
):
    """
    Args:
        hidden_q (`mindspore.Tensor`):
            Input to the transformer (self-attention) block, from which the queries are projected.
            It can be the raw embedding of a batch of sequences.
        hidden_kv (`mindspore.Tensor` of shape `(batch, len_k, dim_model)`):
            Hidden states from which the keys and values are projected.
        attention_mask (`mindspore.Tensor` of shape `(batch, len_seq, len_seq)`):
            Mask that keeps invalid positions out of the attention computation; positions where it is
            `False` are masked.
        position_bias (`mindspore.Tensor` of shape `(batch, len_seq, len_seq)`):
            Positional bias added to the attention scores before masking.
        output_attentions (`bool`, *optional*):
            Whether or not to return the attentions tensors of all attention layers.
        past_key_values (`Tuple[mindspore.Tensor, mindspore.Tensor]`, *optional*):
            Cached past key and value projection states.
        use_cache (`bool`, *optional*):
            If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
            (see `past_key_values`).
    """
    batch_size = hidden_q.shape[0]
    len_q = hidden_q.shape[1]
    len_k = hidden_kv.shape[1]

    query = self.project_q(hidden_q)
    key = self.project_k(hidden_kv)
    value = self.project_v(hidden_kv)

    query = query.view(batch_size, len_q, self.num_heads, self.dim_head).permute(0, 2, 1, 3)
    key = key.view(batch_size, len_k, self.num_heads, self.dim_head).permute(0, 2, 1, 3)
    value = value.view(batch_size, len_k, self.num_heads, self.dim_head).permute(0, 2, 1, 3)

    if past_key_values is not None:
        key = ops.cat([past_key_values[0], key], dim=-2)
        value = ops.cat([past_key_values[1], value], dim=-2)
        len_k = key.shape[-2]

    # (batch_size, num_heads, len_q, dim_head) @ (batch_size, num_heads, dim_head, len_k) -> (batch_size, num_heads, len_q, len_k)
    score = ops.matmul(query, key.swapaxes(-1, -2)) / math.sqrt(self.dim_head)
    score = score + position_bias

    score = ops.masked_fill(
        score,
        attention_mask.view(batch_size, 1, len_q, len_k) == mindspore.tensor(False),
        float("-inf"),
    )
    score = self.softmax(score)

    score = ops.masked_fill(
        score,
        attention_mask.view(batch_size, 1, len_q, len_k) == mindspore.tensor(False),
        0.,
    )
    if output_attentions:
        attn_weights = score
    else:
        attn_weights = None

    if self.dropout is not None:
        score = self.dropout(score)

    # (batch_size, num_heads, len_q, len_k) @ (batch_size, num_heads, len_k, dim_head) -> (batch_size, num_heads, len_q, dim_head)
    score = ops.matmul(score, value)

    score = score.view(batch_size, self.num_heads, len_q, self.dim_head).permute(0, 2, 1, 3)
    score = score.view(batch_size, len_q, self.num_heads * self.dim_head)

    score = self.attention_out(score)

    past_key_values = None
    if use_cache:
        past_key_values = (key, value)

    return score, attn_weights, past_key_values

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeBeamHypotheses` ¶

Bases: BeamHypotheses

This class represents a set of beam hypotheses for the CpmBee model. It is derived from the BeamHypotheses class.

The CpmBeeBeamHypotheses class is used to store and manage a list of beam hypotheses along with their scores and beam indices. Each hypothesis consists of a sequence of predicted tokens and a corresponding sum of log probabilities. The class provides methods to add new hypotheses, update the list of hypotheses, and retrieve the best hypotheses based on their scores.

ATTRIBUTE	DESCRIPTION
`beams`	A list of tuples representing the beam hypotheses. Each tuple contains the hypothesis score, the predicted token sequence, and the beam indices. TYPE: `List[Tuple[float, List, Optional[Tensor]]]`
`worst_score`	The score of the worst hypothesis in the list. TYPE: `float`
`num_beams`	The maximum number of beam hypotheses to be stored. TYPE: `int`
`length_penalty`	The length penalty factor applied to the hypothesis scores. TYPE: `float`

METHOD	DESCRIPTION
`add`	Add a new hypothesis to the list of beam hypotheses. The hypothesis is represented by a sequence of predicted tokens and its sum of log probabilities. Optionally, the beam indices can also be provided.
`update`	Update the list of beam hypotheses by removing the worst hypothesis if the maximum number of hypotheses is exceeded.
`get_best`	Retrieve the best `num_best` beam hypotheses based on their scores. The hypotheses are returned as a list of tuples, where each tuple contains the hypothesis score, the predicted token sequence, and the beam indices.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

class CpmBeeBeamHypotheses(BeamHypotheses):

    """
    This class represents a set of beam hypotheses for the CpmBee model. It is derived from the BeamHypotheses class.

    The CpmBeeBeamHypotheses class is used to store and manage a list of beam hypotheses along with their scores
    and beam indices. Each hypothesis consists of a sequence of predicted tokens and a corresponding sum of log
    probabilities. The class provides methods to add new hypotheses, update the list of hypotheses, and retrieve
    the best hypotheses based on their scores.

    Attributes:
        beams (List[Tuple[float, List, Optional[mindspore.Tensor]]]): A list of tuples representing the beam hypotheses.
            Each tuple contains the hypothesis score, the predicted token sequence, and the beam indices.
        worst_score (float): The score of the worst hypothesis in the list.
        num_beams (int): The maximum number of beam hypotheses to be stored.
        length_penalty (float): The length penalty factor applied to the hypothesis scores.

    Methods:
        add:
            Add a new hypothesis to the list of beam hypotheses. The hypothesis is represented by a sequence of
            predicted tokens and its sum of log probabilities. Optionally, the beam indices can also be provided.

        update:
            Update the list of beam hypotheses by removing the worst hypothesis if the maximum number of hypotheses
            is exceeded.

        get_best:
            Retrieve the best `num_best` beam hypotheses based on their scores. The hypotheses are returned as a list
            of tuples, where each tuple contains the hypothesis score, the predicted token sequence, and the beam indices.
    """
    def add(self, hyp: List, sum_logprobs: float, beam_indices: Optional[mindspore.Tensor] = None):
        """
        Add a new hypothesis to the list.
        """
        score = sum_logprobs / (len(hyp) ** self.length_penalty)
        if len(self) < self.num_beams or score > self.worst_score:
            self.beams.append((score, hyp, beam_indices))
            if len(self) > self.num_beams:
                sorted_next_scores = sorted([(s, idx) for idx, (s, _, _) in enumerate(self.beams)])
                del self.beams[sorted_next_scores[0][1]]
                self.worst_score = sorted_next_scores[1][0]
            else:
                self.worst_score = min(score, self.worst_score)

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeBeamHypotheses.add(hyp, sum_logprobs, beam_indices=None)` ¶

Add a new hypothesis to the list.
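
The scoring and eviction rule in `add` can be exercised without MindSpore. The class below is a simplified stand-in, not the library class: `TinyBeamHypotheses` is a hypothetical name, beam indices are omitted, and the initial `worst_score` of `1e9` is an assumption about how the base class starts out.

```python
class TinyBeamHypotheses:
    """Keeps at most `num_beams` hypotheses, ranked by length-normalised score."""

    def __init__(self, num_beams, length_penalty):
        self.num_beams = num_beams
        self.length_penalty = length_penalty
        self.beams = []
        self.worst_score = 1e9  # assumed starting value, lowered by the first add

    def add(self, hyp, sum_logprobs):
        # Normalise the summed log-probability by len(hyp) ** length_penalty.
        score = sum_logprobs / (len(hyp) ** self.length_penalty)
        if len(self.beams) < self.num_beams or score > self.worst_score:
            self.beams.append((score, hyp))
            if len(self.beams) > self.num_beams:
                # Over capacity: drop the lowest score; the runner-up becomes the worst.
                ranked = sorted((s, i) for i, (s, _) in enumerate(self.beams))
                del self.beams[ranked[0][1]]
                self.worst_score = ranked[1][0]
            else:
                self.worst_score = min(score, self.worst_score)
        return score


hyps = TinyBeamHypotheses(num_beams=2, length_penalty=1.0)
hyps.add([1, 2], -4.0)        # score -2.0, stored
hyps.add([1, 2, 3, 4], -4.0)  # score -1.0, stored
hyps.add([1], -0.5)           # score -0.5, evicts the -2.0 entry
hyps.add([9], -3.0)           # score -3.0, rejected
print([score for score, _ in hyps.beams], hyps.worst_score)
```

With `length_penalty=1.0` the score is the mean log-probability per token, which is why the four-token hypothesis outranks the two-token one despite the same summed log-probability.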

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def add(self, hyp: List, sum_logprobs: float, beam_indices: Optional[mindspore.Tensor] = None):
    """
    Add a new hypothesis to the list.
    """
    score = sum_logprobs / (len(hyp) ** self.length_penalty)
    if len(self) < self.num_beams or score > self.worst_score:
        self.beams.append((score, hyp, beam_indices))
        if len(self) > self.num_beams:
            sorted_next_scores = sorted([(s, idx) for idx, (s, _, _) in enumerate(self.beams)])
            del self.beams[sorted_next_scores[0][1]]
            self.worst_score = sorted_next_scores[1][0]
        else:
            self.worst_score = min(score, self.worst_score)

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeBeamSearchScorer` ¶

Bases: BeamSearchScorer

Override BeamSearchScorer for CPMBee to support:

1. Replace beam_tokens by beam_states, containing `idx`, `ans`, `nx_token_id`...
2. The `process` will update the beam_states
3. The `finalize` will just return the best hypotheses as a list.
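
Inside `process`, each candidate arrives as a flat index that is split into a beam id and a word id, and word ids at or beyond `vocab_size` are resolved through the two extension tables. The helper below sketches that decoding with plain Python lists. `decode_candidate` and its parameter names are hypothetical; the library reads the same values from MindSpore tensors.

```python
def decode_candidate(flat_idx, width, vocab_size, ext_table_ids, ext_table_sub):
    """Split a flat candidate index and resolve extended-vocabulary ids.

    Returns (beam_id, raw_word_id, word_id, word_sub).
    """
    # The flat index encodes beam_id * width + word_id.
    beam_id, word_id = divmod(flat_idx, width)
    raw_word_id = word_id  # the unresolved id is what gets stored in the answer
    word_sub = 0
    if word_id >= vocab_size:
        # Ids past the regular vocabulary index into the extension tables.
        ext = word_id - vocab_size
        word_sub = ext_table_sub[ext]
        word_id = ext_table_ids[ext]
    return beam_id, raw_word_id, word_id, word_sub


# Toy setup: 10 regular tokens plus 2 extension slots, so the width is 12.
print(decode_candidate(5, 12, 10, ext_table_ids=[7, 8], ext_table_sub=[1, 2]))
print(decode_candidate(23, 12, 10, ext_table_ids=[7, 8], ext_table_sub=[1, 2]))
```

The first call stays inside the regular vocabulary and is returned unchanged with a sub-id of 0; the second belongs to beam 1 and resolves extension slot 1.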

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

class CpmBeeBeamSearchScorer(BeamSearchScorer):
    """
    Override BeamSearchScorer for CPMBee to support:

    1. Replace beam_tokens by beam_states, containing `idx`, `ans`, `nx_token_id`...
    2. The `process` will update the beam_states
    3. The `finalize` will just return the best hypotheses as a list.
    """
    def __init__(
        self,
        batch_size: int,
        num_beams: int,
        length_penalty: Optional[float] = 1.0,
        do_early_stopping: Optional[Union[bool, str]] = False,
        num_beam_hyps_to_keep: Optional[int] = 1,
        num_beam_groups: Optional[int] = 1,
        max_length: Optional[int] = None,
        **model_kwargs,
    ):
        """
        Initializes the CpmBeeBeamSearchScorer object.

        Args:
            batch_size (int): The batch size for beam search.
            num_beams (int): The number of beams for beam search.
            length_penalty (float, optional): The length penalty for beam search. Defaults to 1.0.
            do_early_stopping (bool or str, optional): Flag to indicate if early stopping should be performed.
                Defaults to False.
            num_beam_hyps_to_keep (int, optional): The number of beam hypotheses to keep. Defaults to 1.
            num_beam_groups (int, optional): The number of beam groups for beam search. Defaults to 1.
            max_length (int, optional): The maximum length for beam search. Defaults to None.
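            **model_kwargs: Additional model-specific keyword arguments. Must include `other_info`, a per-sample
                sequence whose entries provide `predict_segments`; the first segment sets each beam's initial
                `nx_segment_id`.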

        Returns:
            None.

        Raises:
            ValueError: If the provided batch size, num_beams, num_beam_groups, or max_length is not a positive integer.
            TypeError: If the provided length_penalty is not a float or if do_early_stopping is not a bool or str.
            RuntimeError: If an error occurs during initialization.
        """
        super().__init__(batch_size, num_beams, length_penalty, do_early_stopping, num_beam_hyps_to_keep, num_beam_groups, max_length)
        self.num_beams = num_beams
        self.length_penalty = length_penalty
        self.do_early_stopping = do_early_stopping
        self.num_beam_hyps_to_keep = num_beam_hyps_to_keep
        self.num_beam_groups = num_beam_groups
        self.group_size = self.num_beams // self.num_beam_groups

        self._is_init = False
        self._beam_hyps = [
            CpmBeeBeamHypotheses(
                num_beams=self.num_beams,
                length_penalty=self.length_penalty,
                early_stopping=self.do_early_stopping,
                max_length=max_length,
            )
            for _ in range(batch_size)
        ]
        self._done = mindspore.tensor([False for _ in range(batch_size)], dtype=mindspore.bool_)

        self.beam_states = []
        for sent_id in range(batch_size):
            instance_beam_states = []

            for _ in range(self.num_beams):
                instance_beam_states.append(
                    {
                        "idx": 0,
                        "ans": [],
                        "nx_token_id": 6,
                        "nx_token_sub": 0,
                        "nx_segment_id": model_kwargs["other_info"][sent_id]["predict_segments"][0][0],
                        "nx_position": 0,
                    }
                )
            self.beam_states.append(instance_beam_states)

    def process(
        self,
        batch_size: int,
        cur_len: int,
        _next_scores: mindspore.Tensor,
        next_scores: mindspore.Tensor,
        next_tokens: mindspore.Tensor,
        vocab_size: Optional[int] = None,
        pad_token_id: Optional[int] = None,
        bos_token_id: Optional[int] = None,
        eos_token_id: Optional[Union[int, List[int]]] = None,
        max_length: Optional[int] = None,
        ext_table_sub_cpu: Optional[mindspore.Tensor] = None,
        ext_table_ids_cpu: Optional[mindspore.Tensor] = None,
        **model_kwargs,
    ) -> Tuple[mindspore.Tensor]:
        """
        Process the beam search for the CpmBeeBeamSearchScorer.

        Args:
            self: The instance of the CpmBeeBeamSearchScorer class.
            batch_size (int): The batch size for processing.
            cur_len (int): The current length of the sequence being processed.
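            _next_scores (mindspore.Tensor): A score tensor of which only the size of the last dimension is used
                here, as the width for splitting each flat candidate index into a beam id and a word id.
            next_scores (mindspore.Tensor): The scores of the candidates for each batch item.
            next_tokens (mindspore.Tensor): The flat candidate indices for each batch item, each encoding
                `beam_id * width + word_id`.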
            vocab_size (Optional[int]): The size of the vocabulary. Defaults to None.
            pad_token_id (Optional[int]): The token ID for padding. Defaults to None.
            bos_token_id (Optional[int]): The token ID for the beginning of sequence. Defaults to None.
            eos_token_id (Optional[Union[int, List[int]]]): The token ID for the end of sequence. Defaults to None.
            max_length (Optional[int]): The maximum length of the sequence. Defaults to None.
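            ext_table_sub_cpu (Optional[mindspore.Tensor]): Lookup table indexed by `word_id - vocab_size` that
                gives the sub-id (`nx_token_sub`) of an extended-vocabulary token.
            ext_table_ids_cpu (Optional[mindspore.Tensor]): Lookup table indexed by `word_id - vocab_size` that
                gives the token id (`nx_token_id`) of an extended-vocabulary token.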
            **model_kwargs: Additional keyword arguments for the model.

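        Returns:
            Optional[UserDict]: `None` when `cur_len == max_length`. Otherwise a `UserDict` with the keys
                `next_beam_scores` (flattened tensor of the kept scores), `next_beam_states` (the updated
                per-sample lists of beam state dicts) and `next_beam_indices` (flattened int32 tensor holding
                the source beam index of each kept beam).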

        Raises:
            AssertionError: If the length of next_instance_beam_states is not equal to zero when cur_len is equal to
                max_length, or not equal to self.num_beams otherwise.

        """
        next_beam_state = []
        for sent_id in range(batch_size):
            self._done[sent_id] = self._done[sent_id] or self._beam_hyps[sent_id].is_done(
                next_scores[sent_id].max().item(), cur_len
            )
            if self._done[sent_id]:
                next_beam_state.append(
                    [
                        (
                            {
                                "idx": 0,
                                "ans": [],
                                "nx_token_id": pad_token_id,
                                "nx_token_sub": 0,
                                "nx_segment_id": 0,
                                "nx_position": 0,
                            },
                            0,
                            0,
                        )
                    ]
                    * self.num_beams
                )
                continue

            next_instance_beam_states = []

            for idx, value in zip(next_tokens[sent_id], next_scores[sent_id]):
                beam_id = ops.div(idx, _next_scores.shape[-1], rounding_mode="floor").item()
                word_id = (idx % _next_scores.shape[-1]).item()

                curr_info = self.beam_states[sent_id][beam_id]
                if (
                    word_id == eos_token_id
                    and (curr_info["idx"] + 1 == len(model_kwargs["other_info"][sent_id]["predict_segments"]))
                ) or cur_len == max_length:
                    self._beam_hyps[sent_id].add(
                        self.beam_states[sent_id][beam_id]["ans"]
                        + [
                            (
                                word_id,
                                model_kwargs["other_info"][sent_id]["predict_segments"][curr_info["idx"]][1],
                            )
                        ],
                        value.item(),
                    )
                elif word_id == eos_token_id:
                    next_instance_beam_states.append(
                        (
                            {
                                "idx": curr_info["idx"] + 1,
                                "ans": curr_info["ans"]
                                + [
                                    (
                                        word_id,
                                        model_kwargs["other_info"][sent_id]["predict_segments"][curr_info["idx"]][1],
                                    )
                                ],
                                "nx_token_id": bos_token_id,
                                "nx_token_sub": 0,
                                "nx_segment_id": model_kwargs["other_info"][sent_id]["predict_segments"][
                                    curr_info["idx"] + 1
                                ][0],
                                "nx_position": 0,
                            },
                            value.item(),
                            sent_id * self.num_beams + beam_id,
                        )
                    )

                else:
                    raw_word_id = word_id
                    word_id_sub = 0
                    if word_id >= vocab_size:
                        word_id -= vocab_size
                        word_id_sub = int(ext_table_sub_cpu[word_id].item())
                        word_id = int(ext_table_ids_cpu[word_id].item())

                    next_instance_beam_states.append(
                        (
                            {
                                "idx": curr_info["idx"],
                                "ans": curr_info["ans"]
                                + [
                                    (
                                        raw_word_id,
                                        model_kwargs["other_info"][sent_id]["predict_segments"][curr_info["idx"]][1],
                                    )
                                ],
                                "nx_token_id": word_id,
                                "nx_token_sub": word_id_sub,
                                "nx_segment_id": curr_info["nx_segment_id"],
                                "nx_position": curr_info["nx_position"] + 1,
                            },
                            value.item(),
                            sent_id * self.num_beams + beam_id,
                        )
                    )

                if len(next_instance_beam_states) == self.num_beams:
                    break
            # Parenthesised so that the length is compared against the expected count in both cases.
            assert len(next_instance_beam_states) == (0 if cur_len == max_length else self.num_beams)
            next_beam_state.append(next_instance_beam_states)

        if cur_len == max_length:
            return None

        beam_reorder_idx = []
        beam_new_scores = []
        beam_states = []
        for sent_id in range(batch_size):
            instance_beam_states = []
            for beam_id in range(self.num_beams):
                state, value, beam_idx = next_beam_state[sent_id][beam_id]
                beam_reorder_idx.append(beam_idx)
                beam_new_scores.append(value)
                instance_beam_states.append(state)
            beam_states.append(instance_beam_states)
        self.beam_states = beam_states

        return UserDict(
            {
                "next_beam_scores": mindspore.tensor(beam_new_scores).view(-1),
                "next_beam_states": beam_states,
                "next_beam_indices": mindspore.tensor(beam_reorder_idx, dtype=mindspore.int32).view(-1),
            }
        )

    def finalize(self) -> Tuple[mindspore.Tensor]:
        """
        Finalizes the beam search and returns the best hypothesis for each batch item.

        Args:
            self: The instance of the CpmBeeBeamSearchScorer class.

        Returns:
            list: One entry per batch item. Each entry is the highest-scoring hypothesis stored for that item,
                that is, the list of `(word_id, segment_info)` pairs accumulated in the beam state's `ans` field,
                where `segment_info` is the second field of the active `predict_segments` entry.

        Raises:
            ValueError: If a batch item has no stored hypotheses, because `max` is then called on an empty list.

        Note:
            - The hypotheses are stored in the `_beam_hyps` attribute, one CpmBeeBeamHypotheses object per batch item.
            - The best hypothesis is the one with the maximum score among those stored for the batch item.

        Example:
            ```python
            >>> # `other_info` is assumed to hold one dict per sample, each providing "predict_segments".
            >>> scorer = CpmBeeBeamSearchScorer(batch_size=1, num_beams=2, other_info=other_info)
            >>> # ... call scorer.process(...) at each decoding step ...
            >>> results = scorer.finalize()
            >>> # results[i] is the best hypothesis (a list) for batch item i.
            ```
        """
        results = []
        for _, hypotheses in enumerate(self._beam_hyps):
            best_hyp = max(hypotheses.beams, key=lambda x: x[0])[1]
            results.append(best_hyp)
        return results

    @staticmethod
    def apply_repetition_penalty(
        logits,
        batch_size,
        num_beams,
        prev_output_tokens,
        repetition_penalty,
        start_idx=None,
        end_idx=None,
        window_size=None,
    ):
        """
        Applies repetition penalty to the logits for beam search in the CpmBeeBeamSearchScorer class.

        Args:
            logits (Tensor): The logits representing the scores for each token in the vocabulary.
                Shape: (batch_size * num_beams, vocab_size).
            batch_size (int): The size of the batch.
            num_beams (int): The number of beams used in the beam search.
            prev_output_tokens (Tensor): The previously generated tokens. Shape: (batch_size * num_beams, sequence_length).
            repetition_penalty (float): The coefficient for the repetition penalty. Must be >= 1.
            start_idx (int, optional): The first index of the window of previous tokens to penalize.
                Defaults to None, in which case all previous tokens are used.
            end_idx (int, optional): The last index (inclusive) of the window. Defaults to None, in which case
                all previous tokens are used.
            window_size (int, optional): If set, only the last `window_size` tokens up to `end_idx` are used.
                Defaults to None.

        Returns:
            None. `logits` is modified in place.

        Raises:
            AssertionError: If repetition_penalty is less than 1.

        """
        # only conduct repetition penalty for the output
        assert repetition_penalty >= 1, "repetition penalty coefficient should >= 1"
        # repetition penalty (from CTRL paper https://arxiv.org/abs/1909.05858)
        for i in range(batch_size * num_beams):
            if start_idx is None or end_idx is None:
                output_tokens = prev_output_tokens[i].tolist()
            else:
                if end_idx >= start_idx:
                    if window_size:
                        output_tokens = prev_output_tokens[i][
                            max(start_idx, end_idx + 1 - window_size) : end_idx + 1
                        ].tolist()
                    else:
                        output_tokens = prev_output_tokens[i][start_idx : end_idx + 1].tolist()
                else:
                    output_tokens = []
            for previous_token in set(output_tokens):
                # if the score is negative, multiply by the penalty (instead of dividing)
                # so that the previous token's probability is still reduced
                if logits[i, previous_token] < 0:
                    logits[i, previous_token] *= repetition_penalty
                else:
                    logits[i, previous_token] /= repetition_penalty

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeBeamSearchScorer.__init__(batch_size, num_beams, length_penalty=1.0, do_early_stopping=False, num_beam_hyps_to_keep=1, num_beam_groups=1, max_length=None, **model_kwargs)` ¶

Initializes the CpmBeeBeamSearchScorer object.

PARAMETER	DESCRIPTION
`batch_size`	The batch size for beam search. TYPE: `int`
`num_beams`	The number of beams for beam search. TYPE: `int`
`length_penalty`	The length penalty for beam search. Defaults to 1.0. TYPE: `float` DEFAULT: `1.0`
`do_early_stopping`	Flag to indicate if early stopping should be performed. Defaults to False. TYPE: `bool or str` DEFAULT: `False`
`num_beam_hyps_to_keep`	The number of beam hypotheses to keep. Defaults to 1. TYPE: `int` DEFAULT: `1`
`num_beam_groups`	The number of beam groups for beam search. Defaults to 1. TYPE: `int` DEFAULT: `1`
`max_length`	The maximum length for beam search. Defaults to None. TYPE: `int` DEFAULT: `None`
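`**model_kwargs`	Additional model-specific keyword arguments. Must include `other_info`, because the initializer reads `model_kwargs["other_info"][sent_id]["predict_segments"][0][0]` for every batch item. DEFAULT: `{}`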

RETURNS	DESCRIPTION
	None.

RAISES	DESCRIPTION
`ValueError`	If the provided batch size, num_beams, num_beam_groups, or max_length is not a positive integer.
`TypeError`	If the provided length_penalty is not a float or if do_early_stopping is not a bool or str.
`RuntimeError`	If an error occurs during initialization.
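
The initial beam state built by the initializer can be sketched in plain Python as follows. This is an illustration only: the helper name `initial_beam_states` and the sample `other_info` contents are hypothetical, and the second element of each `predict_segments` entry is a placeholder string. The value `6` for `nx_token_id` is the one hard-coded in the source.

```python
def initial_beam_states(other_info, num_beams, first_token_id=6):
    """Build one list of beam-state dicts per batch item (hypothetical helper)."""
    states = []
    for info in other_info:
        # The first element of the first predict segment seeds "nx_segment_id".
        first_segment_id = info["predict_segments"][0][0]
        states.append(
            [
                {
                    "idx": 0,
                    "ans": [],
                    "nx_token_id": first_token_id,
                    "nx_token_sub": 0,
                    "nx_segment_id": first_segment_id,
                    "nx_position": 0,
                }
                for _ in range(num_beams)
            ]
        )
    return states


# Hypothetical input for a batch of two items.
other_info = [
    {"predict_segments": [(4, "answer")]},
    {"predict_segments": [(9, "title"), (10, "body")]},
]
states = initial_beam_states(other_info, num_beams=2)
```

Every beam of an item starts from the same values but owns its own dictionary, so later updates to one beam do not leak into another.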

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def __init__(
    self,
    batch_size: int,
    num_beams: int,
    length_penalty: Optional[float] = 1.0,
    do_early_stopping: Optional[Union[bool, str]] = False,
    num_beam_hyps_to_keep: Optional[int] = 1,
    num_beam_groups: Optional[int] = 1,
    max_length: Optional[int] = None,
    **model_kwargs,
):
    """
    Initializes the CpmBeeBeamSearchScorer object.

    Args:
        batch_size (int): The batch size for beam search.
        num_beams (int): The number of beams for beam search.
        length_penalty (float, optional): The length penalty for beam search. Defaults to 1.0.
        do_early_stopping (bool or str, optional): Flag to indicate if early stopping should be performed.
            Defaults to False.
        num_beam_hyps_to_keep (int, optional): The number of beam hypotheses to keep. Defaults to 1.
        num_beam_groups (int, optional): The number of beam groups for beam search. Defaults to 1.
        max_length (int, optional): The maximum length for beam search. Defaults to None.
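        **model_kwargs: Additional model-specific keyword arguments. Must include `other_info`, which is
            read as `model_kwargs["other_info"][sent_id]["predict_segments"][0][0]` for every batch item.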

    Returns:
        None.

    Raises:
        ValueError: If the provided batch size, num_beams, num_beam_groups, or max_length is not a positive integer.
        TypeError: If the provided length_penalty is not a float or if do_early_stopping is not a bool or str.
        RuntimeError: If an error occurs during initialization.
    """
    super().__init__(batch_size, num_beams, length_penalty, do_early_stopping, num_beam_hyps_to_keep, num_beam_groups, max_length)
    self.num_beams = num_beams
    self.length_penalty = length_penalty
    self.do_early_stopping = do_early_stopping
    self.num_beam_hyps_to_keep = num_beam_hyps_to_keep
    self.num_beam_groups = num_beam_groups
    self.group_size = self.num_beams // self.num_beam_groups

    self._is_init = False
    self._beam_hyps = [
        CpmBeeBeamHypotheses(
            num_beams=self.num_beams,
            length_penalty=self.length_penalty,
            early_stopping=self.do_early_stopping,
            max_length=max_length,
        )
        for _ in range(batch_size)
    ]
    self._done = mindspore.tensor([False for _ in range(batch_size)], dtype=mindspore.bool_)

    self.beam_states = []
    for sent_id in range(batch_size):
        instance_beam_states = []

        for _ in range(self.num_beams):
            instance_beam_states.append(
                {
                    "idx": 0,
                    "ans": [],
                    "nx_token_id": 6,
                    "nx_token_sub": 0,
                    "nx_segment_id": model_kwargs["other_info"][sent_id]["predict_segments"][0][0],
                    "nx_position": 0,
                }
            )
        self.beam_states.append(instance_beam_states)

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeBeamSearchScorer.apply_repetition_penalty(logits, batch_size, num_beams, prev_output_tokens, repetition_penalty, start_idx=None, end_idx=None, window_size=None)` `staticmethod` ¶

Applies repetition penalty to the logits for beam search in the CpmBeeBeamSearchScorer class.

PARAMETER	DESCRIPTION
`logits`	The logits representing the scores for each token in the vocabulary. Shape: (batch_size * num_beams, vocab_size). TYPE: `Tensor`
`batch_size`	The size of the batch. TYPE: `int`
`num_beams`	The number of beams used in the beam search. TYPE: `int`
`prev_output_tokens`	The previously generated tokens. Shape: (batch_size * num_beams, sequence_length). TYPE: `Tensor`
`repetition_penalty`	The coefficient for the repetition penalty. Must be >= 1. TYPE: `float`
`start_idx`	The first index of the window of previous tokens to penalize. Defaults to None, in which case all previous tokens are used. TYPE: `int` DEFAULT: `None`
`end_idx`	The last index (inclusive) of the window. Defaults to None, in which case all previous tokens are used. TYPE: `int` DEFAULT: `None`
`window_size`	If set, only the last `window_size` tokens up to `end_idx` are used. Defaults to None. TYPE: `int` DEFAULT: `None`

RETURNS	DESCRIPTION
	None. `logits` is modified in place.

RAISES	DESCRIPTION
`AssertionError`	If repetition_penalty is less than 1.
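
The penalty rule and the window selection can be sketched without MindSpore, using nested lists in place of tensors. The function name `penalize_repeats` and the sample values are hypothetical; the control flow mirrors the source listing below.

```python
def penalize_repeats(logits, prev_tokens, penalty, start_idx=None, end_idx=None, window_size=None):
    """Pure-Python sketch: penalize already generated tokens, row by row, in place."""
    assert penalty >= 1, "repetition penalty coefficient should be >= 1"
    for row, tokens in zip(logits, prev_tokens):
        if start_idx is None or end_idx is None:
            seen = tokens  # no window given: use every previous token
        elif end_idx >= start_idx:
            lo = max(start_idx, end_idx + 1 - window_size) if window_size else start_idx
            seen = tokens[lo : end_idx + 1]  # end_idx is inclusive
        else:
            seen = []
        for tok in set(seen):
            if row[tok] < 0:
                row[tok] *= penalty  # negative score: multiply to push it further down
            else:
                row[tok] /= penalty  # non-negative score: divide to shrink it


# Tokens 0 and 1 were generated before, so only their scores change.
logits = [[2.0, -1.0, 0.5, 4.0]]
penalize_repeats(logits, [[0, 1, 1]], penalty=2.0)

# With a window of size 2 ending at index 3, only tokens 2 and 3 are penalized.
windowed = [[1.0, 1.0, 1.0, 1.0]]
penalize_repeats(windowed, [[0, 1, 2, 3]], penalty=2.0, start_idx=0, end_idx=3, window_size=2)
```

Because the function mutates its first argument and returns nothing, callers keep using the same `logits` object afterwards.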

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

@staticmethod
def apply_repetition_penalty(
    logits,
    batch_size,
    num_beams,
    prev_output_tokens,
    repetition_penalty,
    start_idx=None,
    end_idx=None,
    window_size=None,
):
    """
    Applies repetition penalty to the logits for beam search in the CpmBeeBeamSearchScorer class.

    Args:
        logits (Tensor): The logits representing the scores for each token in the vocabulary.
            Shape: (batch_size * num_beams, vocab_size).
        batch_size (int): The size of the batch.
        num_beams (int): The number of beams used in the beam search.
        prev_output_tokens (Tensor): The previously generated tokens. Shape: (batch_size * num_beams, sequence_length).
        repetition_penalty (float): The coefficient for the repetition penalty. Must be >= 1.
        start_idx (int, optional): The first index of the window of previous tokens to penalize.
            Defaults to None, in which case all previous tokens are used.
        end_idx (int, optional): The last index (inclusive) of the window. Defaults to None, in which case
            all previous tokens are used.
        window_size (int, optional): If set, only the last `window_size` tokens up to `end_idx` are used.
            Defaults to None.

    Returns:
        None. `logits` is modified in place.

    Raises:
        AssertionError: If repetition_penalty is less than 1.

    """
    # only conduct repetition penalty for the output
    assert repetition_penalty >= 1, "repetition penalty coefficient should >= 1"
    # repetition penalty (from CTRL paper https://arxiv.org/abs/1909.05858)
    for i in range(batch_size * num_beams):
        if start_idx is None or end_idx is None:
            output_tokens = prev_output_tokens[i].tolist()
        else:
            if end_idx >= start_idx:
                if window_size:
                    output_tokens = prev_output_tokens[i][
                        max(start_idx, end_idx + 1 - window_size) : end_idx + 1
                    ].tolist()
                else:
                    output_tokens = prev_output_tokens[i][start_idx : end_idx + 1].tolist()
            else:
                output_tokens = []
        for previous_token in set(output_tokens):
            # if the score is negative, multiply by the penalty (instead of dividing)
            # so that the previous token's probability is still reduced
            if logits[i, previous_token] < 0:
                logits[i, previous_token] *= repetition_penalty
            else:
                logits[i, previous_token] /= repetition_penalty

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeBeamSearchScorer.finalize()` ¶

Returns the best hypothesis collected for each batch item.

RETURNS	DESCRIPTION
`Tuple[Tensor]`	As annotated. In practice, a list with one entry per batch item, each being the highest-scoring hypothesis stored for that item (not converted to a tensor).

This method iterates over the per-item hypothesis containers in `_beam_hyps` and, for each one, picks the `(score, hypothesis)` pair with the maximum score. The selected hypotheses are returned as a plain list.

Note

The beam hypotheses are internally stored in the _beam_hyps attribute of the CpmBeeBeamSearchScorer instance.
The best hypothesis is determined by selecting the hypothesis with the maximum score from each beam.
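
The selection step amounts to a `max` over `(score, hypothesis)` pairs. The snippet below is a sketch with made-up scores and hypotheses; the shape of each hypothesis (a list of pairs) follows what `process` appends to `"ans"`, but the concrete values are hypothetical.

```python
# Hypothetical stored beams for a batch of two items: (score, hypothesis) pairs.
stored_beams = [
    [(-1.2, [(11, "a")]), (-0.4, [(12, "a"), (7, "a")])],
    [(-2.0, [(13, "b")])],
]

# Keep the hypothesis (index 1) of the pair with the highest score (index 0).
best = [max(beams, key=lambda pair: pair[0])[1] for beams in stored_beams]
```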

Example

>>> # `scorer` is a CpmBeeBeamSearchScorer that has already been driven through `process`.
>>> results = scorer.finalize()
>>> # results[i] is the best hypothesis for batch item i.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def finalize(self) -> Tuple[mindspore.Tensor]:
    """
    Returns the best hypothesis collected for each batch item.

    Returns:
        A list with one entry per batch item. Each entry is the hypothesis with the highest score in the
        corresponding `_beam_hyps` container, i.e. the second element of the best `(score, hypothesis)` pair.
        Despite the `Tuple[mindspore.Tensor]` annotation, the hypotheses are returned as stored by `process`
        and are not converted to tensors.

    Note:
        - The hypotheses are stored in the `_beam_hyps` attribute, one container per batch item.
        - Call this method only after `process` has added finished hypotheses; `max` raises `ValueError`
          on an empty sequence.

    Example:
        ```python
        >>> # `scorer` is a CpmBeeBeamSearchScorer that has already been driven through `process`.
        >>> results = scorer.finalize()
        >>> # results[i] is the best hypothesis for batch item i.
        ```
    """
    results = []
    for _, hypotheses in enumerate(self._beam_hyps):
        best_hyp = max(hypotheses.beams, key=lambda x: x[0])[1]
        results.append(best_hyp)
    return results

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeBeamSearchScorer.process(batch_size, cur_len, _next_scores, next_scores, next_tokens, vocab_size=None, pad_token_id=None, bos_token_id=None, eos_token_id=None, max_length=None, ext_table_sub_cpu=None, ext_table_ids_cpu=None, **model_kwargs)` ¶

Process the beam search for the CpmBeeBeamSearchScorer.

PARAMETER	DESCRIPTION
`self`	The instance of the CpmBeeBeamSearchScorer class.
`batch_size`	The batch size for processing. TYPE: `int`
`cur_len`	The current length of the sequence being processed. TYPE: `int`
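`_next_scores`	The unflattened score tensor. Only its last dimension is read, as the divisor that splits each entry of `next_tokens` into a beam index and a word index. TYPE: `Tensor`
`next_scores`	The scores of the candidate continuations for each batch item. TYPE: `Tensor`
`next_tokens`	The flattened candidate indices for each batch item. Each entry encodes `beam_id * _next_scores.shape[-1] + word_id`. TYPE: `Tensor`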
`vocab_size`	The size of the vocabulary. Defaults to None. TYPE: `Optional[int]` DEFAULT: `None`
`pad_token_id`	The token ID for padding. Defaults to None. TYPE: `Optional[int]` DEFAULT: `None`
`bos_token_id`	The token ID for the beginning of sequence. Defaults to None. TYPE: `Optional[int]` DEFAULT: `None`
`eos_token_id`	The token ID for the end of sequence. Defaults to None. TYPE: `Optional[Union[int, List[int]]]` DEFAULT: `None`
`max_length`	The maximum length of the sequence. Defaults to None. TYPE: `Optional[int]` DEFAULT: `None`
`ext_table_sub_cpu`	The CPU tensor for extended table sub. TYPE: `Optional[Tensor]` DEFAULT: `None`
`ext_table_ids_cpu`	The CPU tensor for extended table IDs. TYPE: `Optional[Tensor]` DEFAULT: `None`
`**model_kwargs`	Additional keyword arguments for the model. DEFAULT: `{}`

RETURNS	DESCRIPTION
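`Tuple[Tensor]`	As annotated. In practice, `None` when `cur_len` equals `max_length`; otherwise a `UserDict` with the keys `next_beam_scores` (flattened tensor of scores), `next_beam_states` (list of per-item beam state lists), and `next_beam_indices` (flattened int32 tensor of source beam indices).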

RAISES	DESCRIPTION
`AssertionError`	If the length of next_instance_beam_states is not equal to zero when cur_len is equal to max_length, or not equal to self.num_beams otherwise.
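
How a candidate index is split into a beam and a token, and how tokens beyond the base vocabulary are resolved, can be sketched in plain Python. The helper name `decode_candidate` and the table contents are hypothetical; `scores_width` stands for `_next_scores.shape[-1]`.

```python
def decode_candidate(flat_idx, scores_width, vocab_size, ext_table_ids, ext_table_sub):
    """Split a flattened candidate index into (beam_id, word_id, word_sub)."""
    beam_id = flat_idx // scores_width  # which beam the candidate extends
    word_id = flat_idx % scores_width   # token index within that beam's scores
    word_sub = 0
    if word_id >= vocab_size:
        # Indices past the base vocabulary point into the extension tables.
        ext = word_id - vocab_size
        word_sub = ext_table_sub[ext]
        word_id = ext_table_ids[ext]
    return beam_id, word_id, word_sub


# Hypothetical setup: 10 base tokens plus 2 extension entries per beam.
ext_ids, ext_sub = [3, 5], [1, 2]
plain = decode_candidate(7, scores_width=12, vocab_size=10, ext_table_ids=ext_ids, ext_table_sub=ext_sub)
extended = decode_candidate(23, scores_width=12, vocab_size=10, ext_table_ids=ext_ids, ext_table_sub=ext_sub)
```

In the source, the raw (unresolved) word index is what gets appended to `"ans"`, while the resolved pair is stored as `nx_token_id` and `nx_token_sub` for the next step.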

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def process(
    self,
    batch_size: int,
    cur_len: int,
    _next_scores: mindspore.Tensor,
    next_scores: mindspore.Tensor,
    next_tokens: mindspore.Tensor,
    vocab_size: Optional[int] = None,
    pad_token_id: Optional[int] = None,
    bos_token_id: Optional[int] = None,
    eos_token_id: Optional[Union[int, List[int]]] = None,
    max_length: Optional[int] = None,
    ext_table_sub_cpu: Optional[mindspore.Tensor] = None,
    ext_table_ids_cpu: Optional[mindspore.Tensor] = None,
    **model_kwargs,
) -> Tuple[mindspore.Tensor]:
    """
    Process the beam search for the CpmBeeBeamSearchScorer.

    Args:
        self: The instance of the CpmBeeBeamSearchScorer class.
        batch_size (int): The batch size for processing.
        cur_len (int): The current length of the sequence being processed.
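        _next_scores (mindspore.Tensor): The unflattened score tensor. Only its last dimension is read, as the
            divisor that splits each entry of next_tokens into a beam index and a word index.
        next_scores (mindspore.Tensor): The scores of the candidate continuations for each batch item.
        next_tokens (mindspore.Tensor): The flattened candidate indices for each batch item. Each entry encodes
            beam_id * _next_scores.shape[-1] + word_id.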
        vocab_size (Optional[int]): The size of the vocabulary. Defaults to None.
        pad_token_id (Optional[int]): The token ID for padding. Defaults to None.
        bos_token_id (Optional[int]): The token ID for the beginning of sequence. Defaults to None.
        eos_token_id (Optional[Union[int, List[int]]]): The token ID for the end of sequence. Defaults to None.
        max_length (Optional[int]): The maximum length of the sequence. Defaults to None.
        ext_table_sub_cpu (Optional[mindspore.Tensor]): The CPU tensor for extended table sub.
        ext_table_ids_cpu (Optional[mindspore.Tensor]): The CPU tensor for extended table IDs.
        **model_kwargs: Additional keyword arguments for the model.

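    Returns:
        None if cur_len equals max_length. Otherwise a UserDict with the keys "next_beam_scores" (flattened
        tensor of scores), "next_beam_states" (list of per-item beam state lists), and "next_beam_indices"
        (flattened int32 tensor of source beam indices), despite the Tuple[mindspore.Tensor] annotation.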

    Raises:
        AssertionError: If the length of next_instance_beam_states is not equal to zero when cur_len is equal to
            max_length, or not equal to self.num_beams otherwise.

    """
    next_beam_state = []
    for sent_id in range(batch_size):
        self._done[sent_id] = self._done[sent_id] or self._beam_hyps[sent_id].is_done(
            next_scores[sent_id].max().item(), cur_len
        )
        if self._done[sent_id]:
            next_beam_state.append(
                [
                    (
                        {
                            "idx": 0,
                            "ans": [],
                            "nx_token_id": pad_token_id,
                            "nx_token_sub": 0,
                            "nx_segment_id": 0,
                            "nx_position": 0,
                        },
                        0,
                        0,
                    )
                ]
                * self.num_beams
            )
            continue

        next_instance_beam_states = []

        for idx, value in zip(next_tokens[sent_id], next_scores[sent_id]):
            beam_id = ops.div(idx, _next_scores.shape[-1], rounding_mode="floor").item()
            word_id = (idx % _next_scores.shape[-1]).item()

            curr_info = self.beam_states[sent_id][beam_id]
            if (
                word_id == eos_token_id
                and (curr_info["idx"] + 1 == len(model_kwargs["other_info"][sent_id]["predict_segments"]))
            ) or cur_len == max_length:
                self._beam_hyps[sent_id].add(
                    self.beam_states[sent_id][beam_id]["ans"]
                    + [
                        (
                            word_id,
                            model_kwargs["other_info"][sent_id]["predict_segments"][curr_info["idx"]][1],
                        )
                    ],
                    value.item(),
                )
            elif word_id == eos_token_id:
                next_instance_beam_states.append(
                    (
                        {
                            "idx": curr_info["idx"] + 1,
                            "ans": curr_info["ans"]
                            + [
                                (
                                    word_id,
                                    model_kwargs["other_info"][sent_id]["predict_segments"][curr_info["idx"]][1],
                                )
                            ],
                            "nx_token_id": bos_token_id,
                            "nx_token_sub": 0,
                            "nx_segment_id": model_kwargs["other_info"][sent_id]["predict_segments"][
                                curr_info["idx"] + 1
                            ][0],
                            "nx_position": 0,
                        },
                        value.item(),
                        sent_id * self.num_beams + beam_id,
                    )
                )

            else:
                raw_word_id = word_id
                word_id_sub = 0
                if word_id >= vocab_size:
                    word_id -= vocab_size
                    word_id_sub = int(ext_table_sub_cpu[word_id].item())
                    word_id = int(ext_table_ids_cpu[word_id].item())

                next_instance_beam_states.append(
                    (
                        {
                            "idx": curr_info["idx"],
                            "ans": curr_info["ans"]
                            + [
                                (
                                    raw_word_id,
                                    model_kwargs["other_info"][sent_id]["predict_segments"][curr_info["idx"]][1],
                                )
                            ],
                            "nx_token_id": word_id,
                            "nx_token_sub": word_id_sub,
                            "nx_segment_id": curr_info["nx_segment_id"],
                            "nx_position": curr_info["nx_position"] + 1,
                        },
                        value.item(),
                        sent_id * self.num_beams + beam_id,
                    )
                )

            if len(next_instance_beam_states) == self.num_beams:
                break
        # Parenthesized so the length is compared against the expected count in both cases.
        assert len(next_instance_beam_states) == (0 if cur_len == max_length else self.num_beams)
        next_beam_state.append(next_instance_beam_states)

    if cur_len == max_length:
        return None

    beam_reorder_idx = []
    beam_new_scores = []
    beam_states = []
    for sent_id in range(batch_size):
        instance_beam_states = []
        for beam_id in range(self.num_beams):
            state, value, beam_idx = next_beam_state[sent_id][beam_id]
            beam_reorder_idx.append(beam_idx)
            beam_new_scores.append(value)
            instance_beam_states.append(state)
        beam_states.append(instance_beam_states)
    self.beam_states = beam_states

    return UserDict(
        {
            "next_beam_scores": mindspore.tensor(beam_new_scores).view(-1),
            "next_beam_states": beam_states,
            "next_beam_indices": mindspore.tensor(beam_reorder_idx, dtype=mindspore.int32).view(-1),
        }
    )

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeBucketPositionBias` ¶

Bases: Module

This class represents a position bias computation module in the CpmBee model. It is used to calculate the relative position buckets for attention mechanism.

ATTRIBUTE	DESCRIPTION
`num_heads`	The number of attention heads. TYPE: `int`
`num_buckets`	The number of position bias buckets. TYPE: `int`
`num_segment_bucket`	The number of segment buckets used for position bias. TYPE: `int`
`max_distance`	The maximum distance for position bias calculation. TYPE: `int`
`relative_attention_bias`	The learnable parameter used for relative attention bias calculation. TYPE: `Parameter`

METHOD	DESCRIPTION
`__init__`	Initializes the CpmBeeBucketPositionBias instance.
`forward`	Constructs the position bias based on the given query and key positions and relative buckets.
`_position_bucket`	Computes the position bucket for the given relative position.
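
The bucketing rule in `_position_bucket` can be sketched for a single integer offset. This is a scalar illustration under stated assumptions, not the tensor implementation: the function name `position_bucket` is hypothetical, and the explicit branch replaces the `ops.where` selection used in the source.

```python
import math


def position_bucket(relative_position, num_buckets=32, max_distance=128):
    """Scalar sketch of the bucketing rule for one relative position."""
    num_buckets //= 2  # half the buckets for positive offsets, half for the rest
    bucket = num_buckets if relative_position > 0 else 0
    distance = abs(relative_position)
    max_exact = num_buckets // 2
    if distance < max_exact:
        return bucket + distance  # small distances get one bucket each
    # Larger distances share logarithmically spaced buckets.
    large = max_exact + int(
        math.log(distance / max_exact) / math.log(max_distance / max_exact) * (num_buckets - max_exact)
    )
    return bucket + min(large, num_buckets - 1)  # clip to the last bucket of the half


near = [position_bucket(p) for p in (0, 3, -3, 8)]
far = [position_bucket(p) for p in (1000, -1000)]
```

With the default arguments, `near` covers the one-bucket-per-distance range and its boundary, while `far` shows both halves saturating at their last bucket.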

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

class CpmBeeBucketPositionBias(nn.Module):

    """
    This class represents a position bias computation module in the CpmBee model.
    It is used to calculate the relative position buckets for attention mechanism.

    Attributes:
        num_heads (int): The number of attention heads.
        num_buckets (int): The number of position bias buckets.
        num_segment_bucket (int): The number of segment buckets used for position bias.
        max_distance (int): The maximum distance for position bias calculation.
        relative_attention_bias (mindspore.Parameter): The learnable parameter used for relative attention bias calculation.

    Methods:
        __init__:
            Initializes the CpmBeeBucketPositionBias instance.

        forward:
            Constructs the position bias based on the given query and key positions and relative buckets.

        _position_bucket:
            Computes the position bucket for the given relative position.

    """
    def __init__(self, config: CpmBeeConfig) -> None:
        """Initializes an instance of the CpmBeeBucketPositionBias class.

        Args:
            self: The instance of the class.
            config (CpmBeeConfig):
                The configuration object containing various parameters.

                - num_attention_heads (int): The number of attention heads.
                - position_bias_num_buckets (int): The number of buckets for position bias.
                - position_bias_num_segment_buckets (int): The number of buckets for segment bias.
                - position_bias_max_distance (int): The maximum distance for position bias.
                - ms_dtype: The dtype for the position bias parameter.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()

        self.num_heads = config.num_attention_heads
        self.num_buckets = config.position_bias_num_buckets
        self.num_segment_bucket = config.position_bias_num_segment_buckets
        self.max_distance = config.position_bias_max_distance

        self.relative_attention_bias = Parameter(
            ops.zeros(
                config.position_bias_num_buckets + config.position_bias_num_segment_buckets,
                config.num_attention_heads,
                dtype=config.ms_dtype,
            ),
        )

    def forward(self, query_pos: mindspore.Tensor, key_pos: mindspore.Tensor, rel_buckets: mindspore.Tensor):
        """
        Computes the relative position bias for every (query, key) pair.

        Args:
            query_pos (mindspore.Tensor): Positions of the queries, shape (batch, querylen).
            key_pos (mindspore.Tensor): Positions of the keys, shape (batch, keylen).
            rel_buckets (mindspore.Tensor): Relative buckets, shape (batch, querylen, keylen).
                Entries equal to 0 use the distance-based bucket from `_position_bucket`; all other entries
                are shifted by `num_buckets - 1` to index the remaining rows of `relative_attention_bias`.

        Returns:
            mindspore.Tensor: The bias looked up from `relative_attention_bias`,
                with shape (batch, num_heads, querylen, keylen).

        Raises:
            AssertionError:
                - If the batch sizes of key_pos and query_pos differ.
                - If the batch size of rel_buckets differs from that of key_pos.
                - If rel_buckets.shape[1] does not equal the number of query positions.
                - If rel_buckets.shape[2] does not equal the number of key positions.
        """
        batch = key_pos.shape[0]
        keylen = key_pos.shape[1]
        querylen = query_pos.shape[1]

        if key_pos.shape[0] != query_pos.shape[0]:
            raise AssertionError(
                f"key_pos.shape[0] should be equal to query_pos.shape[0], but got {key_pos.shape[0]} and {query_pos.shape[0]}!"
            )
        if rel_buckets.shape[0] != batch:
            raise AssertionError(
                f"rel_buckets.shape[0] should be equal to batch, but got {rel_buckets.shape[0]} and {batch}!"
            )
        if rel_buckets.shape[1] != querylen:
            raise AssertionError(
                f"rel_buckets.shape[1] should be equal to querylen, but got {rel_buckets.shape[1]} and {querylen}!"
            )
        if rel_buckets.shape[2] != keylen:
            raise AssertionError(
                f"rel_buckets.shape[2] should be equal to keylen, but got {rel_buckets.shape[2]} and {keylen}!"
            )

        relative_position_bucket = rel_buckets - 1 + self.num_buckets

        inner_segment_bucket = self._position_bucket(
            key_pos[..., None, :] - query_pos[..., :, None],
            num_buckets=self.num_buckets,
            max_distance=self.max_distance,
        )
        relative_position_bucket = ops.where(
            rel_buckets == 0,
            inner_segment_bucket,
            relative_position_bucket,
        )

        embeds = F.embedding(relative_position_bucket, self.relative_attention_bias)
        embeds = embeds.permute(0, 3, 1, 2)
        return embeds

    def _position_bucket(self, relative_position, num_buckets=32, max_distance=128):
        """
        Maps relative positions to bucket indices.

        Half of `num_buckets` is reserved for positive offsets and half for non-positive ones. Within each half,
        distances below `max_exact` (a quarter of `num_buckets`) get one bucket each, and larger distances share
        logarithmically spaced buckets that saturate at the last bucket of the half near `max_distance`.

        Args:
            relative_position (mindspore.Tensor): Relative positions. `forward` passes key position minus
                query position.
            num_buckets (int, optional): Total number of buckets. Defaults to 32.
            max_distance (int, optional): Distance at which the logarithmic buckets saturate. Defaults to 128.

        Returns:
            mindspore.Tensor: Bucket indices in `[0, num_buckets)`, with the same shape as `relative_position`.
        """
        relative_buckets = 0
        num_buckets //= 2
        relative_buckets = (relative_position > 0).to(mindspore.int32) * num_buckets
        relative_position = ops.abs(relative_position)
        max_exact = num_buckets // 2
        is_small = relative_position < max_exact
        relative_postion_if_large = max_exact + (
            ops.log(relative_position.float() / max_exact)
            / math.log(max_distance / max_exact)
            * (num_buckets - max_exact)
        ).to(mindspore.int32)
        relative_postion_if_large = ops.minimum(
            relative_postion_if_large,
            ops.full_like(relative_postion_if_large, num_buckets - 1),
        )
        relative_buckets += ops.where(is_small, relative_position, relative_postion_if_large)
        return relative_buckets

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeBucketPositionBias.__init__(config)` ¶

Initializes an instance of the CpmBeeBucketPositionBias class.

PARAMETER	DESCRIPTION
`self`	The instance of the class.
`config`	The configuration object. The fields read are `num_attention_heads` (int, the number of attention heads), `position_bias_num_buckets` (int, the number of buckets for position bias), `position_bias_num_segment_buckets` (int, the number of buckets for segment bias), `position_bias_max_distance` (int, the maximum distance for position bias), and `ms_dtype` (the dtype for the position bias parameter). TYPE: `CpmBeeConfig`

RETURNS	DESCRIPTION
`None`	None.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def __init__(self, config: CpmBeeConfig) -> None:
    """Initializes an instance of the CpmBeeBucketPositionBias class.

    Args:
        self: The instance of the class.
        config (CpmBeeConfig):
            The configuration object containing various parameters.

            - num_attention_heads (int): The number of attention heads.
            - position_bias_num_buckets (int): The number of buckets for position bias.
            - position_bias_num_segment_buckets (int): The number of buckets for segment bias.
            - position_bias_max_distance (int): The maximum distance for position bias.
            - ms_dtype: The dtype for the position bias parameter.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()

    self.num_heads = config.num_attention_heads
    self.num_buckets = config.position_bias_num_buckets
    self.num_segment_bucket = config.position_bias_num_segment_buckets
    self.max_distance = config.position_bias_max_distance

    self.relative_attention_bias = Parameter(
        ops.zeros(
            config.position_bias_num_buckets + config.position_bias_num_segment_buckets,
            config.num_attention_heads,
            dtype=config.ms_dtype,
        ),
    )

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeBucketPositionBias.forward(query_pos, key_pos, rel_buckets)` ¶

Computes the relative position bias embeddings from the query positions, key positions, and relative (segment) buckets.

PARAMETER	DESCRIPTION
`self`	An instance of the CpmBeeBucketPositionBias class. TYPE: `CpmBeeBucketPositionBias`
`query_pos`	A tensor representing the positions of queries in the input sequence. TYPE: `Tensor`
`key_pos`	A tensor representing the positions of keys in the input sequence. TYPE: `Tensor`
`rel_buckets`	A tensor containing relative position buckets. TYPE: `Tensor`

RETURNS	DESCRIPTION
`Tensor`	The position bias embeddings looked up from `relative_attention_bias`, of shape `(batch, num_heads, querylen, keylen)`.

RAISES	DESCRIPTION
`AssertionError`	If the number of batches in key_pos and query_pos tensors are not equal. If the number of batches in rel_buckets and key_pos tensors are not equal. If the number of query positions in the rel_buckets tensor does not match the query positions tensor. If the number of key positions in the rel_buckets tensor does not match the key positions tensor.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def forward(self, query_pos: mindspore.Tensor, key_pos: mindspore.Tensor, rel_buckets: mindspore.Tensor):
    """
    This method forwards relative position bias embeddings based on the input query positions, key positions,
    and relative buckets.

    Args:
        self (CpmBeeBucketPositionBias): An instance of the CpmBeeBucketPositionBias class.
        query_pos (mindspore.Tensor): A tensor representing the positions of queries in the input sequence.
        key_pos (mindspore.Tensor): A tensor representing the positions of keys in the input sequence.
        rel_buckets (mindspore.Tensor): A tensor containing relative position buckets.

    Returns:
        mindspore.Tensor: The position bias embeddings looked up from `relative_attention_bias`,
            of shape (batch, num_heads, querylen, keylen).

    Raises:
        AssertionError:
            - If the number of batches in key_pos and query_pos tensors are not equal.
            - If the number of batches in rel_buckets and key_pos tensors are not equal.
            - If the number of query positions in the rel_buckets tensor does not match the query positions tensor.
            - If the number of key positions in the rel_buckets tensor does not match the key positions tensor.
    """
    batch = key_pos.shape[0]
    keylen = key_pos.shape[1]
    querylen = query_pos.shape[1]

    if key_pos.shape[0] != query_pos.shape[0]:
        raise AssertionError(
            f"key_pos.shape[0] should be equal to query_pos.shape[0], but got {key_pos.shape[0]} and {query_pos.shape[0]}!"
        )
    if rel_buckets.shape[0] != batch:
        raise AssertionError(
            f"rel_buckets.shape[0] should be equal to batch, but got {rel_buckets.shape[0]} and {batch}!"
        )
    if rel_buckets.shape[1] != querylen:
        raise AssertionError(
            f"rel_buckets.shape[1] should be equal to querylen, but got {rel_buckets.shape[1]} and {querylen}!"
        )
    if rel_buckets.shape[2] != keylen:
        raise AssertionError(
            f"rel_buckets.shape[2] should be equal to keylen, but got {rel_buckets.shape[2]} and {keylen}!"
        )

    relative_position_bucket = rel_buckets - 1 + self.num_buckets

    inner_segment_bucket = self._position_bucket(
        key_pos[..., None, :] - query_pos[..., :, None],
        num_buckets=self.num_buckets,
        max_distance=self.max_distance,
    )
    relative_position_bucket = ops.where(
        rel_buckets == 0,
        inner_segment_bucket,
        relative_position_bucket,
    )

    embeds = F.embedding(relative_position_bucket, self.relative_attention_bias)
    embeds = embeds.permute(0, 3, 1, 2)
    return embeds
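
The index selection and table lookup at the end of `forward` can be sketched with NumPy as follows. The helper names `select_bucket_indices` and `lookup_bias` are hypothetical, and the toy table sizes are chosen only to keep the example readable; the sketch assumes the position buckets (`inner_segment_bucket`) have already been computed.

```python
import numpy as np


def select_bucket_indices(rel_buckets, inner_segment_bucket, num_buckets):
    """Choose a row of the bias table for every (query, key) pair."""
    # Non-zero segment buckets are shifted past the positional rows.
    shifted = rel_buckets - 1 + num_buckets
    # A zero segment bucket means: use the positional bucket instead.
    return np.where(rel_buckets == 0, inner_segment_bucket, shifted)


def lookup_bias(indices, table):
    """Gather per-head biases and move the head axis to position 1."""
    embeds = table[indices]               # (batch, querylen, keylen, num_heads)
    return embeds.transpose(0, 3, 1, 2)   # (batch, num_heads, querylen, keylen)


# Toy sizes (assumptions): 4 positional buckets, 2 segment buckets, 3 heads.
table = np.arange(18).reshape(6, 3)
rel_buckets = np.array([[[0, 1], [2, 0]]])
inner = np.array([[[1, 3], [0, 2]]])

indices = select_bucket_indices(rel_buckets, inner, num_buckets=4)
bias = lookup_bias(indices, table)
print(indices.tolist())  # -> [[[1, 4], [5, 2]]]
print(bias.shape)        # -> (1, 3, 2, 2)
```

The single table therefore holds the positional rows first and the segment rows after them, which is why `__init__` allocates `position_bias_num_buckets + position_bias_num_segment_buckets` rows.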

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeDenseGatedACT` ¶

Bases: Module

This class represents a dense gated activation module in the CpmBee framework. It performs a nonlinear transformation on an input tensor from one feature space to another using a gated activation function.

The class inherits from the nn.Module class.

ATTRIBUTE	DESCRIPTION
`w_0`	An instance of the CpmBeeLinear class representing the first linear transformation. TYPE: `CpmBeeLinear`
`w_1`	An instance of the CpmBeeLinear class representing the second linear transformation. TYPE: `CpmBeeLinear`
`act`	An instance of the GELU activation function. TYPE: `GELU`

METHOD	DESCRIPTION
`__init__`	Initializes the CpmBeeDenseGatedACT class.
`forward`	Transforms an input tensor from one feature space to another via a nonlinear operation.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

class CpmBeeDenseGatedACT(nn.Module):

    """
    This class represents a dense gated activation module in the CpmBee framework.
    It performs a nonlinear transformation on an input tensor from one feature space to another using
    a gated activation function.

    The class inherits from the `nn.Module` class.

    Attributes:
        w_0 (CpmBeeLinear): An instance of the CpmBeeLinear class representing the first linear transformation.
        w_1 (CpmBeeLinear): An instance of the CpmBeeLinear class representing the second linear transformation.
        act (nn.GELU): An instance of the GELU activation function.

    Methods:
        __init__: Initializes the CpmBeeDenseGatedACT class.
        forward: Transforms an input tensor from one feature space to another via a nonlinear operation.

    """
    def __init__(self, config: CpmBeeConfig):
        """
        Initializes a new instance of the CpmBeeDenseGatedACT class.

        Args:
            self: The current CpmBeeDenseGatedACT object.
            config (CpmBeeConfig): An instance of the CpmBeeConfig class containing configuration parameters.

        Returns:
            None

        Raises:
            None
        """
        super().__init__()
        self.w_0 = CpmBeeLinear(config.hidden_size, config.dim_ff, dtype=config.ms_dtype)
        self.w_1 = CpmBeeLinear(config.hidden_size, config.dim_ff, dtype=config.ms_dtype)
        self.act = nn.GELU()

    def forward(self, hidden_states: mindspore.Tensor):
        """Transform an input tensor from one feature space to another via a nonlinear operation

        Args:
            hidden_states (`mindspore.Tensor` of shape `(batch, seq_len, dim_in)`)
        """
        gate_score = self.act(self.w_0(hidden_states))
        hidden_states = self.w_1(hidden_states)

        hidden_states = gate_score * hidden_states
        return hidden_states

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeDenseGatedACT.__init__(config)` ¶

Initializes a new instance of the CpmBeeDenseGatedACT class.

PARAMETER	DESCRIPTION
`self`	The current CpmBeeDenseGatedACT object.
`config`	An instance of the CpmBeeConfig class containing configuration parameters. TYPE: `CpmBeeConfig`

RETURNS	DESCRIPTION
	None

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def __init__(self, config: CpmBeeConfig):
    """
    Initializes a new instance of the CpmBeeDenseGatedACT class.

    Args:
        self: The current CpmBeeDenseGatedACT object.
        config (CpmBeeConfig): An instance of the CpmBeeConfig class containing configuration parameters.

    Returns:
        None

    Raises:
        None
    """
    super().__init__()
    self.w_0 = CpmBeeLinear(config.hidden_size, config.dim_ff, dtype=config.ms_dtype)
    self.w_1 = CpmBeeLinear(config.hidden_size, config.dim_ff, dtype=config.ms_dtype)
    self.act = nn.GELU()

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeDenseGatedACT.forward(hidden_states)` ¶

Transform an input tensor from one feature space to another via a nonlinear operation

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def forward(self, hidden_states: mindspore.Tensor):
    """Transform an input tensor from one feature space to another via a nonlinear operation

    Args:
        hidden_states (`mindspore.Tensor` of shape `(batch, seq_len, dim_in)`)
    """
    gate_score = self.act(self.w_0(hidden_states))
    hidden_states = self.w_1(hidden_states)

    hidden_states = gate_score * hidden_states
    return hidden_states
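
As a rough illustration of the gating, the sketch below reproduces `GELU(x W_0^T) * (x W_1^T)` in NumPy. It makes two assumptions that the source above does not confirm: the projections are plain bias-free matrix products (whatever scaling `CpmBeeLinear` applies is ignored), and GELU is the exact erf-based form. The names `gelu` and `dense_gated_act` are hypothetical.

```python
import math

import numpy as np

# Element-wise erf built from the standard library (NumPy has no erf).
_erf = np.vectorize(math.erf)


def gelu(x):
    """Exact (erf-based) GELU; assumed to match the activation used above."""
    return 0.5 * x * (1.0 + _erf(x / math.sqrt(2.0)))


def dense_gated_act(hidden_states, w0, w1):
    """Gate one linear projection with the GELU of another."""
    gate_score = gelu(hidden_states @ w0.T)   # (batch, seq_len, dim_ff)
    projected = hidden_states @ w1.T          # (batch, seq_len, dim_ff)
    return gate_score * projected


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 4))    # (batch, seq_len, hidden_size)
w0 = rng.normal(size=(8, 4))      # (dim_ff, hidden_size)
w1 = rng.normal(size=(8, 4))
print(dense_gated_act(x, w0, w1).shape)  # -> (2, 3, 8)
```

Both projections read the same input, so the output keeps the batch and sequence axes and only the last axis changes from `hidden_size` to `dim_ff`.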

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeEmbeddingExt` ¶

Bases: Embedding

Contains a RotaryEmbedding.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

class CpmBeeEmbeddingExt(nn.Embedding):
    """
    Contains a RotaryEmbedding.
    """
    def __init__(self, config: CpmBeeConfig):
        """
        Initialize the CpmBeeEmbeddingExt object.

        Args:
            self: The instance of the CpmBeeEmbeddingExt class.
            config (CpmBeeConfig):
                An instance of CpmBeeConfig containing configuration parameters for the embedding.

                - vocab_size (int): The size of the vocabulary.
                - hidden_size (int): The size of the hidden layer.
                - ms_dtype: The data type for model parameters.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(config.vocab_size, config.hidden_size, dtype=config.ms_dtype)
        self.dim_model = config.hidden_size
        self.rotary_emb = CpmBeeRotaryEmbedding(config)

    def forward(self, ids: mindspore.Tensor, ids_sub: mindspore.Tensor):
        """
        Construct and return the embeddings of the given input IDs and sub-IDs for the CpmBeeEmbeddingExt class.

        Args:
            self (CpmBeeEmbeddingExt): An instance of the CpmBeeEmbeddingExt class.
            ids (mindspore.Tensor):
                The input IDs tensor:

                - Shape: (batch_size, sequence_length).
                - Type: int32 or int64.
                - Purpose: Represent the input IDs for which embeddings need to be forwarded.
            ids_sub (mindspore.Tensor):
                The sub-IDs tensor.

                - Shape: (batch_size, sequence_length).
                - Type: int32 or int64.
                - Purpose: Represent the sub-IDs for modifying the embeddings.

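        Returns:
            mindspore.Tensor: The token embeddings, scaled by 1/sqrt(hidden_size), with the rotary embedding applied using ids_sub.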

        Raises:
            None.
        """
        embeds = super().forward(ids) / math.sqrt(self.dim_model)
        return self.rotary_emb(embeds, ids_sub)

    def projection(self, x: mindspore.Tensor, ext_table: Optional[mindspore.Tensor] = None):
        """
        This method projects the input tensor 'x' using a dense layer and optionally concatenates it with another tensor 'ext_table'.

        Args:
            self: Instance of the class CpmBeeEmbeddingExt.
            x (mindspore.Tensor): Input tensor to be projected. It should have a shape compatible with the weight tensor.
            ext_table (Optional[mindspore.Tensor], optional): Additional tensor to be concatenated with the projected tensor 'x'.
                It should have a compatible shape with 'x'. Defaults to None.

        Returns:
            mindspore.Tensor: The logits obtained by projecting 'x' onto the embedding weight.
                If 'ext_table' is provided and has no zero-sized dimension, the logits computed against it are concatenated along the last axis.

        Raises:
            None
        """
        logits = F.linear(x / math.sqrt(self.dim_model), self.weight)
        if ext_table is not None and 0 not in ext_table.shape:
            logits_ext = F.linear(x, ext_table)
            logits = ops.cat([logits, logits_ext], dim=-1)
        return logits

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeEmbeddingExt.__init__(config)` ¶

Initialize the CpmBeeEmbeddingExt object.

PARAMETER	DESCRIPTION
`self`	The instance of the CpmBeeEmbeddingExt class.
`config`	An instance of CpmBeeConfig containing configuration parameters for the embedding. vocab_size (int): The size of the vocabulary. hidden_size (int): The size of the hidden layer. ms_dtype: The data type for model parameters. TYPE: `CpmBeeConfig`

RETURNS	DESCRIPTION
	None.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def __init__(self, config: CpmBeeConfig):
    """
    Initialize the CpmBeeEmbeddingExt object.

    Args:
        self: The instance of the CpmBeeEmbeddingExt class.
        config (CpmBeeConfig):
            An instance of CpmBeeConfig containing configuration parameters for the embedding.

            - vocab_size (int): The size of the vocabulary.
            - hidden_size (int): The size of the hidden layer.
            - ms_dtype: The data type for model parameters.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(config.vocab_size, config.hidden_size, dtype=config.ms_dtype)
    self.dim_model = config.hidden_size
    self.rotary_emb = CpmBeeRotaryEmbedding(config)

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeEmbeddingExt.forward(ids, ids_sub)` ¶

Construct and return the embeddings of the given input IDs and sub-IDs for the CpmBeeEmbeddingExt class.

PARAMETER	DESCRIPTION
`self`	An instance of the CpmBeeEmbeddingExt class. TYPE: `CpmBeeEmbeddingExt`
`ids`	The input IDs tensor. Shape: (batch_size, sequence_length). Type: int32 or int64. Purpose: the token IDs to look up in the embedding table. TYPE: `Tensor`
`ids_sub`	The sub-IDs tensor. Shape: (batch_size, sequence_length). Type: int32 or int64. Purpose: the sub-IDs passed to the rotary embedding to modify the token embeddings. TYPE: `Tensor`

RETURNS	DESCRIPTION
`Tensor`	The token embeddings, scaled by 1/sqrt(hidden_size), with the rotary embedding applied using `ids_sub`.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def forward(self, ids: mindspore.Tensor, ids_sub: mindspore.Tensor):
    """
    Construct and return the embeddings of the given input IDs and sub-IDs for the CpmBeeEmbeddingExt class.

    Args:
        self (CpmBeeEmbeddingExt): An instance of the CpmBeeEmbeddingExt class.
        ids (mindspore.Tensor):
            The input IDs tensor:

            - Shape: (batch_size, sequence_length).
            - Type: int32 or int64.
            - Purpose: Represent the input IDs for which embeddings need to be forwarded.
        ids_sub (mindspore.Tensor):
            The sub-IDs tensor.

            - Shape: (batch_size, sequence_length).
            - Type: int32 or int64.
            - Purpose: Represent the sub-IDs for modifying the embeddings.

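    Returns:
        mindspore.Tensor: The token embeddings, scaled by 1/sqrt(hidden_size), with the rotary embedding applied using ids_sub.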

    Raises:
        None.
    """
    embeds = super().forward(ids) / math.sqrt(self.dim_model)
    return self.rotary_emb(embeds, ids_sub)

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeEmbeddingExt.projection(x, ext_table=None)` ¶

Projects the input tensor 'x' onto the embedding weight matrix to obtain logits and, if 'ext_table' is given, appends the logits computed against 'ext_table'.

PARAMETER	DESCRIPTION
`self`	Instance of the class CpmBeeEmbeddingExt.
`x`	Input tensor to be projected. Its last dimension must match the embedding dimension of the weight tensor. TYPE: `Tensor`
`ext_table`	Optional extension table that 'x' is also projected onto; the resulting logits are concatenated to the vocabulary logits along the last axis. Its last dimension must match that of 'x'. Ignored if any of its dimensions is zero. Defaults to None. TYPE: `Optional[Tensor]` DEFAULT: `None`

RETURNS	DESCRIPTION
`Tensor`	The logits obtained by projecting 'x' onto the embedding weight. If 'ext_table' is provided and has no zero-sized dimension, the logits computed against it are concatenated along the last axis.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def projection(self, x: mindspore.Tensor, ext_table: Optional[mindspore.Tensor] = None):
    """
    This method projects the input tensor 'x' using a dense layer and optionally concatenates it with another tensor 'ext_table'.

    Args:
        self: Instance of the class CpmBeeEmbeddingExt.
        x (mindspore.Tensor): Input tensor to be projected. It should have a shape compatible with the weight tensor.
        ext_table (Optional[mindspore.Tensor], optional): Additional tensor to be concatenated with the projected tensor 'x'.
            It should have a compatible shape with 'x'. Defaults to None.

    Returns:
        mindspore.Tensor: The logits obtained by projecting 'x' onto the embedding weight.
            If 'ext_table' is provided and has no zero-sized dimension, the logits computed against it are concatenated along the last axis.

    Raises:
        None
    """
    logits = F.linear(x / math.sqrt(self.dim_model), self.weight)
    if ext_table is not None and 0 not in ext_table.shape:
        logits_ext = F.linear(x, ext_table)
        logits = ops.cat([logits, logits_ext], dim=-1)
    return logits
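
A small NumPy sketch may help with the shapes involved in `projection`. The standalone function below is an assumption-laden stand-in (it takes the weight matrix as an argument instead of reading `self.weight`), but it follows the same steps: scale, project onto the embedding weight, then optionally append logits for the extension table.

```python
import math

import numpy as np


def projection(x, weight, ext_table=None):
    """Project hidden states onto the (tied) embedding weight."""
    dim_model = weight.shape[1]
    # Vocabulary logits: the input is scaled by 1/sqrt(dim_model) first.
    logits = (x / math.sqrt(dim_model)) @ weight.T
    # Extension logits use the unscaled input and are appended on the last axis.
    if ext_table is not None and 0 not in ext_table.shape:
        logits_ext = x @ ext_table.T
        logits = np.concatenate([logits, logits_ext], axis=-1)
    return logits


x = np.ones((2, 5, 4))        # (batch, seq_len, hidden_size)
weight = np.ones((10, 4))     # (vocab_size, hidden_size)
ext = np.ones((3, 4))         # (num_ext_tokens, hidden_size), toy size

print(projection(x, weight).shape)                    # -> (2, 5, 10)
print(projection(x, weight, ext).shape)               # -> (2, 5, 13)
print(projection(x, weight, np.zeros((0, 4))).shape)  # -> (2, 5, 10)
```

Note that only the vocabulary logits are scaled; the extension logits are computed from the unscaled `x`, exactly as in the listing above.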

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeEncoder` ¶

Bases: Module

CpmBeeEncoder is a class that represents an encoder module for the CpmBeeTransformer model. This class inherits from nn.Module and is responsible for processing input data through multiple transformer blocks.

ATTRIBUTE	DESCRIPTION
`num_layers`	The number of transformer blocks in the encoder. TYPE: `int`
`layers`	List of CpmBeeTransformerBlock instances representing each transformer block in the encoder. TYPE: `ModuleList`
`output_layernorm`	Layer normalization module for the encoder output. TYPE: `CpmBeeLayerNorm`

METHOD	DESCRIPTION
`__init__`	Initializes the CpmBeeEncoder instance with the provided configuration.
`forward`	Processes the input hidden_states through the encoder layers.

Args:

hidden_states (mindspore.Tensor): Input tensor of shape (batch, seq_len, dim_model).
attention_mask (mindspore.Tensor): Tensor of shape (batch, seq_len, seq_len) that masks invalid areas during the calculation.
position_bias (mindspore.Tensor): Tensor of shape (num_heads, seq_len, seq_len) that provides position information to the attention mechanism.
output_attentions (bool, optional): Indicates whether to return attention tensors of all layers.
output_hidden_states (bool, optional): Indicates whether to return hidden states of all layers.
past_key_values (Tuple[mindspore.Tensor, mindspore.Tensor], optional): Cached past key and value projection states.
use_cache (bool, optional): If True, past key and value states are returned for speeding up decoding.

Returns:

A 4-tuple (hidden_states, current_key_values, all_hidden_states, all_self_attns); an entry is None when the corresponding option is disabled.

mindspore.Tensor: Processed hidden states after passing through all encoder layers.
Tuple[mindspore.Tensor, ...]: Cached key values if 'use_cache' is enabled.
Tuple[mindspore.Tensor, ...]: Hidden states of all layers if 'output_hidden_states' is enabled.
Tuple[mindspore.Tensor, ...]: Attention weights of all layers if 'output_attentions' is enabled.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

class CpmBeeEncoder(nn.Module):

    """
    CpmBeeEncoder is a class that represents an encoder module for the CpmBeeTransformer model.
    This class inherits from nn.Module and is responsible for processing input data through multiple transformer blocks.

    Attributes:
        num_layers (int): The number of transformer blocks in the encoder.
        layers (nn.ModuleList): List of CpmBeeTransformerBlock instances representing each transformer block in the encoder.
        output_layernorm (CpmBeeLayerNorm): Layer normalization module for the encoder output.

    Methods:
        __init__:
            Initializes the CpmBeeEncoder instance with the provided configuration.

        forward:
            Processes the input hidden_states through the encoder layers.

             Args:

            - hidden_states (mindspore.Tensor): Input tensor of shape (batch, seq_len, dim_model).
            - attention_mask (mindspore.Tensor):
            Tensor to mask invalid areas during calculation of shape (batch, seq_len, seq_len).
            - position_bias (mindspore.Tensor):
            Tensor providing position information to the attention mechanism of shape (num_heads, seq_len, seq_len).
            - output_attentions (bool, optional): Indicates whether to return attention tensors of all layers.
            - output_hidden_states (bool, optional): Indicates whether to return hidden states of all layers.
            - past_key_values (Tuple[mindspore.Tensor, mindspore.Tensor], optional): Cached past key and value projection states.
            - use_cache (bool, optional): If True, past key and value states are returned for speeding up decoding.

            Returns:

            - mindspore.Tensor: Processed hidden states after passing through all encoder layers.
            - Tuple[mindspore.Tensor, ...]: Cached key values if 'use_cache' is enabled.
            - Tuple[mindspore.Tensor, ...]: Hidden states of all layers if 'output_hidden_states' is enabled.
            - Tuple[mindspore.Tensor, ...]: Attention weights of all layers if 'output_attentions' is enabled.
    """
    def __init__(self, config: CpmBeeConfig):
        """
        Initializes a new instance of the CpmBeeEncoder class.

        Args:
            self: The instance of the CpmBeeEncoder class.
            config (CpmBeeConfig): An instance of the CpmBeeConfig class containing configuration parameters for the encoder.
                This parameter is used to configure the encoder's behavior and settings.
                The config parameter must be of type CpmBeeConfig.

        Returns:
            None.

        Raises:
            AssertionError: If the length of config.mask_modules does not equal the number of hidden layers specified in config.
            AssertionError: If the length of mask_module within config.mask_modules is not 2 for each mask_module in the list.
        """
        super().__init__()
        self.num_layers = config.num_hidden_layers
        if config.mask_modules is not None:
            assert len(config.mask_modules) == self.num_layers, "The total number of masks should equal to num_layers"
            for mask_module in config.mask_modules:
                assert len(mask_module) == 2, "For encoder, each mask should be (mask_att, mask_ffn)"
        else:
            config.mask_modules = [(False, False)] * self.num_layers

        self.layers = nn.ModuleList(
            [
                CpmBeeTransformerBlock(
                    config, mask_att=config.mask_modules[ith][0], mask_ffn=config.mask_modules[ith][1]
                )
                for ith in range(self.num_layers)
            ]
        )

        self.output_layernorm = CpmBeeLayerNorm(config)

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: mindspore.Tensor,
        position_bias: mindspore.Tensor,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        past_key_values: Optional[Tuple[mindspore.Tensor, mindspore.Tensor]] = None,
        use_cache: Optional[bool] = None,
    ):
        """
        Args:
            hidden_states (`mindspore.Tensor`):
                Input to the layer of shape `(batch, seq_len, dim_model)`
            attention_mask (`mindspore.Tensor`):
                Avoid invalid areas to participate in the calculation of shape `(batch, seq_len, seq_len)`
            position_bias (`mindspore.Tensor`):
                Provides position information to attention mechanism of shape `(num_heads, seq_len, seq_len)`
            output_attentions (`bool`, *optional*):
                Whether or not to return the attentions tensors of all attention layers.
            output_hidden_states (`bool`, *optional*):
                Whether or not to return the hidden states of all layers.
            past_key_values (`Tuple[mindspore.Tensor, mindspore.Tensor]`, *optional*):
                Cached past key and value projection states
            use_cache (`bool`, *optional*):
                If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
                (see `past_key_values`).
        """
        all_hidden_states = () if output_hidden_states else None
        all_self_attns = () if output_attentions else None
        current_key_values = () if use_cache else None

        for i, layer in enumerate(self.layers):
            if output_hidden_states:
                all_hidden_states += (hidden_states,)
            layer_outputs = layer(
                hidden_states,
                attention_mask,
                position_bias,
                output_attentions=output_attentions,
                past_key_values=past_key_values[i] if past_key_values else None,
                use_cache=use_cache,
            )
            hidden_states, attn_weights, current_key_value = layer_outputs
            if output_attentions:
                all_self_attns += (attn_weights,)
            if current_key_values is not None:
                current_key_values = current_key_values + (current_key_value,)

        hidden_states = self.output_layernorm(hidden_states)

        if output_hidden_states:
            all_hidden_states += (hidden_states,)

        return hidden_states, current_key_values, all_hidden_states, all_self_attns

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeEncoder.__init__(config)` ¶

Initializes a new instance of the CpmBeeEncoder class.

PARAMETER	DESCRIPTION
`self`	The instance of the CpmBeeEncoder class.
`config`	An instance of the CpmBeeConfig class containing configuration parameters for the encoder. This parameter is used to configure the encoder's behavior and settings. The config parameter must be of type CpmBeeConfig. TYPE: `CpmBeeConfig`

RETURNS	DESCRIPTION
	None.

RAISES	DESCRIPTION
`AssertionError`	If the length of config.mask_modules does not equal the number of hidden layers specified in config.
`AssertionError`	If the length of mask_module within config.mask_modules is not 2 for each mask_module in the list.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def __init__(self, config: CpmBeeConfig):
    """
    Initializes a new instance of the CpmBeeEncoder class.

    Args:
        self: The instance of the CpmBeeEncoder class.
        config (CpmBeeConfig): An instance of the CpmBeeConfig class containing configuration parameters for the encoder.
            This parameter is used to configure the encoder's behavior and settings.
            The config parameter must be of type CpmBeeConfig.

    Returns:
        None.

    Raises:
        AssertionError: If the length of config.mask_modules does not equal the number of hidden layers specified in config.
        AssertionError: If the length of mask_module within config.mask_modules is not 2 for each mask_module in the list.
    """
    super().__init__()
    self.num_layers = config.num_hidden_layers
    if config.mask_modules is not None:
        assert len(config.mask_modules) == self.num_layers, "The total number of masks should equal to num_layers"
        for mask_module in config.mask_modules:
            assert len(mask_module) == 2, "For encoder, each mask should be (mask_att, mask_ffn)"
    else:
        config.mask_modules = [(False, False)] * self.num_layers

    self.layers = nn.ModuleList(
        [
            CpmBeeTransformerBlock(
                config, mask_att=config.mask_modules[ith][0], mask_ffn=config.mask_modules[ith][1]
            )
            for ith in range(self.num_layers)
        ]
    )

    self.output_layernorm = CpmBeeLayerNorm(config)
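
The `mask_modules` handling above can be read as a small normalization step: either one `(mask_att, mask_ffn)` pair is supplied per layer, or every layer defaults to `(False, False)`. The sketch below isolates that logic. The function name `resolve_mask_modules` is hypothetical, and it raises `ValueError` where the original uses `assert`, which is a deliberate substitution for the sake of the example.

```python
from typing import List, Optional, Sequence, Tuple


def resolve_mask_modules(
    mask_modules: Optional[Sequence[Sequence[bool]]],
    num_layers: int,
) -> List[Tuple[bool, bool]]:
    """Return one (mask_att, mask_ffn) pair per encoder layer."""
    if mask_modules is None:
        # Default used by the encoder: nothing is masked in any layer.
        return [(False, False)] * num_layers
    if len(mask_modules) != num_layers:
        raise ValueError("The total number of masks should equal num_layers")
    for mask_module in mask_modules:
        if len(mask_module) != 2:
            raise ValueError("Each mask should be (mask_att, mask_ffn)")
    return [(bool(m[0]), bool(m[1])) for m in mask_modules]


print(resolve_mask_modules(None, 2))
# -> [(False, False), (False, False)]
print(resolve_mask_modules([(True, False), (False, True)], 2))
# -> [(True, False), (False, True)]
```

Unlike this sketch, the constructor above writes the default back into `config.mask_modules`, so the configuration object is modified when no masks are given.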

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeEncoder.forward(hidden_states, attention_mask, position_bias, output_attentions=None, output_hidden_states=None, past_key_values=None, use_cache=None)` ¶

PARAMETER	DESCRIPTION
`hidden_states`	Input to the layer, of shape `(batch, seq_len, dim_model)`. TYPE: `mindspore.Tensor`
`attention_mask`	Mask of shape `(batch, seq_len, seq_len)` that keeps invalid areas out of the attention calculation. TYPE: `mindspore.Tensor`
`position_bias`	Position information for the attention mechanism, of shape `(num_heads, seq_len, seq_len)`. TYPE: `mindspore.Tensor`
`output_attentions`	Whether or not to return the attention tensors of all attention layers. TYPE: `bool`, optional DEFAULT: `None`
`output_hidden_states`	Whether or not to return the hidden states of all layers. TYPE: `bool`, optional DEFAULT: `None`
`past_key_values`	Cached past key and value projection states. TYPE: `Tuple[mindspore.Tensor, mindspore.Tensor]`, optional DEFAULT: `None`
`use_cache`	If set to `True`, the key and value states are returned and can be used to speed up decoding (see `past_key_values`). TYPE: `bool`, optional DEFAULT: `None`

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def forward(
    self,
    hidden_states: mindspore.Tensor,
    attention_mask: mindspore.Tensor,
    position_bias: mindspore.Tensor,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    past_key_values: Optional[Tuple[mindspore.Tensor, mindspore.Tensor]] = None,
    use_cache: Optional[bool] = None,
):
    """
    Args:
        hidden_states (`mindspore.Tensor`):
            Input to the layer of shape `(batch, seq_len, dim_model)`
        attention_mask (`mindspore.Tensor`):
            Avoid invalid areas to participate in the calculation of shape `(batch, seq_len, seq_len)`
        position_bias (`mindspore.Tensor`):
            Provides position information to attention mechanism of shape `(num_heads, seq_len, seq_len)`
        output_attentions (`bool`, *optional*):
            Whether or not to return the attentions tensors of all attention layers.
        output_hidden_states (`bool`, *optional*):
            Whether or not to return the hidden states of all layers.
        past_key_values (`Tuple[mindspore.Tensor, mindspore.Tensor]`, *optional*):
            Cached past key and value projection states
        use_cache (`bool`, *optional*):
            If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
            (see `past_key_values`).
    """
    all_hidden_states = () if output_hidden_states else None
    all_self_attns = () if output_attentions else None
    current_key_values = () if use_cache else None

    for i, layer in enumerate(self.layers):
        if output_hidden_states:
            all_hidden_states += (hidden_states,)
        layer_outputs = layer(
            hidden_states,
            attention_mask,
            position_bias,
            output_attentions=output_attentions,
            past_key_values=past_key_values[i] if past_key_values else None,
            use_cache=use_cache,
        )
        hidden_states, attn_weights, current_key_value = layer_outputs
        if output_attentions:
            all_self_attns += (attn_weights,)
        if current_key_values is not None:
            current_key_values = current_key_values + (current_key_value,)

    hidden_states = self.output_layernorm(hidden_states)

    if output_hidden_states:
        all_hidden_states += (hidden_states,)

    return hidden_states, current_key_values, all_hidden_states, all_self_attns
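
The bookkeeping in `forward` (what is collected, and when) is easier to follow with toy layers. The sketch below mirrors the loop using plain Python callables; `run_encoder`, `toy_layer`, and `final_norm` are hypothetical stand-ins, and the attention mask and position bias arguments are left out for brevity.

```python
def run_encoder(layers, final_norm, hidden_states,
                output_hidden_states=False, output_attentions=False,
                use_cache=False, past_key_values=None):
    """Mirror the encoder loop with toy layers (illustrative only)."""
    all_hidden_states = () if output_hidden_states else None
    all_self_attns = () if output_attentions else None
    current_key_values = () if use_cache else None

    for i, layer in enumerate(layers):
        if output_hidden_states:
            # Recorded before the layer runs, so entry 0 is the raw input.
            all_hidden_states += (hidden_states,)
        past = past_key_values[i] if past_key_values else None
        hidden_states, attn_weights, key_value = layer(hidden_states, past)
        if output_attentions:
            all_self_attns += (attn_weights,)
        if current_key_values is not None:
            current_key_values += (key_value,)

    hidden_states = final_norm(hidden_states)
    if output_hidden_states:
        # The last entry is the normalized output.
        all_hidden_states += (hidden_states,)
    return hidden_states, current_key_values, all_hidden_states, all_self_attns


def toy_layer(x, past):
    # Stand-in for a transformer block: returns (hidden, attention, cache).
    return x + 1, "attn", ("k", "v")


out, cache, hiddens, attns = run_encoder(
    [toy_layer] * 3, lambda x: x * 10, 0,
    output_hidden_states=True, use_cache=True,
)
print(out, len(cache), hiddens, attns)  # -> 30 3 (0, 1, 2, 30) None
```

With `output_hidden_states` enabled, the tuple has `num_layers + 1` entries: the input to each layer followed by the final normalized output.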

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeFFNBlock` ¶

Bases: Module

This class represents a feed-forward block in the CpmBee model. It is used to process hidden states before the feed-forward layer.

The CpmBeeFFNBlock class inherits from nn.Module.

ATTRIBUTE	DESCRIPTION
`layernorm_before_ffn`	An instance of the CpmBeeLayerNorm class that performs layer normalization before the feed-forward layer. TYPE: `CpmBeeLayerNorm`
`ffn`	An instance of the CpmBeeFeedForward class that represents the feed-forward layer. TYPE: `CpmBeeFeedForward`
`dropout`	An optional dropout layer. If None, no dropout is applied. TYPE: `Dropout or None`

METHOD	DESCRIPTION
`__init__`	Initializes the CpmBeeFFNBlock object.
`forward`	Processes the hidden states before the feed-forward layer.
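
The computation performed by this block can be sketched with plain Python lists. The sketch assumes an RMS-style normalization without learned weights and uses a hypothetical `toy_ffn` in place of `CpmBeeFeedForward`; only the residual form `(hidden_states + outputs) / 1.05` is taken from the source.

```python
import math


def rms_norm(x, eps=1e-6):
    # Assumption: normalize by the root mean square, with no learned weight.
    mean_square = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_square + eps) for v in x]


def toy_ffn(x):
    # Hypothetical stand-in for the feed-forward layer.
    return [0.5 * v for v in x]


def ffn_block(hidden, dropout=None):
    normed = rms_norm(hidden)      # layer norm before the feed-forward layer
    out = toy_ffn(normed)          # feed-forward layer
    if dropout is not None:        # dropout is optional
        out = dropout(out)
    # Residual connection, scaled down by 1.05 as in the source.
    return [(h + o) / 1.05 for h, o in zip(hidden, out)]


result = ffn_block([1.0, -1.0])
```

Note that the residual adds the feed-forward output to the un-normalized input, so the normalization only affects the branch that goes through the feed-forward layer.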

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

class CpmBeeFFNBlock(nn.Module):

    """
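    This class represents a feed-forward block in the CpmBee model. It applies layer normalization to the hidden states,
    passes the result through the feed-forward layer, optionally applies dropout, and adds the output back to the input
    as a residual connection divided by 1.05.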

    The CpmBeeFFNBlock class inherits from nn.Module.

    Attributes:
        layernorm_before_ffn (CpmBeeLayerNorm): An instance of the CpmBeeLayerNorm class that performs layer normalization before the feed-forward layer.
        ffn (CpmBeeFeedForward): An instance of the CpmBeeFeedForward class that represents the feed-forward layer.
        dropout (nn.Dropout or None): An optional dropout layer. If None, no dropout is applied.

    Methods:
        __init__: Initializes the CpmBeeFFNBlock object.
        forward: Processes the hidden states before the feed-forward layer.

    """
    def __init__(self, config: CpmBeeConfig):
        """
        Initializes a CpmBeeFFNBlock instance.

        Args:
            self: The current object instance.
            config (CpmBeeConfig): The configuration object containing the parameters for the CpmBeeFFNBlock.
                This object must be an instance of CpmBeeConfig class.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.layernorm_before_ffn = CpmBeeLayerNorm(config)
        self.ffn = CpmBeeFeedForward(config)
        if config.dropout_p:
            self.dropout = nn.Dropout(p=config.dropout_p)
        else:
            self.dropout = None

    def forward(
        self,
        hidden_states: mindspore.Tensor,
    ):
        """
        Args:
            hidden_states (`mindspore.Tensor` of shape `(batch, len_seq, dim_model)`):
                Hidden states before feed forward layer.
        """
        ln_outputs = self.layernorm_before_ffn(hidden_states)
        outputs = self.ffn(ln_outputs)
        if self.dropout is not None:
            outputs = self.dropout(outputs)
        hidden_states = (hidden_states + outputs) / 1.05
        return hidden_states

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeFFNBlock.__init__(config)` ¶

Initializes a CpmBeeFFNBlock instance.

PARAMETER	DESCRIPTION
`self`	The current object instance.
`config`	The configuration object containing the parameters for the CpmBeeFFNBlock. This object must be an instance of CpmBeeConfig class. TYPE: `CpmBeeConfig`

RETURNS	DESCRIPTION
	None.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def __init__(self, config: CpmBeeConfig):
    """
    Initializes a CpmBeeFFNBlock instance.

    Args:
        self: The current object instance.
        config (CpmBeeConfig): The configuration object containing the parameters for the CpmBeeFFNBlock.
            This object must be an instance of CpmBeeConfig class.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.layernorm_before_ffn = CpmBeeLayerNorm(config)
    self.ffn = CpmBeeFeedForward(config)
    if config.dropout_p:
        self.dropout = nn.Dropout(p=config.dropout_p)
    else:
        self.dropout = None

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeFFNBlock.forward(hidden_states)` ¶

PARAMETER	DESCRIPTION
`hidden_states`	Hidden states before feed forward layer. TYPE: `mindspore.Tensor` of shape `(batch, len_seq, dim_model)`

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def forward(
    self,
    hidden_states: mindspore.Tensor,
):
    """
    Args:
        hidden_states (`mindspore.Tensor` of shape `(batch, len_seq, dim_model)`):
            Hidden states before feed forward layer.
    """
    ln_outputs = self.layernorm_before_ffn(hidden_states)
    outputs = self.ffn(ln_outputs)
    if self.dropout is not None:
        outputs = self.dropout(outputs)
    hidden_states = (hidden_states + outputs) / 1.05
    return hidden_states

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeFeedForward` ¶

Bases: Module

This class represents a feedforward neural network layer for the CpmBee model. It consists of a dense gated activation layer (CpmBeeDenseGatedACT), optional dropout layer, and a linear transformation layer (CpmBeeLinear).

ATTRIBUTE	DESCRIPTION
`w_in`	Instance of `CpmBeeDenseGatedACT` for processing input hidden states.
`dropout`	Optional dropout layer for regularization.
`w_out`	Instance of `CpmBeeLinear` for transforming hidden states to output.

METHOD	DESCRIPTION
`__init__`	Constructor method initializing the feedforward layer.
`forward`	Method for processing input hidden states through the feedforward layer.

PARAMETER	DESCRIPTION
`config`	Configuration object of type `CpmBeeConfig` containing layer specifications. TYPE: `CpmBeeConfig`
`hidden_states`	Input tensor of shape `(batch, seq_len, dim_in)` representing hidden states.

RETURNS	DESCRIPTION
	mindspore.Tensor: Transformed hidden states after passing through the feedforward layer.
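
A gated feed-forward layer of this kind can be sketched as follows. The internals of `CpmBeeDenseGatedACT` are not shown in this section, so the sketch assumes a common design: one projection passed through GELU acts as a gate on a second projection. The names `w_gate`, `w_up`, and `matvec` are hypothetical, and dropout is omitted.

```python
import math


def gelu(x):
    # Exact GELU using the error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matvec(weight, x):
    # weight is a list of rows; returns weight @ x.
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def gated_feed_forward(x, w_gate, w_up, w_out):
    gate = [gelu(v) for v in matvec(w_gate, x)]    # activated gate (assumed)
    up = matvec(w_up, x)                           # linear branch (assumed)
    hidden = [g * u for g, u in zip(gate, up)]     # element-wise gating
    return matvec(w_out, hidden)                   # project back to the model dimension


# Toy sizes: model dimension 2, feed-forward dimension 3.
W_GATE = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_UP = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]
W_OUT = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]

y = gated_feed_forward([1.0, 2.0], W_GATE, W_UP, W_OUT)
```

The output has the same dimension as the input, which matches `w_out` mapping from `config.dim_ff` back to `config.hidden_size`.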

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

class CpmBeeFeedForward(nn.Module):

    """
    This class represents a feedforward neural network layer for the CpmBee model.
    It consists of a dense gated activation layer (`CpmBeeDenseGatedACT`), optional dropout layer,
    and a linear transformation layer (`CpmBeeLinear`).

    Attributes:
        w_in: Instance of `CpmBeeDenseGatedACT` for processing input hidden states.
        dropout: Optional dropout layer for regularization.
        w_out: Instance of `CpmBeeLinear` for transforming hidden states to output.

    Methods:
        __init__: Constructor method initializing the feedforward layer.
        forward: Method for processing input hidden states through the feedforward layer.

    Args:
        config: Configuration object of type `CpmBeeConfig` containing layer specifications.
        hidden_states: Input tensor of shape `(batch, seq_len, dim_in)` representing hidden states.

    Returns:
        mindspore.Tensor: Transformed hidden states after passing through the feedforward layer.
    """
    def __init__(self, config: CpmBeeConfig):
        """
        Initializes an instance of the CpmBeeFeedForward class.

        Args:
            self: The instance of the class.
            config (CpmBeeConfig): An object of the CpmBeeConfig class containing configuration parameters.

        Returns:
            None

        Raises:
            None
        """
        super().__init__()
        self.w_in = CpmBeeDenseGatedACT(config)
        if config.dropout_p is not None:
            self.dropout = nn.Dropout(p=config.dropout_p)
        else:
            self.dropout = None

        self.w_out = CpmBeeLinear(config.dim_ff, config.hidden_size, dtype=config.ms_dtype)

    def forward(self, hidden_states: mindspore.Tensor):
        """
        Args:
            hidden_states (`mindspore.Tensor` of shape `(batch, seq_len, dim_in)`)
        """
        hidden_states = self.w_in(hidden_states)

        if self.dropout is not None:
            hidden_states = self.dropout(hidden_states)

        hidden_states = self.w_out(hidden_states)

        return hidden_states

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeFeedForward.__init__(config)` ¶

Initializes an instance of the CpmBeeFeedForward class.

PARAMETER	DESCRIPTION
`self`	The instance of the class.
`config`	An object of the CpmBeeConfig class containing configuration parameters. TYPE: `CpmBeeConfig`

RETURNS	DESCRIPTION
	None

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def __init__(self, config: CpmBeeConfig):
    """
    Initializes an instance of the CpmBeeFeedForward class.

    Args:
        self: The instance of the class.
        config (CpmBeeConfig): An object of the CpmBeeConfig class containing configuration parameters.

    Returns:
        None

    Raises:
        None
    """
    super().__init__()
    self.w_in = CpmBeeDenseGatedACT(config)
    if config.dropout_p is not None:
        self.dropout = nn.Dropout(p=config.dropout_p)
    else:
        self.dropout = None

    self.w_out = CpmBeeLinear(config.dim_ff, config.hidden_size, dtype=config.ms_dtype)
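
The dropout check here (`is not None`) differs from the truthiness check in `CpmBeeFFNBlock.__init__` (`if config.dropout_p:`). The sketch below only mirrors the two conditions with hypothetical helper functions; it builds no layers.

```python
def ffn_block_uses_dropout(dropout_p):
    # Mirrors `if config.dropout_p:` in CpmBeeFFNBlock.__init__
    return bool(dropout_p)


def feed_forward_uses_dropout(dropout_p):
    # Mirrors `if config.dropout_p is not None:` in CpmBeeFeedForward.__init__
    return dropout_p is not None


for p in (None, 0.0, 0.1):
    print(p, ffn_block_uses_dropout(p), feed_forward_uses_dropout(p))
```

For `dropout_p = 0.0` the two conditions disagree: the block skips the dropout layer, while the feed-forward layer creates one with probability zero.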

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeFeedForward.forward(hidden_states)` ¶

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def forward(self, hidden_states: mindspore.Tensor):
    """
    Args:
        hidden_states (`mindspore.Tensor` of shape `(batch, seq_len, dim_in)`)
    """
    hidden_states = self.w_in(hidden_states)

    if self.dropout is not None:
        hidden_states = self.dropout(hidden_states)

    hidden_states = self.w_out(hidden_states)

    return hidden_states

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeForCausalLM` ¶

Bases: CpmBeePreTrainedModel

This class represents a CPMBee model for Causal Language Modeling tasks. It inherits from CpmBeePreTrainedModel and implements methods for model initialization, inference, beam search generation, input embeddings handling, and more.

The class includes methods for initializing the model, forwarding the model for inference, performing inference, getting and setting input embeddings, getting and setting output embeddings, preparing inputs for generation, updating model kwargs for generation, reordering cache during generation, expanding inputs for generation, adjusting logits during generation, performing beam search for generation, and generating outputs based on input data using beam search.

The generate method processes input data using the model to generate responses, filling placeholders in the input data with generated text. It accepts a dictionary or a list of dictionaries as input and returns a dictionary or a list of dictionaries with the '<ans>' field filled with generated text.
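
The input and output contract described above can be sketched without loading a model. `fill_answers` and `fake_model` are hypothetical helpers: they only illustrate the "dict in, dict out; list in, list out" behavior with a flat `'<ans>'` string, and they do not reproduce the beam search used by the real method.

```python
def fill_answers(data, generate_text):
    """Return copies of the input dict(s) with the '<ans>' field filled in."""
    single = isinstance(data, dict)
    items = [data] if single else list(data)

    filled = []
    for item in items:
        out = dict(item)                     # leave the caller's dict untouched
        out["<ans>"] = generate_text(item)   # fill the placeholder
        filled.append(out)

    return filled[0] if single else filled


def fake_model(item):
    # Hypothetical stand-in for text generation.
    return item["input"].upper()


one = fill_answers({"input": "hi", "<ans>": ""}, fake_model)
many = fill_answers([{"input": "a", "<ans>": ""}, {"input": "b", "<ans>": ""}], fake_model)
```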

For more details on the methods and their parameters, please refer to the method docstrings within the class implementation.
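
As one illustration, when `labels` is provided, `forward` flattens the logits to `(batch * seq_len, vocab)` and the labels to `(batch * seq_len,)` before computing cross-entropy. A minimal pure-Python sketch of that step follows; the helper names are hypothetical and the sketch ignores options such as an ignore index.

```python
import math


def flatten_batch(logits_3d, labels_2d):
    # (batch, seq_len, vocab) -> (batch * seq_len, vocab); (batch, seq_len) -> (batch * seq_len,)
    flat_logits = [row for seq in logits_3d for row in seq]
    flat_labels = [label for seq in labels_2d for label in seq]
    return flat_logits, flat_labels


def cross_entropy(logits, labels):
    """Mean negative log-likelihood over all positions."""
    total = 0.0
    for row, label in zip(logits, labels):
        top = max(row)  # subtract the max for numerical stability
        log_z = top + math.log(sum(math.exp(v - top) for v in row))
        total += log_z - row[label]
    return total / len(labels)


# Uniform logits over a vocabulary of 4 tokens, batch of 2 sequences of length 3.
logits = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]
labels = [[0, 1, 2], [3, 0, 1]]
flat_logits, flat_labels = flatten_batch(logits, labels)
loss = cross_entropy(flat_logits, flat_labels)
```

With uniform logits the loss equals the natural logarithm of the vocabulary size, which is a convenient sanity check for an untrained model.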

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

class CpmBeeForCausalLM(CpmBeePreTrainedModel):

    """
    This class represents a CPMBee model for Causal Language Modeling tasks. It inherits from CpmBeePreTrainedModel and
    implements methods for model initialization, inference, beam search generation, input embeddings handling, and more.

    The class includes methods for initializing the model, forwarding the model for inference, performing inference,
    getting and setting input embeddings, getting and setting output embeddings, preparing inputs for generation,
    updating model kwargs for generation, reordering cache during generation, expanding inputs for generation,
    adjusting logits during generation, performing beam search for generation, and generating outputs based on
    input data using beam search.

    The `generate` method processes input data using the model to generate responses, filling placeholders in the
    input data with generated text. It accepts a dictionary or a list of dictionaries as input and
    returns a dictionary or a list of dictionaries with the '<ans>' field filled with generated text.

    For more details on the methods and their parameters, please refer to the method docstrings within the class
    implementation.
    """
    _tied_weights_keys = ["lm_head.weight"]

    def __init__(self, config: CpmBeeConfig):
        """
        Initializes a new instance of the CpmBeeForCausalLM class.

        Args:
            self: The object instance.
            config (CpmBeeConfig): The configuration object for the CpmBee model.

        Returns:
            None

        Raises:
            None
        """
        super().__init__(config)
        self.cpmbee = CpmBeeModel(config)

        # lm_head.weight is tied to cpmbee.input_embedding.weight
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        input_id_sub: Optional[mindspore.Tensor] = None,
        length: Optional[mindspore.Tensor] = None,
        context: Optional[mindspore.Tensor] = None,
        sample_ids: Optional[mindspore.Tensor] = None,
        num_segments: Optional[mindspore.Tensor] = None,
        segment: Optional[mindspore.Tensor] = None,
        segment_rel_offset: Optional[mindspore.Tensor] = None,
        segment_rel: Optional[mindspore.Tensor] = None,
        span: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        past_key_values: Optional[List] = None,
        use_cache: Optional[bool] = None,
        labels: Optional[mindspore.Tensor] = None,
        return_dict: Optional[bool] = None,
        ext_table_ids: Optional[mindspore.Tensor] = None,  # (ext_table_size) int32
        ext_table_sub: Optional[mindspore.Tensor] = None,  # (ext_table_size) int32
        **kwargs,
    ) -> Union[Tuple, CausalLMOutputWithPast]:
        r"""
        Args:
            input_ids (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
                Indices of input sequence tokens in the vocabulary.

                Indices can be obtained using [`CPMBeeTokenizer`]. See [`PreTrainedTokenizer.encode`] and
                [`PreTrainedTokenizer.__call__`] for details.

                [What are input IDs?](../glossary#input-ids)
            input_id_sub (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
                Subscripts of the input sequence tokens in the vocabulary.

                The subscript of normal text is zero, while the special tokens of each group take the subscripts
                0, 1, 2, ... For example, <ans_0>, <ans_1>, <ans_2>, ... belong to group <ans>, and <mask_0>,
                <mask_1>, <mask_2>, ... belong to group <mask>.
            length (`mindspore.Tensor` of shape `(batch_size)`):
                The length of sequences in batch.
            context (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
                Whether each token id is context. The value is 1 for context tokens and 0 otherwise. A context
                token does not need to be predicted.
            sample_ids (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
                Assigns a sample id to every token id. Token ids with the same sample id belong to the same sample.
            num_segments (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
                Total number of segments in the current input.
            segment (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
                Assigns a segment id to every token id. Token ids with the same segment id belong to the same segment.

                Generally, each string key or value in the input data is a segment. For example, for the input
                {"input": "hello, ", "<ans>": ""}, the segments are "input", "hello, ", "<ans>" and "".
            segment_rel_offset (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
                The offset of segment rel.
            segment_rel (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
                The segment relevance. A relative implementation of measuring the importance of segments.
            span (`Dict[str, Union[mindspore.Tensor, List]]`):
                Span will record every input_ids shape.
            output_attentions (`bool`, *optional*):
                Whether or not to return the attentions tensors of all attention layers.
            output_hidden_states (`bool`, *optional*):
                Whether or not to return the hidden states of all layers.
            past_key_values (`tuple(tuple(mindspore.Tensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`):
                A dummy argument for CPMBee. The `past_states` contains pre-computed hidden-states (key and values in
                the self-attention blocks and in the cross-attention blocks) that can be used (see `past_key_values`
                input) and other history arguments to speed up sequential decoding.
            use_cache (`bool`, *optional*):
                If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
                (see `past_key_values`).
            labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for computing the language modeling loss.
            return_dict (`bool`, *optional*):
                Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
            ext_table_ids (`mindspore.Tensor`, *optional*):
                ext_table ids for embedding projection.
            ext_table_sub (`mindspore.Tensor`, *optional*):
                ext_table subscriptions for embedding projection.
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        model_output = self.cpmbee(
            input_ids,
            input_id_sub,
            length,
            context,
            sample_ids,
            num_segments,
            segment,
            segment_rel_offset,
            segment_rel,
            span,
            output_attentions,
            output_hidden_states,
            past_key_values,
            use_cache,
            return_dict,
        )
        hidden_states = model_output.last_hidden_state if return_dict else model_output[0]

        if ext_table_ids is not None:
            ext_table = self.cpmbee.input_embedding(ext_table_ids, ext_table_sub)
        else:
            ext_table = None
        logits = self.cpmbee.input_embedding.projection(hidden_states, ext_table)

        loss = None
        if labels is not None:
            loss = F.cross_entropy(logits.view(-1, logits.shape[-1]), labels.long().view(-1))

        if not return_dict:
            output = (logits,) + model_output[1:]
            return ((loss,) + output) if loss is not None else output

        return CausalLMOutputWithPast(
            loss=loss,
            logits=logits,
            past_key_values=model_output.past_key_values,
            hidden_states=model_output.hidden_states,
            attentions=model_output.attentions,
        )

    def inference(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        input_id_sub: Optional[mindspore.Tensor] = None,
        position: Optional[mindspore.Tensor] = None,
        context: Optional[mindspore.Tensor] = None,
        sample_ids: Optional[mindspore.Tensor] = None,
        num_segments: Optional[mindspore.Tensor] = None,
        segment: Optional[mindspore.Tensor] = None,
        segment_rel_offset: Optional[mindspore.Tensor] = None,
        segment_rel: Optional[mindspore.Tensor] = None,
        past_states: Optional[Dict] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        past_key_values: Optional[List] = None,
        use_cache: Optional[bool] = None,
        labels: Optional[mindspore.Tensor] = None,
        return_dict: Optional[bool] = None,
        ext_table_ids: Optional[mindspore.Tensor] = None,  # (ext_table_size) int32
        ext_table_sub: Optional[mindspore.Tensor] = None,  # (ext_table_size) int32
        **kwargs,
    ) -> Union[Tuple, CausalLMOutputWithPast]:
        r"""
        Args:
            input_ids (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
                Indices of input sequence tokens in the vocabulary.

                Indices can be obtained using [`CPMBeeTokenizer`]. See [`PreTrainedTokenizer.encode`] and
                [`PreTrainedTokenizer.__call__`] for details.

                [What are input IDs?](../glossary#input-ids)
            input_id_sub (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
                Subscripts of the input sequence tokens in the vocabulary.

                The subscript of normal text is zero, while the special tokens of each group take the subscripts
                0, 1, 2, ... For example, <ans_0>, <ans_1>, <ans_2>, ... belong to group <ans>, and <mask_0>,
                <mask_1>, <mask_2>, ... belong to group <mask>.
            position (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
                The position of each input token within its segment. If segment1 has positions 0, 1, 2 and
                segment2 has positions 0, 1, 2, 3, the position will be 0, 1, 2, 0, 1, 2, 3.
            context (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
                Whether each token id is context. The value is 1 for context tokens and 0 otherwise. A context
                token does not need to be predicted.
            sample_ids (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
                Assigns a sample id to every token id. Token ids with the same sample id belong to the same sample.
            num_segments (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
                Total number of segments in the current input.
            segment (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
                Assigns a segment id to every token id. Token ids with the same segment id belong to the same segment.

                Generally, each string key or value in the input data is a segment. For example, for the input
                {"input": "hello, ", "<ans>": ""}, the segments are "input", "hello, ", "<ans>" and "".
            segment_rel_offset (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
                The offset of segment rel.
            segment_rel (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
                The segment relevance. A relative implementation of measuring the importance of segments.
            past_states (`Dict[str, Union[mindspore.Tensor, List]]`):
                Store the history information including position, context, sample_ids, num_segments, segment and
                past_key_values.
            output_attentions (`bool`, *optional*):
                Whether or not to return the attentions tensors of all attention layers.
            output_hidden_states (`bool`, *optional*):
                Whether or not to return the hidden states of all layers.
            past_key_values (`tuple(tuple(mindspore.Tensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`):
                A dummy argument for CPMBee. The `past_states` contains pre-computed hidden-states (key and values in
                the self-attention blocks and in the cross-attention blocks) that can be used (see `past_key_values`
                input) and other history arguments to speed up sequential decoding.
            use_cache (`bool`, *optional*):
                If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
                (see `past_key_values`).
            labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for computing the language modeling loss.
            return_dict (`bool`, *optional*):
                Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
            ext_table_ids (`mindspore.Tensor`, *optional*):
                ext_table ids for embedding projection.
            ext_table_sub (`mindspore.Tensor`, *optional*):
                ext_table subscriptions for embedding projection.

        Example:
            Text Generation with CpmBeeForCausalLM.
            ```python
            >>> from transformers import CpmBeeTokenizer, CpmBeeForCausalLM
            ...
            >>> texts = {"input": "今天天气不错，", "<ans>": ""}
            >>> model = CpmBeeForCausalLM.from_pretrained("openbmb/cpm-bee-10b")
            >>> tokenizer = CpmBeeTokenizer.from_pretrained("openbmb/cpm-bee-10b")
            >>> output_texts = model.generate({"input": "今天天气不错，", "<ans>": ""}, tokenizer)
            >>> print(output_texts)
            {'input': '今天天气不错，', '<ans>': '适合睡觉。'}
            ```
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        model_output = self.cpmbee.inference(
            input_ids,
            input_id_sub,
            position,
            context,
            sample_ids,
            num_segments,
            segment,
            segment_rel_offset,
            segment_rel,
            past_states,
            output_attentions,
            output_hidden_states,
            past_key_values,
            use_cache,
            return_dict,
        )
        hidden_states = model_output.last_hidden_state if return_dict else model_output[0]

        if ext_table_ids is not None and 0 not in ext_table_ids.shape:
            ext_table = self.cpmbee.input_embedding(ext_table_ids, ext_table_sub)
        else:
            ext_table = None
        logits = self.cpmbee.input_embedding.projection(hidden_states, ext_table)

        loss = None
        if labels is not None:
            loss = F.cross_entropy(logits.view(-1, logits.shape[-1]), labels.view(-1))

        if not return_dict:
            output = (logits,) + model_output[1:]
            return ((loss,) + output) if loss is not None else output

        return CausalLMOutputWithPast(
            loss=loss,
            logits=logits,
            past_key_values=model_output.past_key_values,
            hidden_states=model_output.hidden_states,
            attentions=model_output.attentions,
        )

    def get_input_embeddings(self):
        """
        This method retrieves the input embeddings from the CpmBeeForCausalLM object.

        Args:
            self (CpmBeeForCausalLM): The instance of the CpmBeeForCausalLM class.

        Returns:
            input_embedding: The input embedding module stored in `self.cpmbee.input_embedding`.

        Raises:
            None.
        """
        return self.cpmbee.input_embedding

    def set_input_embeddings(self, embeddings):
        """
        Sets the input embeddings for the CpmBeeForCausalLM class.

        Args:
            self (CpmBeeForCausalLM): The instance of the CpmBeeForCausalLM class.
            embeddings: The input embeddings to be set for the CpmBeeForCausalLM instance.

        Returns:
            None.

        Raises:
            None.
        """
        self.cpmbee.input_embedding = embeddings

    def get_output_embeddings(self):
        """
        Returns the output embeddings for the CpmBeeForCausalLM model.

        Args:
            self: An instance of the CpmBeeForCausalLM class.

        Returns:
            The `lm_head` layer that serves as the output embeddings.

        Raises:
            None.
        """
        return self.lm_head

    def set_output_embeddings(self, new_embeddings):
        """
        Sets the output embeddings for the CpmBeeForCausalLM model.

        Args:
            self (CpmBeeForCausalLM): The instance of the CpmBeeForCausalLM class.
            new_embeddings: The new output embeddings. The object is assigned directly to `lm_head`,
                so it should be a layer that can stand in for the existing `lm_head`.

        Returns:
            None

        Raises:
            None

        This method sets the output embeddings of the CpmBeeForCausalLM model to the provided new embeddings.
        The new embeddings are assigned to the 'lm_head' attribute of the model object.
        """
        self.lm_head = new_embeddings

    def prepare_inputs_for_generation(
        self,
        input_ids: mindspore.Tensor,
        batch_size: int,
        beam_scorer: CpmBeeBeamSearchScorer = None,
        input_id_subs: Optional[mindspore.Tensor] = None,
        input_pos: Optional[mindspore.Tensor] = None,
        segment_ids: Optional[mindspore.Tensor] = None,
        batch_ext_table_ids: Optional[mindspore.Tensor] = None,
        batch_ext_table_sub: Optional[mindspore.Tensor] = None,
        other_info: Optional[Dict] = None,
        **model_kwargs,
    ):
        """
        Choose the current input according to beam states.
        """
        # init preparation
        context = model_kwargs.get("context")
        sample_ids = model_kwargs.get("sample_ids")
        segment_rel_offset = model_kwargs.get("segment_rel_offset")
        num_segments = model_kwargs.get("num_segments")
        segment_rel = model_kwargs.get("segment_rel")
        past_states = model_kwargs.get("past_states", None)
        past_key_values = model_kwargs.get("past_key_values", None)
        _input_ids = input_ids

        # update input in generation
        if beam_scorer is not None:
            tmp_input = []
            tmp_input_sub = []
            tmp_position = []
            tmp_segment = []
            for sent_id in range(batch_size):
                for beam_id in range(beam_scorer.num_beams):
                    tmp_input.append(beam_scorer.beam_states[sent_id][beam_id]["nx_token_id"])
                    tmp_input_sub.append(beam_scorer.beam_states[sent_id][beam_id]["nx_token_sub"])
                    tmp_position.append(beam_scorer.beam_states[sent_id][beam_id]["nx_position"])
                    tmp_segment.append(beam_scorer.beam_states[sent_id][beam_id]["nx_segment_id"])

            model_kwargs["input_id_subs"] = input_id_subs = mindspore.tensor(
                tmp_input_sub, dtype=mindspore.int64
            ).view(batch_size * beam_scorer.num_beams, 1)
            model_kwargs["input_pos"] = input_pos = mindspore.tensor(
                tmp_position, dtype=mindspore.int64
            ).view(batch_size * beam_scorer.num_beams, 1)
            model_kwargs["segment_ids"] = segment_ids = mindspore.tensor(
                tmp_segment, dtype=mindspore.int64
            ).view(batch_size * beam_scorer.num_beams, 1)
            input_ids = ops.cat(
                [
                    input_ids,
                    mindspore.tensor(tmp_input, dtype=mindspore.int64).view(
                        batch_size * beam_scorer.num_beams, 1
                    ),
                ],
                dim=-1,
            )
            _input_ids = input_ids[:, -1:]

        return {
            "input_ids": _input_ids,
            "input_id_sub": input_id_subs,
            "position": input_pos,
            "context": context,
            "sample_ids": sample_ids,
            "segment_rel_offset": segment_rel_offset,
            "segment": segment_ids,
            "num_segments": num_segments,
            "segment_rel": segment_rel,
            "use_cache": True,
            "past_key_values": past_key_values,
            "ext_table_ids": batch_ext_table_ids,
            "ext_table_sub": batch_ext_table_sub,
            "past_states": past_states,
        }, input_ids

    def _update_model_kwargs_for_generation(
        self,
        outputs: ModelOutput,
        model_inputs=None,
        **model_kwargs,
    ) -> Dict[str, Any]:
        """
        Concatenate the history input and current input.
        """
        old_past_states = model_kwargs["past_states"]
        model_kwargs["past_states"] = {
            "buffer_position": ops.cat([old_past_states["buffer_position"], model_inputs["position"]], dim=-1),
            "buffer_context": ops.cat([old_past_states["buffer_context"], model_inputs["context"].astype(mindspore.int64)], dim=-1),
            "buffer_sample_ids": ops.cat([old_past_states["buffer_sample_ids"], model_inputs["sample_ids"]], dim=-1),
            "buffer_num_segments": ops.cat(
                [old_past_states["buffer_num_segments"], model_inputs["num_segments"]], dim=-1
            ),
            "buffer_segments": ops.cat([old_past_states["buffer_segments"], model_inputs["segment"]], dim=-1),
            "buffer": outputs.past_key_values,
        }

        return model_kwargs

    def _reorder_cache(self, past_key_values: Dict, beam_idx: mindspore.Tensor):
        """
        Reorders the cache of past key values for beam search decoding in a CpmBeeForCausalLM object.

        Args:
            self (CpmBeeForCausalLM): The instance of the CpmBeeForCausalLM class.
            past_key_values (Dict): The dictionary containing the cache of past key values.
                The cache is used during beam search decoding to store previous key-value pairs.
            beam_idx (mindspore.Tensor): The tensor containing the indices of the beams to be reordered.
                The indices represent the order in which the beams are to be arranged.

        Returns:
            Dict: The `past_key_values` dictionary, updated in place and returned.

        Raises:
            None.

        Note:
            Every entry is reindexed along its first (batch * beam) dimension by `beam_idx`.
            For the 'buffer' key, each layer's (key, value) pair is reindexed separately;
            a pair equal to (None, None) is kept unchanged.
            All other entries are tensors and are reindexed directly.

        Example:
            ```python
            >>> # Create an instance of the CpmBeeForCausalLM class from a CpmBeeConfig
            >>> cpm_bee = CpmBeeForCausalLM(config)
            >>>
            >>> # Define the past states (key1, value1, ... are placeholder tensors)
            >>> past_key_values = {
            ...     'buffer': [(key1, value1), (key2, value2)],
            ...     'other_key': tensor([[1, 2, 3], [4, 5, 6]])
            ... }
            >>>
            >>> # Define the beam indices
            >>> beam_idx = tensor([1, 0])
            >>>
            >>> # Reorder the cache of past key values
            >>> past_key_values = cpm_bee._reorder_cache(past_key_values, beam_idx)
            ```
        """
        beam_idx = beam_idx.tolist()
        for kw in past_key_values.keys():
            if kw == "buffer":
                buf_list = past_key_values[kw]
                nw_buf_list = []
                for buf in buf_list:
                    if buf == (None, None):
                        nw_buf_list.append((None, None))
                    else:
                        k_buf, v_buf = buf
                        nw_buf_list.append((k_buf[beam_idx, :], v_buf[beam_idx, :]))
                past_key_values[kw] = nw_buf_list
            else:
                past_key_values[kw] = past_key_values[kw][beam_idx, :]

        return past_key_values

    @staticmethod
    def _expand_inputs_for_generation(
        expand_size: int = 1,
        is_encoder_decoder: bool = False,
        input_ids: Optional[mindspore.Tensor] = None,
        **model_kwargs,
    ) -> Tuple[mindspore.Tensor, Dict[str, Any]]:
        """Expands tensors from [batch_size, ...] to [batch_size * expand_size, ...]"""
        # do not expand ext_table_ids and ext_table_sub
        def _expand_dict_for_generation(dict_to_expand):
            for key in dict_to_expand:
                if (
                    dict_to_expand[key] is not None
                    and isinstance(dict_to_expand[key], mindspore.Tensor)
                    and "ext_table" not in key
                ):
                    dict_to_expand[key] = ops.repeat_interleave(dict_to_expand[key], expand_size, dim=0)
            return dict_to_expand

        if input_ids is not None:
            input_ids = ops.repeat_interleave(input_ids, expand_size, dim=0)

        model_kwargs = _expand_dict_for_generation(model_kwargs)

        if is_encoder_decoder:
            if model_kwargs.get("encoder_outputs") is None:
                raise ValueError("If `is_encoder_decoder` is True, make sure that `encoder_outputs` is defined.")
            model_kwargs["encoder_outputs"] = _expand_dict_for_generation(model_kwargs["encoder_outputs"])

        return input_ids, model_kwargs

    def adjust_logits_during_generation(
        self,
        logits: mindspore.Tensor,
        batch_size: int,
        beam_size: int,
        vocab_size: int,
        ext_table_ids: mindspore.Tensor,
        **model_kwargs,
    ) -> mindspore.Tensor:
        """
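        Mask logits for tokens that must not be generated: the unknown token (id 1) unless it appears in the
        sample's `ext_table`, and every extension-table id that is absent from the sample's `ext_table`.
        Masked entries are set to -10000.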
        """
        for sent_id in range(batch_size):
            if 1 not in model_kwargs["other_info"][sent_id]["ext_table"]:
                # unk is not allowed, mask unk
                logits[sent_id * beam_size : (sent_id + 1) * beam_size, 1] = -10000
            ext_ids = set()
            for v in model_kwargs["other_info"][sent_id]["ext_table"].keys():
                ext_ids.add(v)
            for ext_id in range(vocab_size, vocab_size + ext_table_ids.shape[0]):
                if ext_id not in ext_ids:
                    logits[sent_id * beam_size : (sent_id + 1) * beam_size, ext_id] = -10000
        return logits

    def beam_search(
        self,
        input_ids: mindspore.Tensor,
        beam_scorer: CpmBeeBeamSearchScorer,
        repetition_penalty: Optional[float] = 1.0,
        logits_processor: Optional[LogitsProcessorList] = None,
        max_length: Optional[int] = None,
        pad_token_id: Optional[int] = None,
        eos_token_id: Optional[Union[int, List[int]]] = None,
        bos_token_id: Optional[Union[int, List[int]]] = None,
        vocab_size: Optional[int] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        output_scores: Optional[bool] = None,
        return_dict_in_generate: Optional[bool] = None,
        synced_gpus: bool = False,
        **model_kwargs,
    ) -> List:
        """
        Override the beam_search for CPMBee.
        """
        # init values
        logits_processor = logits_processor if logits_processor is not None else LogitsProcessorList()
        pad_token_id = pad_token_id if pad_token_id is not None else self.generation_config.pad_token_id
        eos_token_id = eos_token_id if eos_token_id is not None else self.generation_config.eos_token_id
        bos_token_id = bos_token_id if bos_token_id is not None else self.generation_config.bos_token_id
        vocab_size = vocab_size if vocab_size is not None else self.generation_config.vocab_size
        max_length = max_length if max_length is not None else self.generation_config.max_new_tokens
        output_scores = output_scores if output_scores is not None else self.generation_config.output_scores
        output_attentions = (
            output_attentions if output_attentions is not None else self.generation_config.output_attentions
        )
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.generation_config.output_hidden_states
        )
        return_dict_in_generate = (
            return_dict_in_generate
            if return_dict_in_generate is not None
            else self.generation_config.return_dict_in_generate
        )

        batch_size = len(beam_scorer._beam_hyps)
        num_beams = beam_scorer.num_beams

        batch_beam_size, cur_len = input_ids.shape

        if num_beams * batch_size != batch_beam_size:
            raise ValueError(
                f"Batch dimension of `input_ids` should be {num_beams * batch_size}, but is {batch_beam_size}."
            )

        # init attention / hidden states / scores tuples
        scores = () if (return_dict_in_generate and output_scores) else None
        beam_indices = (
            tuple(() for _ in range(batch_beam_size)) if (return_dict_in_generate and output_scores) else None
        )
        decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
        cross_attentions = () if (return_dict_in_generate and output_attentions) else None
        decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None

        # initialise score of first beam with 0 and the rest with -1e9. This makes sure that only tokens
        # of the first beam are considered to avoid sampling the exact same tokens across all beams.
        beam_scores = ops.zeros((batch_size, num_beams), dtype=mindspore.float32)
        beam_scores[:, 1:] = -1e9
        beam_scores = beam_scores.view((batch_size * num_beams,))

        # init inference
        model_inputs, input_ids = self.prepare_inputs_for_generation(input_ids, batch_size, **model_kwargs)
        pred_start_index = input_ids.shape[-1]
        outputs = self.inference(
            **model_inputs,
            return_dict=True,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
        )

        # update model_kwargs
        model_kwargs["past_states"] = {
            "buffer_position": model_inputs["position"],
            "buffer_context": model_inputs["context"],
            "buffer_sample_ids": model_inputs["sample_ids"],
            "buffer_num_segments": model_inputs["num_segments"],
            "buffer_segments": model_inputs["segment"],
            "buffer": outputs.past_key_values,
        }
        model_kwargs["context"] = ops.ones(batch_beam_size, dtype=mindspore.bool_).view(
            batch_beam_size, 1
        )
        model_kwargs["sample_ids"] = ops.zeros(batch_beam_size, dtype=mindspore.int64).view(
            batch_beam_size, 1
        )
        model_kwargs["num_segments"] = model_kwargs["num_segments"][:, -1:]
        model_kwargs["segment_rel_offset"] = model_kwargs["segment_rel_offset"][:, -1:]
        model_kwargs["past_key_values"] = outputs.past_key_values

        ext_table_ids_cpu = model_inputs["ext_table_ids"]
        ext_table_sub_cpu = model_inputs["ext_table_sub"]

        cur_len = 0
        while True:
            model_inputs, input_ids = self.prepare_inputs_for_generation(
                input_ids, batch_size, beam_scorer, **model_kwargs
            )

            outputs = self.inference(
                **model_inputs,
                return_dict=True,
                output_attentions=output_attentions,
                output_hidden_states=output_hidden_states,
            )

            next_token_logits = outputs.logits[:, -1, :]

            if all(beam_scorer._done):
                break
            # Mask tokens that must not be generated: the unknown token (unless it is in the sample's
            # `ext_table`) and extension-table ids that the sample does not use.
            next_token_logits = self.adjust_logits_during_generation(
                next_token_logits, batch_size, num_beams, vocab_size, ext_table_ids_cpu, **model_kwargs
            )

            # repetition_penalty
            beam_scorer.apply_repetition_penalty(
                next_token_logits,
                batch_size,
                num_beams,
                input_ids,
                repetition_penalty,
                pred_start_index,
                input_ids.shape[-1] - 1,
                None,
            )

            _next_token_scores = F.log_softmax(
                next_token_logits, dim=-1
            )  # (batch_size * num_beams, vocab_size)

            next_token_scores_processed = logits_processor(input_ids, _next_token_scores)
            # next_token_scores_processed = _next_token_scores
            next_token_scores = next_token_scores_processed + beam_scores[:, None].expand_as(_next_token_scores)

            # Store scores, attentions and hidden_states when required
            if return_dict_in_generate:
                if output_scores:
                    scores += (next_token_scores_processed,)
                if output_attentions:
                    decoder_attentions += (
                        (outputs.decoder_attentions,) if self.config.is_encoder_decoder else (outputs.attentions,)
                    )
                    if self.config.is_encoder_decoder:
                        cross_attentions += (outputs.cross_attentions,)

                if output_hidden_states:
                    decoder_hidden_states += (
                        (outputs.decoder_hidden_states,)
                        if self.config.is_encoder_decoder
                        else (outputs.hidden_states,)
                    )

            # reshape for beam search
            next_token_scores = next_token_scores.view(batch_size, -1)

            # Take the top 2 * num_beams candidates per batch item (so spare candidates remain available)
            next_token_scores, next_tokens = ops.topk(
                next_token_scores, 2 * num_beams, dim=1, largest=True, sorted=True
            )

            beam_outputs = beam_scorer.process(
                batch_size,
                cur_len,
                _next_token_scores,
                next_token_scores,
                next_tokens,
                vocab_size=vocab_size,
                pad_token_id=pad_token_id,
                bos_token_id=bos_token_id,
                eos_token_id=eos_token_id,
                max_length=max_length,
                ext_table_ids_cpu=ext_table_ids_cpu,
                ext_table_sub_cpu=ext_table_sub_cpu,
                **model_kwargs,
            )
            if beam_outputs is None:
                break
            beam_idx = beam_outputs["next_beam_indices"]
            beam_scores = beam_outputs["next_beam_scores"]

            input_ids = input_ids[beam_idx.tolist(), :]
            model_kwargs = self._update_model_kwargs_for_generation(outputs, model_inputs, **model_kwargs)
            if model_kwargs["past_states"] is not None:
                model_kwargs["past_states"] = self._reorder_cache(model_kwargs["past_states"], beam_idx)

            if return_dict_in_generate and output_scores:
                beam_indices = tuple((beam_indices[beam_idx[i]] + (beam_idx[i],) for i in range(len(beam_indices))))

            cur_len += 1

            if beam_scorer.is_done or cur_len == max_length + 1:
                if not synced_gpus:
                    break

        sequence_outputs = beam_scorer.finalize()

        return sequence_outputs

    def _generate(
        self,
        inputs: Optional[mindspore.Tensor] = None,
        generation_config: Optional[GenerationConfig] = None,
        repetition_penalty: Optional[float] = 1.0,
        logits_processor: Optional[LogitsProcessorList] = None,
        stopping_criteria: Optional[StoppingCriteriaList] = None,
        prefix_allowed_tokens_fn: Optional[Callable[[int, mindspore.Tensor], List[int]]] = None,
        synced_gpus: Optional[bool] = None,
        streamer: Optional["BaseStreamer"] = None,
        **kwargs,
    ) -> List:
        r"""
        The generation of CPMBee.

        1. It uses beam search as the generation strategy.
        2. It uses CpmBeeBeamSearchScorer as the beam search scorer.
        """
        # 1. Handle `generation_config` and kwargs that might update it, and validate the `.generate()` call
        self._validate_model_class()

        # priority: `generation_config` argument > `model.generation_config` (the default generation config)
        if generation_config is None:
            # legacy: users may modify the model configuration to control generation -- update the generation config
            # model attribute accordingly, if it was created from the model config
            if self.generation_config._from_model_config:
                new_generation_config = GenerationConfig.from_model_config(self.config)
                if new_generation_config != self.generation_config:
                    warnings.warn(
                        "You have modified the pretrained model configuration to control generation. This is a"
                        " deprecated strategy to control generation."
                        " Please use a generation configuration file (see"
                        " https://hf-mirror.com/docs/transformers/main_classes/text_generation)"
                    )
                    self.generation_config = new_generation_config
            generation_config = self.generation_config

        generation_config = copy.deepcopy(generation_config)
        model_kwargs = generation_config.update(**kwargs)  # All unused kwargs must be model kwargs
        generation_config.validate()
        self._validate_model_kwargs(model_kwargs.copy())

        # 2. Set generation parameters if not already defined
        logits_processor = logits_processor if logits_processor is not None else LogitsProcessorList()
        stopping_criteria = stopping_criteria if stopping_criteria is not None else StoppingCriteriaList()

        if generation_config.pad_token_id is None and generation_config.eos_token_id is not None:
            if model_kwargs.get("attention_mask", None) is None:
                logger.warning(
                    "The attention mask and the pad token id were not set. As a consequence, you may observe "
                    "unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results."
                )
            eos_token_id = generation_config.eos_token_id
            if isinstance(eos_token_id, list):
                eos_token_id = eos_token_id[0]
            logger.warning(f"Setting `pad_token_id` to `eos_token_id`:{eos_token_id} for open-end generation.")
            generation_config.pad_token_id = eos_token_id

        # 3. Define model inputs
        # inputs_tensor has to be defined
        # model_input_name is defined if model-specific keyword input is passed
        # otherwise model_input_name is None
        # all model-specific keyword inputs are removed from `model_kwargs`
        inputs_tensor, model_input_name, model_kwargs = self._prepare_model_inputs(
            inputs, generation_config.bos_token_id, model_kwargs
        )
        batch_size = inputs_tensor.shape[0]

        # 4. Define other model kwargs
        model_kwargs["output_attentions"] = generation_config.output_attentions
        model_kwargs["output_hidden_states"] = generation_config.output_hidden_states
        model_kwargs["use_cache"] = generation_config.use_cache

        accepts_attention_mask = "attention_mask" in set(inspect.signature(self.forward).parameters.keys())
        requires_attention_mask = "encoder_outputs" not in model_kwargs

        if model_kwargs.get("attention_mask", None) is None and requires_attention_mask and accepts_attention_mask:
            model_kwargs["attention_mask"] = self._prepare_attention_mask_for_generation(
                inputs_tensor, generation_config.pad_token_id, generation_config.eos_token_id
            )

        # decoder-only models should use left-padding for generation
        if not self.config.is_encoder_decoder:
            # If `input_ids` was given, check if the last id in any sequence is `pad_token_id`
            # Note: if using `inputs_embeds`, this check does not work, because we want to be more hands-off.
            if (
                generation_config.pad_token_id is not None
                and len(inputs_tensor.shape) == 2
                and ops.sum(inputs_tensor[:, -1] == generation_config.pad_token_id) > 0
            ):
                logger.warning(
                    "A decoder-only architecture is being used, but right-padding was detected! For correct "
                    "generation results, please set `padding_side='left'` when initializing the tokenizer."
                )

        # 5. Prepare `input_ids` which will be used for auto-regressive generation
        input_ids = inputs_tensor if model_input_name == "input_ids" else model_kwargs.pop("input_ids")

        if streamer is not None:
            streamer.put(input_ids)

        # 6. Prepare `max_length` depending on other stopping criteria.
        input_ids_seq_length = input_ids.shape[-1]
        has_default_max_length = kwargs.get("max_length") is None and generation_config.max_length is not None
        if has_default_max_length and generation_config.max_new_tokens is None:
            warnings.warn(
                f"Using `max_length`'s default ({generation_config.max_length}) to control the generation length. "
                "This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we"
                " recommend using `max_new_tokens` to control the maximum length of the generation.",
                UserWarning,
            )
        elif generation_config.max_new_tokens is not None:
            if not has_default_max_length:
                logger.warning(
                    f"Both `max_new_tokens` (={generation_config.max_new_tokens}) and `max_length`(="
                    f"{generation_config.max_length}) seem to have been set. `max_new_tokens` will take precedence. "
                    "Please refer to the documentation for more information. "
                    "(https://hf-mirror.com/docs/transformers/main/en/main_classes/text_generation)"
                )
            generation_config.max_length = generation_config.max_new_tokens + input_ids_seq_length

        if generation_config.min_length is not None and generation_config.min_length > generation_config.max_length:
            raise ValueError(
                f"Unfeasible length constraints: the minimum length ({generation_config.min_length}) is larger than"
                f" the maximum length ({generation_config.max_length})"
            )
        if input_ids_seq_length >= generation_config.max_length:
            input_ids_string = "decoder_input_ids" if self.config.is_encoder_decoder else "input_ids"
            logger.warning(
                f"Input length of {input_ids_string} is {input_ids_seq_length}, but `max_length` is set to"
                f" {generation_config.max_length}. This can lead to unexpected behavior. You should consider"
                " increasing `max_new_tokens`."
            )

        if streamer is not None and (generation_config.num_beams > 1):
            raise ValueError(
                "`streamer` cannot be used with beam search (yet!). Make sure that `num_beams` is set to 1."
            )

        # 7. prepare distribution pre_processing samplers
        logits_processor = self._get_logits_processor(
            generation_config=generation_config,
            input_ids_seq_length=input_ids_seq_length,
            encoder_input_ids=inputs_tensor,
            prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
            logits_processor=logits_processor,
        )

        # 8. prepare beam search scorer
        beam_scorer = CpmBeeBeamSearchScorer(
            batch_size=batch_size,
            num_beams=generation_config.num_beams,
            length_penalty=generation_config.length_penalty,
            do_early_stopping=generation_config.early_stopping,
            num_beam_hyps_to_keep=generation_config.num_return_sequences,
            max_length=generation_config.max_new_tokens,
            **kwargs,
        )
        # 9. interleave input_ids with `num_beams` additional sequences per batch
        input_ids, model_kwargs = self._expand_inputs_for_generation(
            input_ids=input_ids,
            expand_size=generation_config.num_beams,
            is_encoder_decoder=self.config.is_encoder_decoder,
            **model_kwargs,
        )
        # 10. run beam search
        return self.beam_search(
            input_ids,
            beam_scorer,
            repetition_penalty=repetition_penalty,
            logits_processor=logits_processor,
            max_length=generation_config.max_new_tokens,
            pad_token_id=generation_config.pad_token_id,
            eos_token_id=generation_config.eos_token_id,
            vocab_size=kwargs.get("vocab_size", None),
            output_scores=generation_config.output_scores,
            return_dict_in_generate=generation_config.return_dict_in_generate,
            synced_gpus=synced_gpus,
            **model_kwargs,
        )

    def generate(
        self,
        data_list: Union[Dict, List[Dict]],
        tokenizer: CpmBeeTokenizer,
        **kwargs,
    ):
        """
        Override `generate` for CPMBee. It accepts a dict or a list of dicts as input and returns a list of dicts
        with `<ans>` filled in.

        Parameters:
            data_list (`dict` or `list(dict)`):
                The sequence used as a prompt for the generation or as model inputs to the encoder. If a dict is
                passed, it is wrapped in a list.
            tokenizer (`CpmBeeTokenizer`):
                The tokenizer used to encode `data_list` and to decode the generated token ids.
            generation_config (`~generation.GenerationConfig`, *optional*):
                The generation configuration to be used as base parametrization for the generation call, passed
                through `**kwargs`. Other `**kwargs` matching the attributes of `generation_config` will override
                them. If `generation_config` is not provided, the default is used, which has the following loading
                priority: 1) from the `generation_config.json` model file, if it exists; 2) from the model
                configuration. Please note that unspecified parameters will inherit [`~generation.GenerationConfig`]'s
                default values, whose documentation should be checked to parameterize generation.
        """
        if isinstance(data_list, dict):
            data_list = [data_list]
        input_encoded = tokenizer(data_list, return_tensors="ms", padding=True)
        input_encoded.update(kwargs)
        input_encoded["vocab_size"] = tokenizer.vocab_size

        decode_res = self._generate(**input_encoded)

        for sent_id, result in enumerate(decode_res):
            ans_result_map: Dict[int, List[int]] = {}
            for raw_word_id, ans_id in result:
                if ans_id not in ans_result_map:
                    ans_result_map[ans_id] = []
                ans_result_map[ans_id].append(raw_word_id)

            answer_placeholders = input_encoded["other_info"][sent_id]["answer_placeholders"]
            ext_table = input_encoded["other_info"][sent_id]["ext_table"]
            data = data_list[sent_id]
            for ans_id, token_ids in ans_result_map.items():
                if token_ids[-1] == tokenizer.eos_token_id:
                    token_ids = token_ids[:-1]
                text = tokenizer.decode(token_ids, ext_table)
                path = answer_placeholders[ans_id - 1]

                if len(path) > 0:
                    p = data["<ans>"]
                    for part in path[:-1]:
                        p = p[part]
                    p[path[-1]] = text
                else:
                    data["<ans>"] = text
            for ans_id in range(len(answer_placeholders)):
                if (ans_id + 1) not in ans_result_map:
                    path = answer_placeholders[ans_id]
                    p = data["<ans>"]
                    for part in path[:-1]:
                        p = p[part]
                    p[path[-1]] = None
        return data_list
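
The placeholder handling at the end of `generate` can be illustrated in isolation. The sketch below is a simplified stand-in, not the library code: `fill_answers` and its `decoded` argument are hypothetical names, and it assumes each entry of `answer_placeholders` is a list of keys leading from `data["<ans>"]` to the slot to fill, with an empty list meaning `<ans>` itself.

```python
from typing import Any, Dict, List, Optional


def fill_answers(
    data: Dict[str, Any],
    answer_placeholders: List[List[str]],
    decoded: Dict[int, str],
) -> Dict[str, Any]:
    """Write decoded answers into data["<ans>"] by walking each placeholder path.

    `decoded` maps a 1-based answer id to its text (hypothetical input; in the
    method above the text comes from `tokenizer.decode`).
    """
    for ans_id, path in enumerate(answer_placeholders, start=1):
        # Answers that were never generated are stored as None.
        text: Optional[str] = decoded.get(ans_id)
        if len(path) == 0:
            # An empty path means "<ans>" itself is the placeholder.
            data["<ans>"] = text
            continue
        node = data["<ans>"]
        for part in path[:-1]:
            node = node[part]  # descend to the parent of the slot
        node[path[-1]] = text
    return data


example = {"input": "some text", "<ans>": {"title": "", "meta": {"topic": ""}}}
filled = fill_answers(example, [["title"], ["meta", "topic"]], {1: "A title"})
print(filled["<ans>"])
```

Answer ids are 1-based in this sketch, which mirrors the `answer_placeholders[ans_id - 1]` lookup in the method above.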

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeForCausalLM.__init__(config)` ¶

Initializes a new instance of the CpmBeeForCausalLM class.

PARAMETER	DESCRIPTION
`self`	The object instance.
`config`	The configuration object for the CpmBee model. TYPE: `CpmBeeConfig`

RETURNS	DESCRIPTION
	None

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def __init__(self, config: CpmBeeConfig):
    """
    Initializes a new instance of the CpmBeeForCausalLM class.

    Args:
        self: The object instance.
        config (CpmBeeConfig): The configuration object for the CpmBee model.

    Returns:
        None

    Raises:
        None
    """
    super().__init__(config)
    self.cpmbee = CpmBeeModel(config)

    # lm_head.weight is tied to cpmbee.input_embedding.weight
    self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
    self.post_init()

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeForCausalLM.adjust_logits_during_generation(logits, batch_size, beam_size, vocab_size, ext_table_ids, **model_kwargs)` ¶

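Masks logits for tokens that must not be generated: the unknown token (id 1) unless it appears in the sample's `ext_table`, and every extension-table id that is absent from the sample's `ext_table`. Masked entries are set to -10000.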

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def adjust_logits_during_generation(
    self,
    logits: mindspore.Tensor,
    batch_size: int,
    beam_size: int,
    vocab_size: int,
    ext_table_ids: mindspore.Tensor,
    **model_kwargs,
) -> mindspore.Tensor:
    """
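    Mask logits for tokens that must not be generated: the unknown token (id 1) unless it appears in the
    sample's `ext_table`, and every extension-table id that is absent from the sample's `ext_table`.
    Masked entries are set to -10000.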
    """
    for sent_id in range(batch_size):
        if 1 not in model_kwargs["other_info"][sent_id]["ext_table"]:
            # unk is not allowed, mask unk
            logits[sent_id * beam_size : (sent_id + 1) * beam_size, 1] = -10000
        ext_ids = set()
        for v in model_kwargs["other_info"][sent_id]["ext_table"].keys():
            ext_ids.add(v)
        for ext_id in range(vocab_size, vocab_size + ext_table_ids.shape[0]):
            if ext_id not in ext_ids:
                logits[sent_id * beam_size : (sent_id + 1) * beam_size, ext_id] = -10000
    return logits
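
The masking rule can be tried out without MindSpore. The following is a minimal sketch on plain Python lists, not the library implementation: `mask_logits`, `num_ext_ids`, and `ext_tables` are hypothetical names, and it assumes each sample's `ext_table` is a dict keyed by token id, as the membership tests in the method above suggest.

```python
from typing import Dict, List

NEG = -10000.0  # the same constant the method writes into masked logits
UNK_ID = 1      # the method treats token id 1 as the unknown token


def mask_logits(
    logits: List[List[float]],
    batch_size: int,
    beam_size: int,
    vocab_size: int,
    num_ext_ids: int,
    ext_tables: List[Dict[int, str]],
) -> List[List[float]]:
    """Mask unk and unused extension ids for every beam row of every sample."""
    for sent_id in range(batch_size):
        # Rows sent_id * beam_size .. (sent_id + 1) * beam_size - 1 belong to this sample.
        rows = range(sent_id * beam_size, (sent_id + 1) * beam_size)
        allowed = set(ext_tables[sent_id].keys())
        if UNK_ID not in allowed:
            for r in rows:
                logits[r][UNK_ID] = NEG
        # Extension ids occupy the columns right after the base vocabulary.
        for ext_id in range(vocab_size, vocab_size + num_ext_ids):
            if ext_id not in allowed:
                for r in rows:
                    logits[r][ext_id] = NEG
    return logits


# Two samples, two beams each, base vocabulary of 4 plus 2 extension ids.
demo = [[0.0] * 6 for _ in range(4)]
masked = mask_logits(demo, 2, 2, 4, 2, [{4: "foo"}, {1: "<unk>", 5: "bar"}])
for row in masked:
    print(row)
```

Because the masking happens before `log_softmax`, a masked token ends up with a probability close to zero rather than exactly zero.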

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeForCausalLM.beam_search(input_ids, beam_scorer, repetition_penalty=1.0, logits_processor=None, max_length=None, pad_token_id=None, eos_token_id=None, bos_token_id=None, vocab_size=None, output_attentions=None, output_hidden_states=None, output_scores=None, return_dict_in_generate=None, synced_gpus=False, **model_kwargs)` ¶

Override the beam_search for CPMBee.
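
The candidate-selection step of the loop (beam scores initialised to 0 for the first beam and -1e9 for the rest, added to the log-softmax scores, then the top `2 * num_beams` taken over the flattened beam-by-vocabulary axis) can be sketched for a single batch item in plain Python. The names `log_softmax` and `top_candidates` are hypothetical, and recovering the beam and token from a flat index with `//` and `%` is an assumption borrowed from standard beam search; the actual decoding happens inside `CpmBeeBeamSearchScorer.process`, which is not shown here.

```python
import math
from typing import List, Tuple


def log_softmax(row: List[float]) -> List[float]:
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def top_candidates(
    logits: List[List[float]], beam_scores: List[float], num_beams: int
) -> List[Tuple[float, int, int]]:
    """Return the 2 * num_beams best (score, beam_index, token_id) triples."""
    vocab = len(logits[0])
    flat = []
    for beam, row in enumerate(logits):
        for token, log_prob in enumerate(log_softmax(row)):
            # The running beam score is added to every token of that beam.
            flat.append((log_prob + beam_scores[beam], beam * vocab + token))
    flat.sort(key=lambda item: item[0], reverse=True)
    # Assumption: flat index -> (beam, token) via floor division and modulo.
    return [(score, idx // vocab, idx % vocab) for score, idx in flat[: 2 * num_beams]]


num_beams = 2
beam_scores = [0.0] + [-1e9] * (num_beams - 1)  # only the first beam is live at step 0
step_logits = [[2.0, 1.0, 0.0], [5.0, 0.0, 0.0]]
for score, beam, token in top_candidates(step_logits, beam_scores, num_beams):
    print(beam, token, score)
```

With this initialisation, candidates from the first beam outrank all others on the first step, so identical beams do not all propose the same tokens.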

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def beam_search(
    self,
    input_ids: mindspore.Tensor,
    beam_scorer: CpmBeeBeamSearchScorer,
    repetition_penalty: Optional[float] = 1.0,
    logits_processor: Optional[LogitsProcessorList] = None,
    max_length: Optional[int] = None,
    pad_token_id: Optional[int] = None,
    eos_token_id: Optional[Union[int, List[int]]] = None,
    bos_token_id: Optional[Union[int, List[int]]] = None,
    vocab_size: Optional[int] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    output_scores: Optional[bool] = None,
    return_dict_in_generate: Optional[bool] = None,
    synced_gpus: bool = False,
    **model_kwargs,
) -> List:
    """
    Override the beam_search for CPMBee.
    """
    # init values
    logits_processor = logits_processor if logits_processor is not None else LogitsProcessorList()
    pad_token_id = pad_token_id if pad_token_id is not None else self.generation_config.pad_token_id
    eos_token_id = eos_token_id if eos_token_id is not None else self.generation_config.eos_token_id
    bos_token_id = bos_token_id if bos_token_id is not None else self.generation_config.bos_token_id
    vocab_size = vocab_size if vocab_size is not None else self.generation_config.vocab_size
    max_length = max_length if max_length is not None else self.generation_config.max_new_tokens
    output_scores = output_scores if output_scores is not None else self.generation_config.output_scores
    output_attentions = (
        output_attentions if output_attentions is not None else self.generation_config.output_attentions
    )
    output_hidden_states = (
        output_hidden_states if output_hidden_states is not None else self.generation_config.output_hidden_states
    )
    return_dict_in_generate = (
        return_dict_in_generate
        if return_dict_in_generate is not None
        else self.generation_config.return_dict_in_generate
    )

    batch_size = len(beam_scorer._beam_hyps)
    num_beams = beam_scorer.num_beams

    batch_beam_size, cur_len = input_ids.shape

    if num_beams * batch_size != batch_beam_size:
        raise ValueError(
            f"Batch dimension of `input_ids` should be {num_beams * batch_size}, but is {batch_beam_size}."
        )

    # init attention / hidden states / scores tuples
    scores = () if (return_dict_in_generate and output_scores) else None
    beam_indices = (
        tuple(() for _ in range(batch_beam_size)) if (return_dict_in_generate and output_scores) else None
    )
    decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
    cross_attentions = () if (return_dict_in_generate and output_attentions) else None
    decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None

    # initialise score of first beam with 0 and the rest with -1e9. This makes sure that only tokens
    # of the first beam are considered to avoid sampling the exact same tokens across all beams.
    beam_scores = ops.zeros((batch_size, num_beams), dtype=mindspore.float32)
    beam_scores[:, 1:] = -1e9
    beam_scores = beam_scores.view((batch_size * num_beams,))

    # init inference
    model_inputs, input_ids = self.prepare_inputs_for_generation(input_ids, batch_size, **model_kwargs)
    pred_start_index = input_ids.shape[-1]
    outputs = self.inference(
        **model_inputs,
        return_dict=True,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
    )

    # update model_kwargs
    model_kwargs["past_states"] = {
        "buffer_position": model_inputs["position"],
        "buffer_context": model_inputs["context"],
        "buffer_sample_ids": model_inputs["sample_ids"],
        "buffer_num_segments": model_inputs["num_segments"],
        "buffer_segments": model_inputs["segment"],
        "buffer": outputs.past_key_values,
    }
    model_kwargs["context"] = ops.ones(batch_beam_size, dtype=mindspore.bool_).view(
        batch_beam_size, 1
    )
    model_kwargs["sample_ids"] = ops.zeros(batch_beam_size, dtype=mindspore.int64).view(
        batch_beam_size, 1
    )
    model_kwargs["num_segments"] = model_kwargs["num_segments"][:, -1:]
    model_kwargs["segment_rel_offset"] = model_kwargs["segment_rel_offset"][:, -1:]
    model_kwargs["past_key_values"] = outputs.past_key_values

    ext_table_ids_cpu = model_inputs["ext_table_ids"]
    ext_table_sub_cpu = model_inputs["ext_table_sub"]

    cur_len = 0
    while True:
        model_inputs, input_ids = self.prepare_inputs_for_generation(
            input_ids, batch_size, beam_scorer, **model_kwargs
        )

        outputs = self.inference(
            **model_inputs,
            return_dict=True,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
        )

        next_token_logits = outputs.logits[:, -1, :]

        if all(beam_scorer._done):
            break
        # Model-specific logit adjustment for CPMBee, applied before `F.log_softmax`; it receives the
        # extension-table ids alongside the usual beam arguments.
        next_token_logits = self.adjust_logits_during_generation(
            next_token_logits, batch_size, num_beams, vocab_size, ext_table_ids_cpu, **model_kwargs
        )

        # repetition_penalty
        beam_scorer.apply_repetition_penalty(
            next_token_logits,
            batch_size,
            num_beams,
            input_ids,
            repetition_penalty,
            pred_start_index,
            input_ids.shape[-1] - 1,
            None,
        )

        _next_token_scores = F.log_softmax(
            next_token_logits, dim=-1
        )  # (batch_size * num_beams, vocab_size)

        next_token_scores_processed = logits_processor(input_ids, _next_token_scores)
        # next_token_scores_processed = _next_token_scores
        next_token_scores = next_token_scores_processed + beam_scores[:, None].expand_as(_next_token_scores)

        # Store scores, attentions and hidden_states when required
        if return_dict_in_generate:
            if output_scores:
                scores += (next_token_scores_processed,)
            if output_attentions:
                decoder_attentions += (
                    (outputs.decoder_attentions,) if self.config.is_encoder_decoder else (outputs.attentions,)
                )
                if self.config.is_encoder_decoder:
                    cross_attentions += (outputs.cross_attentions,)

            if output_hidden_states:
                decoder_hidden_states += (
                    (outputs.decoder_hidden_states,)
                    if self.config.is_encoder_decoder
                    else (outputs.hidden_states,)
                )

        # reshape for beam search
        next_token_scores = next_token_scores.view(batch_size, -1)

        # Sample 2 next tokens for each beam (so we have some spare tokens and match output of beam search)
        next_token_scores, next_tokens = ops.topk(
            next_token_scores, 2 * num_beams, dim=1, largest=True, sorted=True
        )

        beam_outputs = beam_scorer.process(
            batch_size,
            cur_len,
            _next_token_scores,
            next_token_scores,
            next_tokens,
            vocab_size=vocab_size,
            pad_token_id=pad_token_id,
            bos_token_id=bos_token_id,
            eos_token_id=eos_token_id,
            max_length=max_length,
            ext_table_ids_cpu=ext_table_ids_cpu,
            ext_table_sub_cpu=ext_table_sub_cpu,
            **model_kwargs,
        )
        if beam_outputs is None:
            break
        beam_idx = beam_outputs["next_beam_indices"]
        beam_scores = beam_outputs["next_beam_scores"]

        input_ids = input_ids[beam_idx.tolist(), :]
        model_kwargs = self._update_model_kwargs_for_generation(outputs, model_inputs, **model_kwargs)
        if model_kwargs["past_states"] is not None:
            model_kwargs["past_states"] = self._reorder_cache(model_kwargs["past_states"], beam_idx)

        if return_dict_in_generate and output_scores:
            beam_indices = tuple((beam_indices[beam_idx[i]] + (beam_idx[i],) for i in range(len(beam_indices))))

        cur_len += 1

        if beam_scorer.is_done or cur_len == max_length + 1:
            if not synced_gpus:
                break

    sequence_outputs = beam_scorer.finalize()

    return sequence_outputs
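
The score initialisation and the candidate selection in the loop above can be illustrated without MindSpore. The following is a minimal pure-Python sketch under stated assumptions, not the library's implementation: `init_beam_scores` and `top_candidates` are hypothetical helper names, and the toy log-probabilities are invented for illustration.

```python
from typing import List, Tuple

NEG_INF = -1e9  # same sentinel value the method assigns to the non-first beams


def init_beam_scores(batch_size: int, num_beams: int) -> List[float]:
    """Flat list of batch_size * num_beams scores: first beam 0, the rest -1e9."""
    scores: List[float] = []
    for _ in range(batch_size):
        scores.extend([0.0] + [NEG_INF] * (num_beams - 1))
    return scores


def top_candidates(
    log_probs: List[List[float]],  # one row of vocabulary log-probs per beam of ONE sample
    beam_scores: List[float],      # running score of each beam of that sample
    num_beams: int,
) -> List[Tuple[float, int, int]]:
    """Return the 2 * num_beams best (score, beam_index, token_id) triples."""
    vocab_size = len(log_probs[0])
    # Flatten (beam, token) into one index, mirroring the view(batch_size, -1) reshape.
    flat = [
        (beam_scores[b] + log_probs[b][t], b * vocab_size + t)
        for b in range(len(log_probs))
        for t in range(vocab_size)
    ]
    flat.sort(key=lambda pair: pair[0], reverse=True)
    return [(s, idx // vocab_size, idx % vocab_size) for s, idx in flat[: 2 * num_beams]]


scores = init_beam_scores(batch_size=1, num_beams=2)
# At the first step every beam holds the same prompt, so the rows are identical.
step_log_probs = [[-0.1, -2.5, -3.0, -4.0, -5.0]] * 2
candidates = top_candidates(step_log_probs, scores, num_beams=2)
```

Because every beam except the first starts at -1e9, all first-step candidates come from beam 0, which is what keeps the beams from picking identical tokens.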

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeForCausalLM.forward(input_ids=None, input_id_sub=None, length=None, context=None, sample_ids=None, num_segments=None, segment=None, segment_rel_offset=None, segment_rel=None, span=None, output_attentions=None, output_hidden_states=None, past_key_values=None, use_cache=None, labels=None, return_dict=None, ext_table_ids=None, ext_table_sub=None, **kwargs)` ¶

PARAMETER	DESCRIPTION
`input_ids`	Indices of input sequence tokens in the vocabulary. Indices can be obtained using [`CpmBeeTokenizer`]. See [`PreTrainedTokenizer.encode`] and [`PreTrainedTokenizer.__call__`] for details. TYPE: `mindspore.Tensor` of shape `(batch_size, seq_len)` DEFAULT: `None`
`input_id_sub`	Subscript of each input token within its special-token group. The subscript of normal text is zero, while the special tokens of each group are numbered 0, 1, 2, ...: `<ans_0>`, `<ans_1>`, `<ans_2>`, ... belong to group `<ans>`, and `<mask_0>`, `<mask_1>`, `<mask_2>`, ... belong to group `<mask>`. TYPE: `mindspore.Tensor` of shape `(batch_size, seq_len)` DEFAULT: `None`
`length`	The length of sequences in batch. TYPE: `mindspore.Tensor` of shape `(batch_size)` DEFAULT: `None`
`context`	Whether each token id is context. The value is 1 for context tokens and 0 otherwise. A context token does not need to be predicted. TYPE: `mindspore.Tensor` of shape `(batch_size, seq_len)` DEFAULT: `None`
`sample_ids`	Assigns a sample id to every token id. Token ids with the same sample id belong to the same sample. TYPE: `mindspore.Tensor` of shape `(batch_size, seq_len)` DEFAULT: `None`
`num_segments`	Total number of segments in the current input. TYPE: `mindspore.Tensor` of shape `(batch_size, seq_len)` DEFAULT: `None`
`segment`	Assigns a segment id to every token id. Token ids with the same segment id belong to the same segment. Generally, each string key or value in the input data is a segment. For example, for the input `{"input": "hello, ", "<ans>": ""}`, the segments are `"input"`, `"hello, "`, `"<ans>"` and `""`. TYPE: `mindspore.Tensor` of shape `(batch_size, seq_len)` DEFAULT: `None`
`segment_rel_offset`	The offset of segment rel. TYPE: `mindspore.Tensor` of shape `(batch_size, seq_len)` DEFAULT: `None`
`segment_rel`	The segment relevance. A relative implementation of measuring the importance of segments. TYPE: `mindspore.Tensor` of shape `(batch_size, seq_len)` DEFAULT: `None`
`span`	Span will record every input_ids shape. TYPE: `Dict[str, Union[mindspore.Tensor, List]]` DEFAULT: `None`
`output_attentions`	Whether or not to return the attentions tensors of all attention layers. TYPE: `bool`, optional DEFAULT: `None`
`output_hidden_states`	Whether or not to return the hidden states of all layers. TYPE: `bool`, optional DEFAULT: `None`
`past_key_values`	A dummy argument for CPMBee. The `past_states` contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see `past_key_values` input) and other history arguments to speed up sequential decoding. TYPE: `tuple(tuple(mindspore.Tensor))`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True` DEFAULT: `None`
`use_cache`	If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see `past_key_values`). TYPE: `bool`, optional DEFAULT: `None`
`labels`	Labels for computing the masked language modeling loss. TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, optional DEFAULT: `None`
`return_dict`	Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple. TYPE: `bool`, optional DEFAULT: `None`
`ext_table_ids`	ext_table ids for embedding projection. TYPE: `mindspore.Tensor`, optional DEFAULT: `None`
`ext_table_sub`	ext_table subscriptions for embedding projection. TYPE: `mindspore.Tensor`, optional DEFAULT: `None`
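
To make the relationship between a subscripted special token and its `input_id_sub` value concrete, here is a small illustrative sketch. It is not how `CpmBeeTokenizer` computes subscripts; `split_special_token` is a hypothetical helper that only mirrors the naming rule described for `input_id_sub`.

```python
import re
from typing import Tuple

# Matches tokens such as "<ans_0>" or "<mask_12>": a group name plus a numeric subscript.
_SUBSCRIPTED = re.compile(r"^<(\w+?)_(\d+)>$")


def split_special_token(token: str) -> Tuple[str, int]:
    """Return (group, subscript); normal text keeps subscript 0."""
    match = _SUBSCRIPTED.match(token)
    if match is None:
        return token, 0
    return f"<{match.group(1)}>", int(match.group(2))


pairs = [split_special_token(t) for t in ["hello", "<ans_0>", "<ans_2>", "<mask_1>"]]
```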

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    input_id_sub: Optional[mindspore.Tensor] = None,
    length: Optional[mindspore.Tensor] = None,
    context: Optional[mindspore.Tensor] = None,
    sample_ids: Optional[mindspore.Tensor] = None,
    num_segments: Optional[mindspore.Tensor] = None,
    segment: Optional[mindspore.Tensor] = None,
    segment_rel_offset: Optional[mindspore.Tensor] = None,
    segment_rel: Optional[mindspore.Tensor] = None,
    span: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    past_key_values: Optional[List] = None,
    use_cache: Optional[bool] = None,
    labels: Optional[mindspore.Tensor] = None,
    return_dict: Optional[bool] = None,
    ext_table_ids: Optional[mindspore.Tensor] = None,  # (ext_table_size) int32
    ext_table_sub: Optional[mindspore.Tensor] = None,  # (ext_table_size) int32
    **kwargs,
) -> Union[Tuple, CausalLMOutputWithPast]:
    r"""
    Args:
        input_ids (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
            Indices of input sequence tokens in the vocabulary.

            Indices can be obtained using [`CpmBeeTokenizer`]. See [`PreTrainedTokenizer.encode`] and
            [`PreTrainedTokenizer.__call__`] for details.

            [What are input IDs?](../glossary#input-ids)
        input_id_sub (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
            Subscription of input sequence tokens in the vocabulary.

            Subscription of normal text will be zero while the special tokens of each group will be the 0, 1, 2,
            ... <ans_0>, <ans_1>, <ans_2> ... belongs to group <ans>. <mask_0>, <mask_1>, <mask_2> ... belongs to
            group <mask>.
        length (`mindspore.Tensor` of shape `(batch_size)`):
            The length of sequences in batch.
        context (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
            Whether this token id is context or not. If is context, the value is 1. If not, the value is 0. If a
            token id is context, it does not need to be predicted.
        sample_ids (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
            Give a sample id to every token id. Token ids with the same sample id belong to the same sample.
        num_segments (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
            Total number of segments in the current input.
        segment (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
            Give a segment id to every token id. Token ids with the same segment id belong to the same segment.

            Generally, a string key or value in input data will be a segment. For example, input {"input": "hello,
            ", "<ans>": ""}, the segments includes: "input", "hello, ", "<ans>" and "".
        segment_rel_offset (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
            The offset of segment rel.
        segment_rel (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
            The segment relevance. A relative implementation of measuring the importance of segments.
        span (`Dict[str, Union[mindspore.Tensor, List]]`):
            Span will record every input_ids shape.
        output_attentions (`bool`, *optional*):
            Whether or not to return the attentions tensors of all attention layers.
        output_hidden_states (`bool`, *optional*):
            Whether or not to return the hidden states of all layers.
        past_key_values (`tuple(tuple(mindspore.Tensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`):
            A dummy argument for CPMBee. The `past_states` contains pre-computed hidden-states (key and values in
            the self-attention blocks and in the cross-attention blocks) that can be used (see `past_key_values`
            input) and other history arguments to speed up sequential decoding.
        use_cache (`bool`, *optional*):
            If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
            (see `past_key_values`).
        labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for computing the masked language modeling loss.
        return_dict (`bool`, *optional*):
            Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
        ext_table_ids (`mindspore.Tensor`, *optional*):
            ext_table ids for embedding projection.
        ext_table_sub (`mindspore.Tensor`, *optional*):
            ext_table subscriptions for embedding projection.
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    model_output = self.cpmbee(
        input_ids,
        input_id_sub,
        length,
        context,
        sample_ids,
        num_segments,
        segment,
        segment_rel_offset,
        segment_rel,
        span,
        output_attentions,
        output_hidden_states,
        past_key_values,
        use_cache,
        return_dict,
    )
    hidden_states = model_output.last_hidden_state if return_dict else model_output[0]

    if ext_table_ids is not None:
        ext_table = self.cpmbee.input_embedding(ext_table_ids, ext_table_sub)
    else:
        ext_table = None
    logits = self.cpmbee.input_embedding.projection(hidden_states, ext_table)

    loss = None
    if labels is not None:
        loss = F.cross_entropy(logits.view(-1, logits.shape[-1]), labels.long().view(-1))

    if not return_dict:
        output = (logits,) + model_output[1:]
        return ((loss,) + output) if loss is not None else output

    return CausalLMOutputWithPast(
        loss=loss,
        logits=logits,
        past_key_values=model_output.past_key_values,
        hidden_states=model_output.hidden_states,
        attentions=model_output.attentions,
    )
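
The loss branch above flattens the logits to one row per token before calling `F.cross_entropy`. The sketch below reproduces that arithmetic in pure Python on nested lists so the reshaping is easy to follow. It assumes the default mean reduction and does not handle ignored labels; `flatten_logits` and `cross_entropy` are hypothetical names used only for this illustration.

```python
import math
from typing import List


def flatten_logits(logits: List[List[List[float]]]) -> List[List[float]]:
    """(batch, seq_len, vocab) nested lists -> (batch * seq_len, vocab)."""
    return [row for sample in logits for row in sample]


def cross_entropy(flat_logits: List[List[float]], flat_labels: List[int]) -> float:
    """Mean negative log-likelihood of the labels under softmax(logits)."""
    total = 0.0
    for row, label in zip(flat_logits, flat_labels):
        peak = max(row)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[label]
    return total / len(flat_labels)


logits = [[[0.0, 0.0], [0.0, 0.0]]]   # batch=1, seq_len=2, vocab=2
labels = [[0, 1]]
flat_labels = [lab for sample in labels for lab in sample]
loss = cross_entropy(flatten_logits(logits), flat_labels)
```

With uniform logits over a two-token vocabulary, each position contributes `log(2)`, so the mean loss is `log(2)` as well.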

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeForCausalLM.generate(data_list, tokenizer, **kwargs)` ¶

Overrides `generate` for CPMBee. It accepts a dict or a list of dicts as input and returns the input data with each `<ans>` field filled in.

PARAMETER	DESCRIPTION
`data_list`	The sequence used as a prompt for the generation or as model inputs to the encoder. If dict, data_list will be wrapped as a list. TYPE: `dict` or `list(dict)`
`tokenizer`	The tokenizer. TYPE: `CpmBeeTokenizer`
`generation_config`	The generation configuration to be used as base parametrization for the generation call. `kwargs` passed to generate matching the attributes of `generation_config` will override them. If `generation_config` is not provided, the default will be used, which has the following loading priority: 1) from the `generation_config.json` model file, if it exists; 2) from the model configuration. Please note that unspecified parameters will inherit [`~generation.GenerationConfig`]'s default values, whose documentation should be checked to parameterize generation. TYPE: `~generation.GenerationConfig`, optional

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def generate(
    self,
    data_list: Union[Dict, List[Dict]],
    tokenizer: CpmBeeTokenizer,
    **kwargs,
):
    """
    Override the generate for CPMBee. It will accept dict or list(dict) as input and returns dict or list(dict)
    with `<ans>` filled.

    Parameters:
        data_list (`dict` or `list(dict)`):
            The sequence used as a prompt for the generation or as model inputs to the encoder. If dict, data_list
            will be wrapped as a list.
        tokenizer (`CpmBeeTokenizer`):
            The tokenizer.
        generation_config (`~generation.GenerationConfig`, *optional*):
            The generation configuration to be used as base parametrization for the generation call. `**kwargs`
            passed to generate matching the attributes of `generation_config` will override them. If
            `generation_config` is not provided, the default will be used, which has the following loading
            priority: 1) from the `generation_config.json` model file, if it exists; 2) from the model
            configuration. Please note that unspecified parameters will inherit [`~generation.GenerationConfig`]'s
            default values, whose documentation should be checked to parameterize generation.
    """
    if isinstance(data_list, dict):
        data_list = [data_list]
    input_encoded = tokenizer(data_list, return_tensors="ms", padding=True)
    input_encoded.update(kwargs)
    input_encoded["vocab_size"] = tokenizer.vocab_size

    decode_res = self._generate(**input_encoded)

    for sent_id, result in enumerate(decode_res):
        ans_result_map: Dict[int, List[int]] = {}
        for raw_word_id, ans_id in result:
            if ans_id not in ans_result_map:
                ans_result_map[ans_id] = []
            ans_result_map[ans_id].append(raw_word_id)

        answer_placeholders = input_encoded["other_info"][sent_id]["answer_placeholders"]
        ext_table = input_encoded["other_info"][sent_id]["ext_table"]
        data = data_list[sent_id]
        for ans_id, token_ids in ans_result_map.items():
            if token_ids[-1] == tokenizer.eos_token_id:
                token_ids = token_ids[:-1]
            text = tokenizer.decode(token_ids, ext_table)
            path = answer_placeholders[ans_id - 1]

            if len(path) > 0:
                p = data["<ans>"]
                for part in path[:-1]:
                    p = p[part]
                p[path[-1]] = text
            else:
                data["<ans>"] = text
        for ans_id in range(len(answer_placeholders)):
            if (ans_id + 1) not in ans_result_map:
                path = answer_placeholders[ans_id]
                p = data["<ans>"]
                for part in path[:-1]:
                    p = p[part]
                p[path[-1]] = None
    return data_list
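
The post-processing in `generate` does two things: it groups the decoded `(token_id, ans_id)` pairs by answer, then writes each decoded text back into the nested `<ans>` structure by following a path of keys. The sketch below isolates those two steps in plain Python. `group_by_answer` and `fill_answer` are hypothetical helper names, the token ids are made up, and a string join stands in for `tokenizer.decode`.

```python
from typing import Any, Dict, List, Tuple


def group_by_answer(result: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Collect token ids per answer id, preserving generation order."""
    grouped: Dict[int, List[int]] = {}
    for token_id, ans_id in result:
        grouped.setdefault(ans_id, []).append(token_id)
    return grouped


def fill_answer(data: Dict[str, Any], path: List[str], text: Any) -> None:
    """Write `text` at `path` inside data["<ans>"]; an empty path replaces "<ans>" itself."""
    if len(path) == 0:
        data["<ans>"] = text
        return
    node = data["<ans>"]
    for part in path[:-1]:
        node = node[part]
    node[path[-1]] = text


data = {"input": "...", "<ans>": {"title": "", "tags": {"first": ""}}}
answer_placeholders = [["title"], ["tags", "first"]]   # answer ids are 1-based
grouped = group_by_answer([(11, 1), (12, 1), (21, 2)])
for ans_id, token_ids in grouped.items():
    text = " ".join(str(t) for t in token_ids)         # stand-in for tokenizer.decode
    fill_answer(data, answer_placeholders[ans_id - 1], text)
```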

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeForCausalLM.get_input_embeddings()` ¶

Returns the input embedding layer of the CpmBeeForCausalLM model.

PARAMETER	DESCRIPTION
`self`	The instance of the CpmBeeForCausalLM class. TYPE: `CpmBeeForCausalLM`

RETURNS	DESCRIPTION
`input_embedding`	The input embedding layer of the underlying CPMBee model (`self.cpmbee.input_embedding`).

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def get_input_embeddings(self):
    """
    This method retrieves the input embeddings from the CpmBeeForCausalLM object.

    Args:
        self (CpmBeeForCausalLM): The instance of the CpmBeeForCausalLM class.

    Returns:
        input_embedding: The input embedding layer of the underlying CPMBee model.

    Raises:
        None.
    """
    return self.cpmbee.input_embedding

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeForCausalLM.get_output_embeddings()` ¶

Returns the output embeddings for the CpmBeeForCausalLM model.

PARAMETER	DESCRIPTION
`self`	An instance of the CpmBeeForCausalLM class.

RETURNS	DESCRIPTION
	The output embedding layer (`self.lm_head`).

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def get_output_embeddings(self):
    """
    Returns the output embeddings for the CpmBeeForCausalLM model.

    Args:
        self: An instance of the CpmBeeForCausalLM class.

    Returns:
        The output embedding layer (`self.lm_head`).

    Raises:
        None.
    """
    return self.lm_head

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeForCausalLM.inference(input_ids=None, input_id_sub=None, position=None, context=None, sample_ids=None, num_segments=None, segment=None, segment_rel_offset=None, segment_rel=None, past_states=None, output_attentions=None, output_hidden_states=None, past_key_values=None, use_cache=None, labels=None, return_dict=None, ext_table_ids=None, ext_table_sub=None, **kwargs)` ¶

PARAMETER	DESCRIPTION
`input_ids`	Indices of input sequence tokens in the vocabulary. Indices can be obtained using [`CpmBeeTokenizer`]. See [`PreTrainedTokenizer.encode`] and [`PreTrainedTokenizer.__call__`] for details. TYPE: `mindspore.Tensor` of shape `(batch_size, seq_len)` DEFAULT: `None`
`input_id_sub`	Subscript of each input token within its special-token group. The subscript of normal text is zero, while the special tokens of each group are numbered 0, 1, 2, ...: `<ans_0>`, `<ans_1>`, `<ans_2>`, ... belong to group `<ans>`, and `<mask_0>`, `<mask_1>`, `<mask_2>`, ... belong to group `<mask>`. TYPE: `mindspore.Tensor` of shape `(batch_size, seq_len)` DEFAULT: `None`
`position`	The position of each input token within its segment. For example, if segment 1 has positions 0, 1, 2 and segment 2 has positions 0, 1, 2, 3, the position will be 0, 1, 2, 0, 1, 2, 3. TYPE: `mindspore.Tensor` of shape `(batch_size, seq_len)` DEFAULT: `None`
`context`	Whether each token id is context. The value is 1 for context tokens and 0 otherwise. A context token does not need to be predicted. TYPE: `mindspore.Tensor` of shape `(batch_size, seq_len)` DEFAULT: `None`
`sample_ids`	Assigns a sample id to every token id. Token ids with the same sample id belong to the same sample. TYPE: `mindspore.Tensor` of shape `(batch_size, seq_len)` DEFAULT: `None`
`num_segments`	Total number of segments in the current input. TYPE: `mindspore.Tensor` of shape `(batch_size, seq_len)` DEFAULT: `None`
`segment`	Assigns a segment id to every token id. Token ids with the same segment id belong to the same segment. Generally, each string key or value in the input data is a segment. For example, for the input `{"input": "hello, ", "<ans>": ""}`, the segments are `"input"`, `"hello, "`, `"<ans>"` and `""`. TYPE: `mindspore.Tensor` of shape `(batch_size, seq_len)` DEFAULT: `None`
`segment_rel_offset`	The offset of segment rel. TYPE: `mindspore.Tensor` of shape `(batch_size, seq_len)` DEFAULT: `None`
`segment_rel`	The segment relevance. A relative implementation of measuring the importance of segments. TYPE: `mindspore.Tensor` of shape `(batch_size, seq_len)` DEFAULT: `None`
`past_states`	Store the history information including position, context, sample_ids, num_segments, segment and past_key_values. TYPE: `Dict[str, Union[mindspore.Tensor, List]]` DEFAULT: `None`
`output_attentions`	Whether or not to return the attentions tensors of all attention layers. TYPE: `bool`, optional DEFAULT: `None`
`output_hidden_states`	Whether or not to return the hidden states of all layers. TYPE: `bool`, optional DEFAULT: `None`
`past_key_values`	A dummy argument for CPMBee. The `past_states` contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see `past_key_values` input) and other history arguments to speed up sequential decoding. TYPE: `tuple(tuple(mindspore.Tensor))`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True` DEFAULT: `None`
`use_cache`	If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see `past_key_values`). TYPE: `bool`, optional DEFAULT: `None`
`labels`	Labels for computing the masked language modeling loss. TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, optional DEFAULT: `None`
`return_dict`	Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple. TYPE: `bool`, optional DEFAULT: `None`
`ext_table_ids`	ext_table ids for embedding projection. TYPE: `mindspore.Tensor`, optional DEFAULT: `None`
`ext_table_sub`	ext_table subscriptions for embedding projection. TYPE: `mindspore.Tensor`, optional DEFAULT: `None`
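
The `position` layout described in the table restarts at zero for every segment. A minimal sketch of that rule, assuming only the segment lengths are known (`positions_for_segments` is a hypothetical helper, not part of the library):

```python
from typing import List


def positions_for_segments(segment_lengths: List[int]) -> List[int]:
    """Concatenate per-segment positions, restarting from 0 in each segment."""
    positions: List[int] = []
    for length in segment_lengths:
        positions.extend(range(length))
    return positions


# Segment 1 has three tokens and segment 2 has four, as in the description of `position`.
positions = positions_for_segments([3, 4])
```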

Example

Text Generation with CpmBeeForCausalLM.

>>> from mindnlp.transformers import CpmBeeTokenizer, CpmBeeForCausalLM
...
>>> texts = {"input": "今天天气不错，", "<ans>": ""}
>>> model = CpmBeeForCausalLM.from_pretrained("openbmb/cpm-bee-10b")
>>> tokenizer = CpmBeeTokenizer.from_pretrained("openbmb/cpm-bee-10b")
>>> output_texts = model.generate(texts, tokenizer)
>>> print(output_texts)
{'input': '今天天气不错，', '<ans>': '适合睡觉。'}

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def inference(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    input_id_sub: Optional[mindspore.Tensor] = None,
    position: Optional[mindspore.Tensor] = None,
    context: Optional[mindspore.Tensor] = None,
    sample_ids: Optional[mindspore.Tensor] = None,
    num_segments: Optional[mindspore.Tensor] = None,
    segment: Optional[mindspore.Tensor] = None,
    segment_rel_offset: Optional[mindspore.Tensor] = None,
    segment_rel: Optional[mindspore.Tensor] = None,
    past_states: Optional[Dict] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    past_key_values: Optional[List] = None,
    use_cache: Optional[bool] = None,
    labels: Optional[mindspore.Tensor] = None,
    return_dict: Optional[bool] = None,
    ext_table_ids: Optional[mindspore.Tensor] = None,  # (ext_table_size) int32
    ext_table_sub: Optional[mindspore.Tensor] = None,  # (ext_table_size) int32
    **kwargs,
) -> Union[Tuple, CausalLMOutputWithPast]:
    r"""
    Args:
        input_ids (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
            Indices of input sequence tokens in the vocabulary.

            Indices can be obtained using [`CpmBeeTokenizer`]. See [`PreTrainedTokenizer.encode`] and
            [`PreTrainedTokenizer.__call__`] for details.

            [What are input IDs?](../glossary#input-ids)
        input_id_sub (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
            Subscription of input sequence tokens in the vocabulary.

            Subscription of normal text will be zero while the special tokens of each group will be the 0, 1, 2,
            ... <ans_0>, <ans_1>, <ans_2> ... belongs to group <ans>. <mask_0>, <mask_1>, <mask_2> ... belongs to
            group <mask>.
        position (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
            The position of input sequence tokens in the vocabulary for each segment. if segment1 is 0, 1, 2 and
            segment2 is 0, 1, 2, 3, the position will be 0, 1, 2, 0, 1, 2, 3
        context (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
            Whether this token id is context or not. If is context, the value is 1. If not, the value is 0. If a
            token id is context, it does not need to be predicted.
        sample_ids (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
            Give a sample id to every token id. Token ids with the same sample id belong to the same sample.
        num_segments (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
            Total number of segments in the current input.
        segment (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
            Give a segment id to every token id. Token ids with the same segment id belong to the same segment.

            Generally, a string key or value in input data will be a segment. For example, input {"input": "hello,
            ", "<ans>": ""}, the segments includes: "input", "hello, ", "<ans>" and "".
        segment_rel_offset (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
            The offset of segment rel.
        segment_rel (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
            The segment relevance. A relative implementation of measuring the importance of segments.
        past_states (`Dict[str, Union[mindspore.Tensor, List]]`):
            Store the history information including position, context, sample_ids, num_segments, segment and
            past_key_values.
        output_attentions (`bool`, *optional*):
            Whether or not to return the attentions tensors of all attention layers.
        output_hidden_states (`bool`, *optional*):
            Whether or not to return the hidden states of all layers.
        past_key_values (`tuple(tuple(mindspore.Tensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`):
            A dummy argument for CPMBee. The `past_states` contains pre-computed hidden-states (key and values in
            the self-attention blocks and in the cross-attention blocks) that can be used (see `past_key_values`
            input) and other history arguments to speed up sequential decoding.
        use_cache (`bool`, *optional*):
            If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
            (see `past_key_values`).
        labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for computing the masked language modeling loss.
        return_dict (`bool`, *optional*):
            Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
        ext_table_ids (`mindspore.Tensor`, *optional*):
            ext_table ids for embedding projection.
        ext_table_sub (`mindspore.Tensor`, *optional*):
            ext_table subscriptions for embedding projection.

    Example:
        Text Generation with CpmBeeForCausalLM.
        ```python
        >>> from mindnlp.transformers import CpmBeeTokenizer, CpmBeeForCausalLM
        ...
        >>> texts = {"input": "今天天气不错，", "<ans>": ""}
        >>> model = CpmBeeForCausalLM.from_pretrained("openbmb/cpm-bee-10b")
        >>> tokenizer = CpmBeeTokenizer.from_pretrained("openbmb/cpm-bee-10b")
        >>> output_texts = model.generate(texts, tokenizer)
        >>> print(output_texts)
        {'input': '今天天气不错，', '<ans>': '适合睡觉。'}
        ```
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    model_output = self.cpmbee.inference(
        input_ids,
        input_id_sub,
        position,
        context,
        sample_ids,
        num_segments,
        segment,
        segment_rel_offset,
        segment_rel,
        past_states,
        output_attentions,
        output_hidden_states,
        past_key_values,
        use_cache,
        return_dict,
    )
    hidden_states = model_output.last_hidden_state if return_dict else model_output[0]

    if ext_table_ids is not None and 0 not in ext_table_ids.shape:
        ext_table = self.cpmbee.input_embedding(ext_table_ids, ext_table_sub)
    else:
        ext_table = None
    logits = self.cpmbee.input_embedding.projection(hidden_states, ext_table)

    loss = None
    if labels is not None:
        loss = F.cross_entropy(logits.view(-1, logits.shape[-1]), labels.view(-1))

    if not return_dict:
        output = (logits,) + model_output[1:]
        return ((loss,) + output) if loss is not None else output

    return CausalLMOutputWithPast(
        loss=loss,
        logits=logits,
        past_key_values=model_output.past_key_values,
        hidden_states=model_output.hidden_states,
        attentions=model_output.attentions,
    )
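
The loss branch above flattens `logits` to `(batch * seq_len, vocab)` and `labels` to `(batch * seq_len,)` before calling `F.cross_entropy`, with no label shifting inside this method. The following is a minimal pure-Python sketch of that flatten-then-average step, not the library implementation: the helper name `cross_entropy` and the toy values are illustrative assumptions, and it leaves out options such as `ignore_index` that the real function supports.

```python
import math

def cross_entropy(flat_logits, flat_labels):
    """Mean negative log-likelihood over flattened (row, label) pairs (hypothetical helper)."""
    total = 0.0
    for row, label in zip(flat_logits, flat_labels):
        # Numerically stable log-sum-exp over the vocabulary dimension.
        row_max = max(row)
        log_sum_exp = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        total += log_sum_exp - row[label]
    return total / len(flat_labels)

# Toy values (assumed): batch=2, seq_len=2, vocab=2.
logits = [[[2.0, 0.0], [0.0, 2.0]], [[1.0, 1.0], [0.0, 0.0]]]
labels = [[0, 1], [0, 1]]

# Equivalent of logits.view(-1, vocab) and labels.view(-1).
flat_logits = [row for sequence in logits for row in sequence]
flat_labels = [label for sequence in labels for label in sequence]

loss = cross_entropy(flat_logits, flat_labels)
print(loss)
```

Because the labels are compared position by position with the logits, any shifting of targets relative to inputs has to happen before the labels reach this method.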

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeForCausalLM.prepare_inputs_for_generation(input_ids, batch_size, beam_scorer=None, input_id_subs=None, input_pos=None, segment_ids=None, batch_ext_table_ids=None, batch_ext_table_sub=None, other_info=None, **model_kwargs)` ¶

Choose the current input according to beam states.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def prepare_inputs_for_generation(
    self,
    input_ids: mindspore.Tensor,
    batch_size: int,
    beam_scorer: CpmBeeBeamSearchScorer = None,
    input_id_subs: Optional[mindspore.Tensor] = None,
    input_pos: Optional[mindspore.Tensor] = None,
    segment_ids: Optional[mindspore.Tensor] = None,
    batch_ext_table_ids: Optional[mindspore.Tensor] = None,
    batch_ext_table_sub: Optional[mindspore.Tensor] = None,
    other_info: Optional[Dict] = None,
    **model_kwargs,
):
    """
    Choose the current input according to beam states.
    """
    # init preparation
    context = model_kwargs.get("context")
    sample_ids = model_kwargs.get("sample_ids")
    segment_rel_offset = model_kwargs.get("segment_rel_offset")
    num_segments = model_kwargs.get("num_segments")
    segment_rel = model_kwargs.get("segment_rel")
    past_states = model_kwargs.get("past_states", None)
    past_key_values = model_kwargs.get("past_key_values", None)
    _input_ids = input_ids

    # update input in generation
    if beam_scorer is not None:
        tmp_input = []
        tmp_input_sub = []
        tmp_position = []
        tmp_segment = []
        for sent_id in range(batch_size):
            for beam_id in range(beam_scorer.num_beams):
                tmp_input.append(beam_scorer.beam_states[sent_id][beam_id]["nx_token_id"])
                tmp_input_sub.append(beam_scorer.beam_states[sent_id][beam_id]["nx_token_sub"])
                tmp_position.append(beam_scorer.beam_states[sent_id][beam_id]["nx_position"])
                tmp_segment.append(beam_scorer.beam_states[sent_id][beam_id]["nx_segment_id"])

        model_kwargs["input_id_subs"] = input_id_subs = mindspore.tensor(
            tmp_input_sub, dtype=mindspore.int64
        ).view(batch_size * beam_scorer.num_beams, 1)
        model_kwargs["input_pos"] = input_pos = mindspore.tensor(
            tmp_position, dtype=mindspore.int64
        ).view(batch_size * beam_scorer.num_beams, 1)
        model_kwargs["segment_ids"] = segment_ids = mindspore.tensor(
            tmp_segment, dtype=mindspore.int64
        ).view(batch_size * beam_scorer.num_beams, 1)
        input_ids = ops.cat(
            [
                input_ids,
                mindspore.tensor(tmp_input, dtype=mindspore.int64).view(
                    batch_size * beam_scorer.num_beams, 1
                ),
            ],
            dim=-1,
        )
        _input_ids = input_ids[:, -1:]

    return {
        "input_ids": _input_ids,
        "input_id_sub": input_id_subs,
        "position": input_pos,
        "context": context,
        "sample_ids": sample_ids,
        "segment_rel_offset": segment_rel_offset,
        "segment": segment_ids,
        "num_segments": num_segments,
        "segment_rel": segment_rel,
        "use_cache": True,
        "past_key_values": past_key_values,
        "ext_table_ids": batch_ext_table_ids,
        "ext_table_sub": batch_ext_table_sub,
        "past_states": past_states,
    }, input_ids
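
The beam loop above visits every `(sent_id, beam_id)` pair in row-major order and reshapes each collected field to `(batch_size * num_beams, 1)`. A minimal sketch of that flattening with plain lists follows. The helper name `flatten_beam_field` and the toy beam states are assumptions for illustration; only the `nx_token_id` and `nx_position` keys come from the source.

```python
def flatten_beam_field(beam_states, batch_size, num_beams, key):
    """Collect one field from every beam into a (batch_size * num_beams, 1) nested list."""
    column = []
    for sent_id in range(batch_size):
        for beam_id in range(num_beams):
            # One row per beam, one column for the single next-step value.
            column.append([beam_states[sent_id][beam_id][key]])
    return column

# Toy beam states (assumed): 2 sentences, 2 beams each.
beam_states = [
    [{"nx_token_id": 11, "nx_position": 5}, {"nx_token_id": 12, "nx_position": 5}],
    [{"nx_token_id": 21, "nx_position": 7}, {"nx_token_id": 22, "nx_position": 7}],
]

next_tokens = flatten_beam_field(beam_states, 2, 2, "nx_token_id")
next_positions = flatten_beam_field(beam_states, 2, 2, "nx_position")
print(next_tokens)     # [[11], [12], [21], [22]]
print(next_positions)  # [[5], [5], [7], [7]]
```

Row `sent_id * num_beams + beam_id` of each flattened field therefore belongs to one specific beam, which is why all four fields are gathered in the same loop.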

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeForCausalLM.set_input_embeddings(embeddings)` ¶

Sets the input embeddings for the CpmBeeForCausalLM class.

PARAMETER	DESCRIPTION
`self`	The instance of the CpmBeeForCausalLM class. TYPE: `CpmBeeForCausalLM`
`embeddings`	The input embeddings to be set for the CpmBeeForCausalLM instance.

RETURNS	DESCRIPTION
	None.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def set_input_embeddings(self, embeddings):
    """
    Sets the input embeddings for the CpmBeeForCausalLM class.

    Args:
        self (CpmBeeForCausalLM): The instance of the CpmBeeForCausalLM class.
        embeddings: The input embeddings to be set for the CpmBeeForCausalLM instance.

    Returns:
        None.

    Raises:
        None.
    """
    self.cpmbee.input_embedding = embeddings

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeForCausalLM.set_output_embeddings(new_embeddings)` ¶

Sets the output embeddings for the CpmBeeForCausalLM model.

PARAMETER	DESCRIPTION
`self`	The instance of the CpmBeeForCausalLM class. TYPE: `CpmBeeForCausalLM`
`new_embeddings`	The new embeddings to be set as the output embeddings. This should be a tensor or an object that can be converted to a tensor.

RETURNS	DESCRIPTION
	None

This method sets the output embeddings of the CpmBeeForCausalLM model to the provided new embeddings. The new embeddings are assigned to the 'lm_head' attribute of the model object.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def set_output_embeddings(self, new_embeddings):
    """
    Sets the output embeddings for the CpmBeeForCausalLM model.

    Args:
        self (CpmBeeForCausalLM): The instance of the CpmBeeForCausalLM class.
        new_embeddings: The new embeddings to be set as the output embeddings.
            This should be a tensor or an object that can be converted to a tensor.

    Returns:
        None

    Raises:
        None

    This method sets the output embeddings of the CpmBeeForCausalLM model to the provided new embeddings.
    The new embeddings are assigned to the 'lm_head' attribute of the model object.
    """
    self.lm_head = new_embeddings

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeLayerNorm` ¶

Bases: Module

We use Root Mean Square (RMS) Layer Normalization; see https://arxiv.org/abs/1910.07467 for details.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

class CpmBeeLayerNorm(nn.Module):
    """
    We use Root Mean Square (RMS) Layer Normalization; see https://arxiv.org/abs/1910.07467 for details.
    """
    def __init__(self, config: CpmBeeConfig):
        """
        Initializes a CpmBeeLayerNorm object with the provided configuration.

        Args:
            self: The instance of the CpmBeeLayerNorm class.
            config (CpmBeeConfig):
                An instance of the CpmBeeConfig class containing the configuration parameters.

                - config.eps (float): The value for epsilon used in normalization.
                - config.hidden_size (int): The dimension of the hidden size.
                - config.ms_dtype (str): The data type for the weight parameter.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()

        self.eps = config.eps
        self.dim_norm = config.hidden_size
        self.weight = Parameter(ops.zeros(config.hidden_size, dtype=config.ms_dtype))

    def forward(self, hidden_states: mindspore.Tensor):
        """
        Args:
            hidden_states (`mindspore.Tensor` of shape `(batch, seq_len, dim_in)`):
                The input hidden states; `dim_in` must equal `config.hidden_size`.
        """
        if hidden_states.shape[-1] != self.dim_norm:
            raise AssertionError("hidden_states.shape[-1] != self.dim_norm")
        old_dtype = hidden_states.dtype
        variance = hidden_states.to(mindspore.float32).pow(2).mean(axis=-1, keep_dims=True)
        hidden_states = (hidden_states * ops.rsqrt(variance + self.eps)).to(old_dtype) * self.weight
        return hidden_states

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeLayerNorm.__init__(config)` ¶

Initializes a CpmBeeLayerNorm object with the provided configuration.

PARAMETER	DESCRIPTION
`self`	The instance of the CpmBeeLayerNorm class.
`config`	An instance of the CpmBeeConfig class containing the configuration parameters. config.eps (float): The value for epsilon used in normalization. config.hidden_size (int): The dimension of the hidden size. config.ms_dtype (str): The data type for the weight parameter. TYPE: `CpmBeeConfig`

RETURNS	DESCRIPTION
	None.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def __init__(self, config: CpmBeeConfig):
    """
    Initializes a CpmBeeLayerNorm object with the provided configuration.

    Args:
        self: The instance of the CpmBeeLayerNorm class.
        config (CpmBeeConfig):
            An instance of the CpmBeeConfig class containing the configuration parameters.

            - config.eps (float): The value for epsilon used in normalization.
            - config.hidden_size (int): The dimension of the hidden size.
            - config.ms_dtype (str): The data type for the weight parameter.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()

    self.eps = config.eps
    self.dim_norm = config.hidden_size
    self.weight = Parameter(ops.zeros(config.hidden_size, dtype=config.ms_dtype))

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeLayerNorm.forward(hidden_states)` ¶

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def forward(self, hidden_states: mindspore.Tensor):
    """
    Args:
        hidden_states (`mindspore.Tensor` of shape `(batch, seq_len, dim_in)`):
            The input hidden states; `dim_in` must equal `config.hidden_size`.
    """
    if hidden_states.shape[-1] != self.dim_norm:
        raise AssertionError("hidden_states.shape[-1] != self.dim_norm")
    old_dtype = hidden_states.dtype
    variance = hidden_states.to(mindspore.float32).pow(2).mean(axis=-1, keep_dims=True)
    hidden_states = (hidden_states * ops.rsqrt(variance + self.eps)).to(old_dtype) * self.weight
    return hidden_states
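
The `forward` method above divides each feature vector by the root of its mean square (plus `eps`) and then multiplies by the learned `weight`, with no mean subtraction and no bias. A minimal pure-Python sketch of the same arithmetic on a single vector follows. The function name `rms_layer_norm` and the `eps` default are assumptions for illustration; the actual value comes from `config.eps`.

```python
import math

def rms_layer_norm(row, weight, eps=1e-6):
    """Scale one feature vector by 1 / sqrt(mean(x^2) + eps), then by weight."""
    if len(row) != len(weight):
        raise ValueError("feature dimension does not match weight dimension")
    # Mean of squares over the feature dimension (no centering step).
    mean_square = sum(v * v for v in row) / len(row)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [v * inv_rms * w for v, w in zip(row, weight)]

# eps=0.0 is used here only so the result is easy to check by hand.
normalized = rms_layer_norm([3.0, 4.0], weight=[1.0, 1.0], eps=0.0)
print(normalized)
```

With a unit weight, the output vector has a mean square of 1, which is the property the normalization is meant to enforce.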

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeLinear` ¶

Bases: Linear

This class represents a linear layer with a scale operation for CPMBee. It is a subclass of the nn.Linear class.

ATTRIBUTE	DESCRIPTION
`dim_in`	The input dimension of the linear layer. TYPE: `int`
`dim_out`	The output dimension of the linear layer. TYPE: `int`
`weight`	The weight parameter of the linear layer. TYPE: `Parameter`

METHOD	DESCRIPTION
`__init__`	Construct a linear layer for CPMBee with a scale operation.
`forward`	Apply the linear transformation to the input tensor.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

class CpmBeeLinear(nn.Linear):

    """
    This class represents a linear layer with a scale operation for CPMBee. It is a subclass of the nn.Linear class.

    Attributes:
        dim_in (int): The input dimension of the linear layer.
        dim_out (int): The output dimension of the linear layer.
        weight (mindspore.Parameter): The weight parameter of the linear layer.

    Methods:
        __init__(self, dim_in, dim_out, dtype):
            Construct a linear layer for CPMBee with a scale operation.

        forward(self, x):
            Apply the linear transformation to the input tensor.

    """
    def __init__(self, dim_in, dim_out, dtype):
        """
        Construct a linear layer for CPMBee that includes a scale operation.
        """
        super().__init__(dim_in, dim_out, bias=False)
        self.dim_in = self.in_features = dim_in
        self.dim_out = self.out_features = dim_out

        self.weight = Parameter(ops.zeros((dim_out, dim_in), dtype=dtype))

    def forward(self, x: mindspore.Tensor):
        """

        Args:
            x (`mindspore.Tensor` of shape `(batch, seq_len, dim_in)`): The input of linear layer

        Returns:
            `mindspore.Tensor` of shape `(batch, seq_len, dim_out)`: The output of the linear transform y.
        """
        x = F.linear(x, self.weight)
        x = x / math.sqrt(self.dim_in)
        return x

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeLinear.__init__(dim_in, dim_out, dtype)` ¶

Construct a linear layer for CPMBee that includes a scale operation.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def __init__(self, dim_in, dim_out, dtype):
    """
    Construct a linear layer for CPMBee that includes a scale operation.
    """
    super().__init__(dim_in, dim_out, bias=False)
    self.dim_in = self.in_features = dim_in
    self.dim_out = self.out_features = dim_out

    self.weight = Parameter(ops.zeros((dim_out, dim_in), dtype=dtype))

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeLinear.forward(x)` ¶

PARAMETER	DESCRIPTION
`x`	The input of linear layer TYPE: `mindspore.Tensor` of shape `(batch, seq_len, dim_in)`

RETURNS	DESCRIPTION
	`mindspore.Tensor` of shape `(batch, seq_len, dim_out)`: The output of the linear transform y.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def forward(self, x: mindspore.Tensor):
    """

    Args:
        x (`mindspore.Tensor` of shape `(batch, seq_len, dim_in)`): The input of linear layer

    Returns:
        `mindspore.Tensor` of shape `(batch, seq_len, dim_out)`: The output of the linear transform y.
    """
    x = F.linear(x, self.weight)
    x = x / math.sqrt(self.dim_in)
    return x
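
The scale operation in `forward` divides the bias-free projection by `sqrt(dim_in)`. A minimal pure-Python sketch for a single input vector is shown below. The helper name `scaled_linear` and the toy weights are illustrative assumptions; the weight layout `(dim_out, dim_in)` matches the source.

```python
import math

def scaled_linear(x, weight):
    """Compute (weight @ x) / sqrt(dim_in) for one input vector, without bias."""
    dim_in = len(x)
    scale = math.sqrt(dim_in)
    output = []
    for weight_row in weight:  # one row per output feature
        if len(weight_row) != dim_in:
            raise ValueError("weight row length must equal dim_in")
        output.append(sum(a * b for a, b in zip(x, weight_row)) / scale)
    return output

x = [1.0, 2.0, 3.0, 4.0]                          # dim_in = 4
weight = [[1.0, 1.0, 1.0, 1.0], [0.5, 0.0, 0.0, 0.0]]  # dim_out = 2
y = scaled_linear(x, weight)
print(y)  # [5.0, 0.25]
```

Dividing by `sqrt(dim_in)` keeps the output magnitude roughly independent of the input width, since a dot product over more features otherwise grows with `dim_in`.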

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeModel` ¶

Bases: CpmBeePreTrainedModel

CpmBeeModel

This class represents a CpmBee model for natural language processing tasks. It is a subclass of CpmBeePreTrainedModel and inherits all the functionality from it.

ATTRIBUTE	DESCRIPTION
`encoder`	An instance of CpmBeeEncoder, responsible for encoding the input sequences.
`input_embedding`	An instance of CpmBeeEmbeddingExt, used for embedding the input sequences.
`position_bias`	An instance of CpmBeeBucketPositionBias, used for calculating the position bias.
`vocab_size`	An integer representing the size of the vocabulary.

METHOD	DESCRIPTION
`__init__`	Initializes the CpmBeeModel instance with the given configuration.
`get_input_embeddings`	Returns the input embedding instance.
`set_input_embeddings`	Sets the input embeddings to the given value.
`forward`	Runs the forward pass of the CpmBee model on the provided inputs.
`inference`	Performs inference using the CpmBee model with the provided input and configuration.
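
Both `forward` and `inference` build their attention mask from a context mask combined with a directional (causal) mask: a query position may attend to any context token, and a non-context query may also attend to positions at or before itself. The sketch below reproduces only that one step for a single sequence in pure Python. The function name `context_directional_mask` is an assumption, and the sample, span, and length masks that the model applies afterwards are left out.

```python
def context_directional_mask(context):
    """mask[i][j] is True when query position i may attend to key position j."""
    length = len(context)
    return [
        [
            # Key j is a context token, or query i is a generated (non-context)
            # token looking at a position at or before itself.
            bool(context[j]) or (not context[i] and j <= i)
            for j in range(length)
        ]
        for i in range(length)
    ]

# Two context tokens followed by two generated tokens (assumed toy input).
mask = context_directional_mask([1, 1, 0, 0])
mask_rows = [[int(v) for v in row] for row in mask]
for row in mask_rows:
    print(row)
# [1, 1, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 1]
```

Context tokens see each other bidirectionally, while generated tokens see the whole context plus their own causal prefix.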

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

class CpmBeeModel(CpmBeePreTrainedModel):

    """
    CpmBeeModel

    This class represents a CpmBee model for natural language processing tasks.
    It is a subclass of CpmBeePreTrainedModel and inherits all the functionality from it.

    Attributes:
        encoder: An instance of CpmBeeEncoder, responsible for encoding the input sequences.
        input_embedding: An instance of CpmBeeEmbeddingExt, used for embedding the input sequences.
        position_bias: An instance of CpmBeeBucketPositionBias, used for calculating the position bias.
        vocab_size: An integer representing the size of the vocabulary.

    Methods:
        __init__: Initializes the CpmBeeModel instance with the given configuration.
        get_input_embeddings: Returns the input embedding instance.
        set_input_embeddings: Sets the input embeddings to the given value.
        forward: Runs the forward pass of the CpmBee model on the provided inputs.
        inference: Performs inference using the CpmBee model with the provided input and configuration.
    """
    def __init__(self, config: CpmBeeConfig):
        """
        Initializes an instance of the CpmBeeModel class.

        Args:
            self: The object instance.
            config (CpmBeeConfig):
                The configuration object that contains the model settings.

                - type: CpmBeeConfig
                - purpose: Specifies the model configuration.
                - restrictions: Must be an instance of CpmBeeConfig.

        Returns:
            None

        Raises:
            None
        """
        super().__init__(config)
        if config.half:
            config.ms_dtype = mindspore.float16
        else:
            config.ms_dtype = mindspore.float32
        self.encoder = CpmBeeEncoder(config)
        self.input_embedding = CpmBeeEmbeddingExt(config)
        self.position_bias = CpmBeeBucketPositionBias(config)
        self.vocab_size = config.vocab_size
        self.post_init()

    def get_input_embeddings(self):
        """
        This method retrieves the input embeddings for the CpmBeeModel.

        Args:
            self (CpmBeeModel): The instance of the CpmBeeModel class.
                It is used to access the input embeddings for the model.

        Returns:
            input_embedding: The method returns the input embedding associated with the CpmBeeModel instance.

        Raises:
            None.
        """
        return self.input_embedding

    def set_input_embeddings(self, embeddings, **kwargs):
        """
        This method sets the input embeddings for the CpmBeeModel.

        Args:
            embeddings (object): The input embeddings to be set for the model.
                It can be of any type and should contain the necessary information for input embeddings.

        Returns:
            None.

        Raises:
            None.
        """
        self.input_embedding = embeddings

    def forward(
        self,
        input_ids: mindspore.Tensor,
        input_id_sub: Optional[mindspore.Tensor] = None,
        length: Optional[mindspore.Tensor] = None,
        context: Optional[mindspore.Tensor] = None,
        sample_ids: Optional[mindspore.Tensor] = None,
        num_segments: Optional[mindspore.Tensor] = None,
        segment: Optional[mindspore.Tensor] = None,
        segment_rel_offset: Optional[mindspore.Tensor] = None,
        segment_rel: Optional[mindspore.Tensor] = None,
        span: Optional[Dict] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        past_key_values: Optional[List] = None,
        use_cache: Optional[bool] = None,
        return_dict: Optional[bool] = None,
        **kwargs,
    ):
        """
        Runs the forward pass of the CpmBeeModel.

        Args:
            self: The object itself.
            input_ids (mindspore.Tensor): The input tensor of shape (batch, seq_length) containing the input IDs.
            input_id_sub (Optional[mindspore.Tensor], optional):
                The optional input tensor of shape (batch, seq_length) containing the sub input IDs. Defaults to None.
            length (Optional[mindspore.Tensor], optional):
                The optional input tensor of shape (batch,) containing the length of the input sequences.
                Defaults to None.
            context (Optional[mindspore.Tensor], optional):
                The optional input tensor of shape (batch, seq_length) containing the context. Defaults to None.
            sample_ids (Optional[mindspore.Tensor], optional):
                The optional input tensor of shape (batch, seq_length) containing the sample IDs. Defaults to None.
            num_segments (Optional[mindspore.Tensor], optional):
                The optional input tensor of shape (batch, seq_length) containing the number of segments.
                Defaults to None.
            segment (Optional[mindspore.Tensor], optional):
                The optional input tensor of shape (batch, seq_length) containing the segments. Defaults to None.
            segment_rel_offset (Optional[mindspore.Tensor], optional):
                The optional input tensor of shape (batch, seq_length) containing the segment relative offset.
                Defaults to None.
            segment_rel (Optional[mindspore.Tensor], optional):
                The optional input tensor of shape (batch, seq_length) containing the segment relative.
                Defaults to None.
            span (Optional[Dict], optional):
                The optional input dictionary containing span information. Defaults to None.
            output_attentions (Optional[bool], optional):
                The optional boolean flag indicating whether to output attentions. Defaults to None.
            output_hidden_states (Optional[bool], optional):
                The optional boolean flag indicating whether to output hidden states. Defaults to None.
            past_key_values (Optional[List], optional):
                The optional list containing past key values. Defaults to None.
            use_cache (Optional[bool], optional):
                The optional boolean flag indicating whether to use cache. Defaults to None.
            return_dict (Optional[bool], optional):
                The optional boolean flag indicating whether to return a dictionary. Defaults to None.

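        Returns:
            BaseModelOutputWithPast: The last hidden state, past key values, hidden states, and attentions.
                When `return_dict` is False, a plain tuple of the non-None values is returned instead.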

        Raises:
            None
        """
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        use_cache = use_cache if use_cache is not None else self.config.use_cache

        # dummy setting for common tests
        if input_id_sub is None:
            dtype = input_ids.dtype
            batch, seq_length = input_ids.shape
            segment = ops.where(input_ids != 0, mindspore.tensor(2), 0).to(dtype=dtype)
            context = ops.full((batch, seq_length), 1, dtype=dtype)
            position = ops.tile(ops.arange(seq_length, dtype=dtype), (batch, 1))
            input_id_sub = ops.full((batch, seq_length), 0, dtype=dtype)
            segment_rel_offset = ops.full((batch, seq_length), 0, dtype=dtype)
            segment_rel = ops.full((batch, seq_length), 0, dtype=dtype)
            num_segments = ops.full((batch, seq_length), 0, dtype=dtype)
            sample_ids = ops.zeros_like(input_ids)

        batch = input_ids.shape[0]
        seqlen = input_ids.shape[1]

        # calc segment bucket
        segment_rel_2d = ops.masked_fill(
            segment[:, :, None] * num_segments[:, :, None]
            + segment[:, None, :]
            + segment_rel_offset[:, :, None],
            ~(
                (sample_ids[:, :, None] == sample_ids[:, None, :]).to(mindspore.int32)
                & (span[:, None, :] == span[:, :, None]).to(mindspore.int32)
            ),  # not in the same span or sample
            0,  # avoid an out-of-range gather index
        ).view(batch, seqlen * seqlen)

        segment_bucket = ops.gather(
            input=segment_rel,
            dim=1,
            index=segment_rel_2d.long(),
        ).view(batch, seqlen, seqlen)

        segment_bucket = segment_bucket.masked_fill(
            ~(
                (sample_ids[:, :, None] == sample_ids[:, None, :]).to(mindspore.int32)
                & (span[:, None, :] == span[:, :, None]).to(mindspore.int32)
            ),  # not in the same span or sample
            1,  # bucket is used for in-context samples
        )

        # directional mask
        directional_mask_2d = ops.arange(seqlen) <= ops.arange(
            seqlen
        ).view(-1, 1)
        # sample mask
        sample_mask_2d = (sample_ids[:, :, None] == 0).to(mindspore.int32) | (
            sample_ids[:, :, None] == sample_ids[:, None, :]
        ).to(mindspore.int32)
        # context mask
        attention_mask = context[:, None, :] | (
            context[:, :, None].logical_not().to(mindspore.int32) & directional_mask_2d.view(1, seqlen, seqlen).to(mindspore.int32)
        )
        # span mask
        attention_mask = (
            attention_mask & sample_mask_2d & (span[:, None, :] == span[:, :, None])
        )
        # length mask
        mask_1d = (
            ops.tile(ops.arange(seqlen)[None, :], (batch, 1)) < length[:, None]
        ).to(mindspore.int32)
        attention_mask = (
            mask_1d.view(batch, seqlen, 1) & mask_1d.view(batch, 1, seqlen) & attention_mask
        ).to(mindspore.bool_)
        position = ops.broadcast_to(ops.arange(seqlen), (batch, seqlen))

        hidden_states = self.input_embedding(input_ids, input_id_sub)
        position_bias = self.position_bias(position, position, segment_bucket)
        hidden_states, present_key_values, all_hidden_states, all_attentions = self.encoder(
            hidden_states,
            attention_mask,
            position_bias,
            output_attentions,
            output_hidden_states,
            past_key_values=None,
            use_cache=False
        )

        if not return_dict:
            return tuple(
                v for v in [hidden_states, present_key_values, all_hidden_states, all_attentions] if v is not None
            )

        return BaseModelOutputWithPast(
            last_hidden_state=hidden_states,
            past_key_values=present_key_values,
            hidden_states=all_hidden_states,
            attentions=all_attentions,
        )

    def inference(
        self,
        input_ids: mindspore.Tensor,
        input_id_sub: Optional[mindspore.Tensor] = None,
        position: Optional[mindspore.Tensor] = None,
        context: Optional[mindspore.Tensor] = None,
        sample_ids: Optional[mindspore.Tensor] = None,
        num_segments: Optional[mindspore.Tensor] = None,
        segment: Optional[mindspore.Tensor] = None,
        segment_rel_offset: Optional[mindspore.Tensor] = None,
        segment_rel: Optional[mindspore.Tensor] = None,
        past_states: Optional[Dict] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        past_key_values: Optional[List] = None,
        use_cache: Optional[bool] = None,
        return_dict: Optional[bool] = None,
        **kwargs,
    ):
        '''
        Perform inference using the CpmBeeModel.

        Args:
            self (CpmBeeModel): An instance of the CpmBeeModel class.
            input_ids (mindspore.Tensor):
                The input tensor of shape (batch, seq_length) containing the input IDs.
            input_id_sub (Optional[mindspore.Tensor]):
                The optional input tensor of shape (batch, seq_length) containing the sub input IDs. Default is None.
            position (Optional[mindspore.Tensor]):
                The optional input tensor of shape (batch, seq_length) containing the position information.
                Default is None.
            context (Optional[mindspore.Tensor]):
                The optional input tensor of shape (batch, seq_length) containing the context information.
                Default is None.
            sample_ids (Optional[mindspore.Tensor]):
                The optional input tensor of shape (batch, seq_length) containing the sample IDs. Default is None.
            num_segments (Optional[mindspore.Tensor]):
                The optional input tensor of shape (batch, seq_length) containing the number of segments.
                Default is None.
            segment (Optional[mindspore.Tensor]):
                The optional input tensor of shape (batch, seq_length) containing the segment information.
                Default is None.
            segment_rel_offset (Optional[mindspore.Tensor]):
                The optional input tensor of shape (batch, seq_length) containing the segment relative offset.
                Default is None.
            segment_rel (Optional[mindspore.Tensor]):
                The optional input tensor of shape (batch, seq_length) containing the segment relative information.
                Default is None.
            past_states (Optional[Dict]):
                The optional dictionary containing the past states. Default is None.
            output_attentions (Optional[bool]):
                Whether to output attentions. If None, it uses the output_attentions from the model configuration.
                Default is None.
            output_hidden_states (Optional[bool]):
                Whether to output hidden states. If None, it uses the output_hidden_states from the model configuration.
                Default is None.
            past_key_values (Optional[List]): The optional list containing the past key values. Default is None.
            use_cache (Optional[bool]):
                Whether to use cache. If None, it uses the use_cache from the model configuration. Default is None.
            return_dict (Optional[bool]):
                Whether to return a dictionary. If None, it uses the use_return_dict from the model configuration.
                Default is None.

        Returns:
            BaseModelOutputWithPast: An instance of BaseModelOutputWithPast containing the last hidden state,
                past key values, hidden states, and attentions.

        Raises:
            None
        '''
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        use_cache = use_cache if use_cache is not None else self.config.use_cache

        # dummy setting for common tests
        if input_id_sub is None:
            dtype = input_ids.dtype
            batch, seq_length = input_ids.shape
            segment = ops.where(input_ids != 0, 2, 0).to(dtype=dtype)
            context = ops.full((batch, seq_length), 1, dtype=dtype)
            position = ops.arange(seq_length, dtype=dtype).repeat(batch, 1)
            input_id_sub = ops.full((batch, seq_length), 0, dtype=dtype)
            segment_rel_offset = ops.full((batch, seq_length), 0, dtype=dtype)
            segment_rel = ops.full((batch, seq_length), 0, dtype=dtype)
            num_segments = ops.full((batch, seq_length), 0, dtype=dtype)
            sample_ids = ops.zeros_like(input_ids)

        if past_states is None:
            present_position = position
            present_context = context
            present_sample_ids = sample_ids
            present_num_segments = num_segments
            present_segments = segment
            present_buffer = None
        else:
            present_position = ops.cat([past_states["buffer_position"], position], dim=-1)
            present_context = ops.cat([past_states["buffer_context"], context.astype(mindspore.int64)], dim=-1)
            present_sample_ids = ops.cat([past_states["buffer_sample_ids"], sample_ids], dim=-1)
            present_num_segments = ops.cat([past_states["buffer_num_segments"], num_segments], dim=-1)
            present_segments = ops.cat([past_states["buffer_segments"], segment], dim=-1)
            present_buffer = past_states["buffer"]

        batch = input_ids.shape[0]
        len_q = input_ids.shape[1]
        len_buffer = present_position.shape[1]

        segment_rel_2d = ops.masked_fill(
            segment[:, :, None] * num_segments[:, :, None]
            + present_segments[:, None, :]
            + segment_rel_offset[:, :, None],
            ~((sample_ids[:, :, None] == present_sample_ids[:, None, :])),  # not in the same sample
            0,  # avoid an out-of-range gather index
        ).view(batch, len_q * len_buffer)

        segment_bucket = ops.gather(
            input=segment_rel,
            dim=1,
            index=segment_rel_2d.long(),
        ).view(batch, len_q, len_buffer)

        segment_bucket = segment_bucket.masked_fill(
            ~((sample_ids[:, :, None] == present_sample_ids[:, None, :])),  # not in the same sample
            1,  # bucket is used for in-context samples
        )

        # directional mask
        directional_mask_2d = present_position[:, None, :] <= position[:, :, None]
        # sample mask
        sample_mask_2d = (sample_ids[:, :, None] == 0) | (sample_ids[:, :, None] == present_sample_ids[:, None, :])
        # context mask
        attention_mask = present_context[:, None, :] | (
            context[:, :, None].logical_not() & directional_mask_2d.view(batch, len_q, len_buffer)
        )
        # span mask
        attention_mask = attention_mask & sample_mask_2d
        # length mask
        mask_1d = present_num_segments != 0
        attention_mask = mask_1d.view(batch, 1, len_buffer) & attention_mask

        hidden_states = self.input_embedding(input_ids, input_id_sub)
        position_bias = self.position_bias(position, present_position, segment_bucket)
        hidden_states, present_key_values, all_hidden_states, all_attentions = self.encoder(
            hidden_states,
            attention_mask,
            position_bias,
            output_attentions,
            output_hidden_states,
            present_buffer,
            use_cache,
        )

        if not return_dict:
            return tuple(
                v for v in [hidden_states, present_key_values, all_hidden_states, all_attentions] if v is not None
            )

        return BaseModelOutputWithPast(
            last_hidden_state=hidden_states,
            past_key_values=present_key_values,
            hidden_states=all_hidden_states,
            attentions=all_attentions,
        )
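
The inference path in the listing above concatenates cached per-token metadata from `past_states` with the current step before it builds the mask. A minimal plain-Python sketch of the directional and length terms for one batch row follows. It assumes lists in place of tensors and deliberately omits the context and sample terms; the helper name `buffered_mask` is hypothetical and not part of mindnlp.

```python
def buffered_mask(past_position, position, past_num_segments, num_segments):
    """Return a len_q x len_buffer mask for a single batch row.

    Keys are the cached entries followed by the current ones. A query may see
    a key whose position is <= its own, and keys whose num_segments entry is 0
    are treated as padding.
    """
    # Plain-list stand-in for ops.cat([...], dim=-1) in the listing.
    present_position = past_position + position
    present_num_segments = past_num_segments + num_segments

    mask = []
    for q_pos in position:
        row = [
            (k_pos <= q_pos) and (k_seg != 0)
            for k_pos, k_seg in zip(present_position, present_num_segments)
        ]
        mask.append(row)
    return mask


# Three cached tokens (the last one is padding) plus one new token at position 3.
print(buffered_mask([0, 1, 2], [3], [1, 1, 0], [1]))  # [[True, True, False, True]]
```

With an empty cache the function reduces to an ordinary causal mask over the current tokens.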

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeModel.__init__(config)` ¶

Initializes an instance of the CpmBeeModel class.

PARAMETER	DESCRIPTION
`self`	The object instance.
`config`	The configuration object that contains the model settings. Must be an instance of CpmBeeConfig. Its `half` flag selects the dtype stored in `config.ms_dtype`: float16 when true, float32 otherwise. TYPE: `CpmBeeConfig`

RETURNS	DESCRIPTION
	None

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def __init__(self, config: CpmBeeConfig):
    """
    Initializes an instance of the CpmBeeModel class.

    Args:
        self: The object instance.
        config (CpmBeeConfig):
            The configuration object that contains the model settings.

            - type: CpmBeeConfig
            - purpose: Specifies the model configuration.
            - restrictions: Must be an instance of CpmBeeConfig.

    Returns:
        None

    Raises:
        None
    """
    super().__init__(config)
    if config.half:
        config.ms_dtype = mindspore.float16
    else:
        config.ms_dtype = mindspore.float32
    self.encoder = CpmBeeEncoder(config)
    self.input_embedding = CpmBeeEmbeddingExt(config)
    self.position_bias = CpmBeeBucketPositionBias(config)
    self.vocab_size = config.vocab_size
    self.post_init()

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeModel.forward(input_ids, input_id_sub=None, length=None, context=None, sample_ids=None, num_segments=None, segment=None, segment_rel_offset=None, segment_rel=None, span=None, output_attentions=None, output_hidden_states=None, past_key_values=None, use_cache=None, return_dict=None, **kwargs)` ¶

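Runs the forward pass of the CpmBeeModel: builds the segment buckets and the attention mask from the inputs, embeds the tokens, and runs the encoder.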

PARAMETER	DESCRIPTION
`self`	The object itself.
`input_ids`	The input tensor of shape (batch, seq_length) containing the input IDs. TYPE: `Tensor`
`input_id_sub`	The optional input tensor of shape (batch, seq_length) containing the sub input IDs. Defaults to None. TYPE: `Optional[Tensor]` DEFAULT: `None`
`length`	The optional input tensor of shape (batch,) containing the length of the input sequences. Defaults to None. TYPE: `Optional[Tensor]` DEFAULT: `None`
`context`	The optional input tensor of shape (batch, seq_length) containing the context. Defaults to None. TYPE: `Optional[Tensor]` DEFAULT: `None`
`sample_ids`	The optional input tensor of shape (batch, seq_length) containing the sample IDs. Defaults to None. TYPE: `Optional[Tensor]` DEFAULT: `None`
`num_segments`	The optional input tensor of shape (batch, seq_length) containing the number of segments. Defaults to None. TYPE: `Optional[Tensor]` DEFAULT: `None`
`segment`	The optional input tensor of shape (batch, seq_length) containing the segments. Defaults to None. TYPE: `Optional[Tensor]` DEFAULT: `None`
`segment_rel_offset`	The optional input tensor of shape (batch, seq_length) containing the segment relative offset. Defaults to None. TYPE: `Optional[Tensor]` DEFAULT: `None`
`segment_rel`	The optional input tensor of shape (batch, seq_length) containing the segment relative. Defaults to None. TYPE: `Optional[Tensor]` DEFAULT: `None`
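`span`	The optional input containing span information. Although it is annotated as `Optional[Dict]`, the source below indexes it like a tensor (`span[:, None, :]`), so a tensor of shape (batch, seq_length) appears to be expected in practice. Defaults to None. TYPE: `Optional[Dict]` DEFAULT: `None`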
`output_attentions`	The optional boolean flag indicating whether to output attentions. Defaults to None. TYPE: `Optional[bool]` DEFAULT: `None`
`output_hidden_states`	The optional boolean flag indicating whether to output hidden states. Defaults to None. TYPE: `Optional[bool]` DEFAULT: `None`
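`past_key_values`	The optional list containing past key values. Not used by this method: the source below calls the encoder with `past_key_values=None`. Defaults to None. TYPE: `Optional[List]` DEFAULT: `None`
`use_cache`	The optional boolean flag indicating whether to use cache. It is resolved against `config.use_cache`, but the source below calls the encoder with `use_cache=False`. Defaults to None. TYPE: `Optional[bool]` DEFAULT: `None`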
`return_dict`	The optional boolean flag indicating whether to return a dictionary. Defaults to None. TYPE: `Optional[bool]` DEFAULT: `None`

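RETURNS	DESCRIPTION
`BaseModelOutputWithPast`	An instance of BaseModelOutputWithPast containing the last hidden state, past key values, hidden states, and attentions. When `return_dict` is false, a tuple of the non-None values among these is returned instead.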

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def forward(
    self,
    input_ids: mindspore.Tensor,
    input_id_sub: Optional[mindspore.Tensor] = None,
    length: Optional[mindspore.Tensor] = None,
    context: Optional[mindspore.Tensor] = None,
    sample_ids: Optional[mindspore.Tensor] = None,
    num_segments: Optional[mindspore.Tensor] = None,
    segment: Optional[mindspore.Tensor] = None,
    segment_rel_offset: Optional[mindspore.Tensor] = None,
    segment_rel: Optional[mindspore.Tensor] = None,
    span: Optional[Dict] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    past_key_values: Optional[List] = None,
    use_cache: Optional[bool] = None,
    return_dict: Optional[bool] = None,
    **kwargs,
):
    """
    Constructs the CpmBeeModel.

    Args:
        self: The object itself.
        input_ids (mindspore.Tensor): The input tensor of shape (batch, seq_length) containing the input IDs.
        input_id_sub (Optional[mindspore.Tensor], optional):
            The optional input tensor of shape (batch, seq_length) containing the sub input IDs. Defaults to None.
        length (Optional[mindspore.Tensor], optional):
            The optional input tensor of shape (batch,) containing the length of the input sequences.
            Defaults to None.
        context (Optional[mindspore.Tensor], optional):
            The optional input tensor of shape (batch, seq_length) containing the context. Defaults to None.
        sample_ids (Optional[mindspore.Tensor], optional):
            The optional input tensor of shape (batch, seq_length) containing the sample IDs. Defaults to None.
        num_segments (Optional[mindspore.Tensor], optional):
            The optional input tensor of shape (batch, seq_length) containing the number of segments.
            Defaults to None.
        segment (Optional[mindspore.Tensor], optional):
            The optional input tensor of shape (batch, seq_length) containing the segments. Defaults to None.
        segment_rel_offset (Optional[mindspore.Tensor], optional):
            The optional input tensor of shape (batch, seq_length) containing the segment relative offset.
            Defaults to None.
        segment_rel (Optional[mindspore.Tensor], optional):
            The optional input tensor of shape (batch, seq_length) containing the segment relative.
            Defaults to None.
        span (Optional[Dict], optional):
            The optional input dictionary containing span information. Defaults to None.
        output_attentions (Optional[bool], optional):
            The optional boolean flag indicating whether to output attentions. Defaults to None.
        output_hidden_states (Optional[bool], optional):
            The optional boolean flag indicating whether to output hidden states. Defaults to None.
        past_key_values (Optional[List], optional):
            The optional list containing past key values. Defaults to None.
        use_cache (Optional[bool], optional):
            The optional boolean flag indicating whether to use cache. Defaults to None.
        return_dict (Optional[bool], optional):
            The optional boolean flag indicating whether to return a dictionary. Defaults to None.

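    Returns:
        BaseModelOutputWithPast or tuple: The last hidden state, past key values, hidden states,
            and attentions; a tuple of the non-None values when return_dict is false.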

    Raises:
        None
    """
    output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
    output_hidden_states = (
        output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    )
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict
    use_cache = use_cache if use_cache is not None else self.config.use_cache

    # dummy setting for common tests
    if input_id_sub is None:
        dtype = input_ids.dtype
        batch, seq_length = input_ids.shape
        segment = ops.where(input_ids != 0, mindspore.tensor(2), 0).to(dtype=dtype)
        context = ops.full((batch, seq_length), 1, dtype=dtype)
        position = ops.tile(ops.arange(seq_length, dtype=dtype), (batch, 1))
        input_id_sub = ops.full((batch, seq_length), 0, dtype=dtype)
        segment_rel_offset = ops.full((batch, seq_length), 0, dtype=dtype)
        segment_rel = ops.full((batch, seq_length), 0, dtype=dtype)
        num_segments = ops.full((batch, seq_length), 0, dtype=dtype)
        sample_ids = ops.zeros_like(input_ids)

    batch = input_ids.shape[0]
    seqlen = input_ids.shape[1]

    # calc segment bucket
    segment_rel_2d = ops.masked_fill(
        segment[:, :, None] * num_segments[:, :, None]
        + segment[:, None, :]
        + segment_rel_offset[:, :, None],
        ~(
            (sample_ids[:, :, None] == sample_ids[:, None, :]).to(mindspore.int32)
            & (span[:, None, :] == span[:, :, None]).to(mindspore.int32)
        ),  # not in the same span or sample
        0,  # avoid an out-of-range gather index
    ).view(batch, seqlen * seqlen)

    segment_bucket = ops.gather(
        input=segment_rel,
        dim=1,
        index=segment_rel_2d.long(),
    ).view(batch, seqlen, seqlen)

    segment_bucket = segment_bucket.masked_fill(
        ~(
            (sample_ids[:, :, None] == sample_ids[:, None, :]).to(mindspore.int32)
            & (span[:, None, :] == span[:, :, None]).to(mindspore.int32)
        ),  # not in the same span or sample
        1,  # bucket is used for in-context samples
    )

    # directional mask
    directional_mask_2d = ops.arange(seqlen) <= ops.arange(
        seqlen
    ).view(-1, 1)
    # sample mask
    sample_mask_2d = (sample_ids[:, :, None] == 0).to(mindspore.int32) | (
        sample_ids[:, :, None] == sample_ids[:, None, :]
    ).to(mindspore.int32)
    # context mask
    attention_mask = context[:, None, :] | (
        context[:, :, None].logical_not().to(mindspore.int32) & directional_mask_2d.view(1, seqlen, seqlen).to(mindspore.int32)
    )
    # span mask
    attention_mask = (
        attention_mask & sample_mask_2d & (span[:, None, :] == span[:, :, None])
    )
    # length mask
    mask_1d = (
        ops.tile(ops.arange(seqlen)[None, :], (batch, 1)) < length[:, None]
    ).to(mindspore.int32)
    attention_mask = (
        mask_1d.view(batch, seqlen, 1) & mask_1d.view(batch, 1, seqlen) & attention_mask
    ).to(mindspore.bool_)
    position = ops.broadcast_to(ops.arange(seqlen), (batch, seqlen))

    hidden_states = self.input_embedding(input_ids, input_id_sub)
    position_bias = self.position_bias(position, position, segment_bucket)
    hidden_states, present_key_values, all_hidden_states, all_attentions = self.encoder(
        hidden_states,
        attention_mask,
        position_bias,
        output_attentions,
        output_hidden_states,
        past_key_values=None,
        use_cache=False
    )

    if not return_dict:
        return tuple(
            v for v in [hidden_states, present_key_values, all_hidden_states, all_attentions] if v is not None
        )

    return BaseModelOutputWithPast(
        last_hidden_state=hidden_states,
        past_key_values=present_key_values,
        hidden_states=all_hidden_states,
        attentions=all_attentions,
    )
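
The mask construction in `forward` combines five conditions: direction, context, sample, span, and length. The sketch below restates them for a single batch row in plain Python so that each term can be read in isolation. It is an illustration under the assumption that the inputs are short integer lists; the function name `forward_mask` is hypothetical and the code does not use MindSpore.

```python
def forward_mask(context, sample_ids, span, length):
    """Build the seqlen x seqlen attention mask for one batch row.

    context, sample_ids and span are equal-length lists of ints; length is
    the number of valid (non-padding) tokens in the row.
    """
    seqlen = len(context)
    mask = []
    for q in range(seqlen):          # query index (rows)
        row = []
        for k in range(seqlen):      # key index (columns)
            directional = k <= q     # key is not after the query
            # Context keys are always visible; non-context queries look backward only.
            visible = bool(context[k]) or (not context[q] and directional)
            # A query with sample id 0 sees every sample, otherwise ids must match.
            same_sample = sample_ids[q] == 0 or sample_ids[q] == sample_ids[k]
            same_span = span[q] == span[k]
            in_length = q < length and k < length
            row.append(visible and same_sample and same_span and in_length)
        mask.append(row)
    return mask


# Two context tokens, one generated token, one padding token.
for row in forward_mask([1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], 3):
    print(row)
```

In this example the padding position (index 3) is masked both as a query and as a key, and the generated token at index 2 can attend to the two context tokens and to itself.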

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeModel.get_input_embeddings()` ¶

This method retrieves the input embeddings for the CpmBeeModel.

PARAMETER	DESCRIPTION
`self`	The instance of the CpmBeeModel class. It is used to access the input embeddings for the model. TYPE: `CpmBeeModel`

RETURNS	DESCRIPTION
`input_embedding`	The input embedding module stored on the CpmBeeModel instance as `self.input_embedding`.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def get_input_embeddings(self):
    """
    This method retrieves the input embeddings for the CpmBeeModel.

    Args:
        self (CpmBeeModel): The instance of the CpmBeeModel class.
            It is used to access the input embeddings for the model.

    Returns:
        input_embedding: The method returns the input embedding associated with the CpmBeeModel instance.

    Raises:
        None.
    """
    return self.input_embedding

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeModel.inference(input_ids, input_id_sub=None, position=None, context=None, sample_ids=None, num_segments=None, segment=None, segment_rel_offset=None, segment_rel=None, past_states=None, output_attentions=None, output_hidden_states=None, past_key_values=None, use_cache=None, return_dict=None, **kwargs)` ¶

Perform inference using the CpmBeeModel.

PARAMETER	DESCRIPTION
`self`	An instance of the CpmBeeModel class. TYPE: `CpmBeeModel`
`input_ids`	The input tensor of shape (batch, seq_length) containing the input IDs. TYPE: `Tensor`
`input_id_sub`	The optional input tensor of shape (batch, seq_length) containing the sub input IDs. Default is None. TYPE: `Optional[Tensor]` DEFAULT: `None`
`position`	The optional input tensor of shape (batch, seq_length) containing the position information. Default is None. TYPE: `Optional[Tensor]` DEFAULT: `None`
`context`	The optional input tensor of shape (batch, seq_length) containing the context information. Default is None. TYPE: `Optional[Tensor]` DEFAULT: `None`
`sample_ids`	The optional input tensor of shape (batch, seq_length) containing the sample IDs. Default is None. TYPE: `Optional[Tensor]` DEFAULT: `None`
`num_segments`	The optional input tensor of shape (batch, seq_length) containing the number of segments. Default is None. TYPE: `Optional[Tensor]` DEFAULT: `None`
`segment`	The optional input tensor of shape (batch, seq_length) containing the segment information. Default is None. TYPE: `Optional[Tensor]` DEFAULT: `None`
`segment_rel_offset`	The optional input tensor of shape (batch, seq_length) containing the segment relative offset. Default is None. TYPE: `Optional[Tensor]` DEFAULT: `None`
`segment_rel`	The optional input tensor of shape (batch, seq_length) containing the segment relative information. Default is None. TYPE: `Optional[Tensor]` DEFAULT: `None`
`past_states`	The optional dictionary containing the past states. Default is None. TYPE: `Optional[Dict]` DEFAULT: `None`
`output_attentions`	Whether to output attentions. If None, it uses the output_attentions from the model configuration. Default is None. TYPE: `Optional[bool]` DEFAULT: `None`
`output_hidden_states`	Whether to output hidden states. If None, it uses the output_hidden_states from the model configuration. Default is None. TYPE: `Optional[bool]` DEFAULT: `None`
`past_key_values`	The optional list containing the past key values. Not read by the source below, which takes the cache from `past_states["buffer"]` instead. Default is None. TYPE: `Optional[List]` DEFAULT: `None`
`use_cache`	Whether to use cache. If None, it uses the use_cache from the model configuration. Default is None. TYPE: `Optional[bool]` DEFAULT: `None`
`return_dict`	Whether to return a dictionary. If None, it uses the use_return_dict from the model configuration. Default is None. TYPE: `Optional[bool]` DEFAULT: `None`

RETURNS	DESCRIPTION
`BaseModelOutputWithPast`	An instance of BaseModelOutputWithPast containing the last hidden state, past key values, hidden states, and attentions.
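
When `return_dict` resolves to false, the source below returns a plain tuple instead and drops every value that is None. The positions in that tuple therefore depend on which outputs were requested. A small sketch of this filtering, using strings as stand-ins for tensors (the helper name `to_tuple` is hypothetical):

```python
def to_tuple(hidden_states, present_key_values, all_hidden_states, all_attentions):
    """Drop None entries, as the return_dict=False branch does."""
    return tuple(
        v
        for v in [hidden_states, present_key_values, all_hidden_states, all_attentions]
        if v is not None
    )


# Only the last hidden state and the attentions are available here,
# so the attentions end up at index 1 rather than index 3.
print(to_tuple("h", None, None, "attn"))  # ('h', 'attn')
```

Callers that rely on fixed positions may find the `BaseModelOutputWithPast` form safer, since its fields are accessed by name.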

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def inference(
    self,
    input_ids: mindspore.Tensor,
    input_id_sub: Optional[mindspore.Tensor] = None,
    position: Optional[mindspore.Tensor] = None,
    context: Optional[mindspore.Tensor] = None,
    sample_ids: Optional[mindspore.Tensor] = None,
    num_segments: Optional[mindspore.Tensor] = None,
    segment: Optional[mindspore.Tensor] = None,
    segment_rel_offset: Optional[mindspore.Tensor] = None,
    segment_rel: Optional[mindspore.Tensor] = None,
    past_states: Optional[Dict] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    past_key_values: Optional[List] = None,
    use_cache: Optional[bool] = None,
    return_dict: Optional[bool] = None,
    **kwargs,
):
    '''
    Perform inference using the CpmBeeModel.

    Args:
        self (CpmBeeModel): An instance of the CpmBeeModel class.
        input_ids (mindspore.Tensor):
            The input tensor of shape (batch, seq_length) containing the input IDs.
        input_id_sub (Optional[mindspore.Tensor]):
            The optional input tensor of shape (batch, seq_length) containing the sub input IDs. Default is None.
        position (Optional[mindspore.Tensor]):
            The optional input tensor of shape (batch, seq_length) containing the position information.
            Default is None.
        context (Optional[mindspore.Tensor]):
            The optional input tensor of shape (batch, seq_length) containing the context information.
            Default is None.
        sample_ids (Optional[mindspore.Tensor]):
            The optional input tensor of shape (batch, seq_length) containing the sample IDs. Default is None.
        num_segments (Optional[mindspore.Tensor]):
            The optional input tensor of shape (batch, seq_length) containing the number of segments.
            Default is None.
        segment (Optional[mindspore.Tensor]):
            The optional input tensor of shape (batch, seq_length) containing the segment information.
            Default is None.
        segment_rel_offset (Optional[mindspore.Tensor]):
            The optional input tensor of shape (batch, seq_length) containing the segment relative offset.
            Default is None.
        segment_rel (Optional[mindspore.Tensor]):
            The optional input tensor of shape (batch, seq_length) containing the segment relative information.
            Default is None.
        past_states (Optional[Dict]):
            The optional dictionary containing the past states. Default is None.
        output_attentions (Optional[bool]):
            Whether to output attentions. If None, it uses the output_attentions from the model configuration.
            Default is None.
        output_hidden_states (Optional[bool]):
            Whether to output hidden states. If None, it uses the output_hidden_states from the model configuration.
            Default is None.
        past_key_values (Optional[List]): The optional list containing the past key values. Default is None.
        use_cache (Optional[bool]):
            Whether to use cache. If None, it uses the use_cache from the model configuration. Default is None.
        return_dict (Optional[bool]):
            Whether to return a dictionary. If None, it uses the use_return_dict from the model configuration.
            Default is None.

    Returns:
        BaseModelOutputWithPast: An instance of BaseModelOutputWithPast containing the last hidden state,
            past key values, hidden states, and attentions.

    Raises:
        None
    '''
    output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
    output_hidden_states = (
        output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    )
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict
    use_cache = use_cache if use_cache is not None else self.config.use_cache

    # dummy setting for common tests
    if input_id_sub is None:
        dtype = input_ids.dtype
        batch, seq_length = input_ids.shape
        segment = ops.where(input_ids != 0, 2, 0).to(dtype=dtype)
        context = ops.full((batch, seq_length), 1, dtype=dtype)
        position = ops.arange(seq_length, dtype=dtype).repeat(batch, 1)
        input_id_sub = ops.full((batch, seq_length), 0, dtype=dtype)
        segment_rel_offset = ops.full((batch, seq_length), 0, dtype=dtype)
        segment_rel = ops.full((batch, seq_length), 0, dtype=dtype)
        num_segments = ops.full((batch, seq_length), 0, dtype=dtype)
        sample_ids = ops.zeros_like(input_ids)

    if past_states is None:
        present_position = position
        present_context = context
        present_sample_ids = sample_ids
        present_num_segments = num_segments
        present_segments = segment
        present_buffer = None
    else:
        present_position = ops.cat([past_states["buffer_position"], position], dim=-1)
        present_context = ops.cat([past_states["buffer_context"], context.astype(mindspore.int64)], dim=-1)
        present_sample_ids = ops.cat([past_states["buffer_sample_ids"], sample_ids], dim=-1)
        present_num_segments = ops.cat([past_states["buffer_num_segments"], num_segments], dim=-1)
        present_segments = ops.cat([past_states["buffer_segments"], segment], dim=-1)
        present_buffer = past_states["buffer"]

    batch = input_ids.shape[0]
    len_q = input_ids.shape[1]
    len_buffer = present_position.shape[1]

    segment_rel_2d = ops.masked_fill(
        segment[:, :, None] * num_segments[:, :, None]
        + present_segments[:, None, :]
        + segment_rel_offset[:, :, None],
        ~((sample_ids[:, :, None] == present_sample_ids[:, None, :])),  # not in the same sample
        0,  # avoid an out-of-range gather index
    ).view(batch, len_q * len_buffer)

    segment_bucket = ops.gather(
        input=segment_rel,
        dim=1,
        index=segment_rel_2d.long(),
    ).view(batch, len_q, len_buffer)

    segment_bucket = segment_bucket.masked_fill(
        ~((sample_ids[:, :, None] == present_sample_ids[:, None, :])),  # not in the same sample
        1,  # bucket is used for in-context samples
    )

    # directional mask
    directional_mask_2d = present_position[:, None, :] <= position[:, :, None]
    # sample mask
    sample_mask_2d = (sample_ids[:, :, None] == 0) | (sample_ids[:, :, None] == present_sample_ids[:, None, :])
    # context mask
    attention_mask = present_context[:, None, :] | (
        context[:, :, None].logical_not() & directional_mask_2d.view(batch, len_q, len_buffer)
    )
    # span mask
    attention_mask = attention_mask & sample_mask_2d
    # length mask
    mask_1d = present_num_segments != 0
    attention_mask = mask_1d.view(batch, 1, len_buffer) & attention_mask

    hidden_states = self.input_embedding(input_ids, input_id_sub)
    position_bias = self.position_bias(position, present_position, segment_bucket)
    hidden_states, present_key_values, all_hidden_states, all_attentions = self.encoder(
        hidden_states,
        attention_mask,
        position_bias,
        output_attentions,
        output_hidden_states,
        present_buffer,
        use_cache,
    )

    if not return_dict:
        return tuple(
            v for v in [hidden_states, present_key_values, all_hidden_states, all_attentions] if v is not None
        )

    return BaseModelOutputWithPast(
        last_hidden_state=hidden_states,
        past_key_values=present_key_values,
        hidden_states=all_hidden_states,
        attentions=all_attentions,
    )
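
The segment-bucket lookup in the source above first computes an index per (query, key) pair and then gathers from `segment_rel`. The following plain-Python sketch shows the same arithmetic for one batch row. It assumes list inputs, the name `segment_buckets` is hypothetical, and the `segment_rel` values in the example are arbitrary placeholders.

```python
def segment_buckets(segment, num_segments, segment_rel_offset, segment_rel,
                    sample_ids, present_segments, present_sample_ids):
    """Return a len_q x len_buffer table of segment buckets for one batch row."""
    buckets = []
    for q in range(len(segment)):
        row = []
        for k in range(len(present_segments)):
            if sample_ids[q] != present_sample_ids[k]:
                # Pairs from different samples end up with bucket 1.
                row.append(1)
                continue
            # Same formula as segment_rel_2d in the source.
            index = (segment[q] * num_segments[q]
                     + present_segments[k]
                     + segment_rel_offset[q])
            row.append(segment_rel[index])
        buckets.append(row)
    return buckets


# Two segments, so segment_rel acts as a flattened 2 x 2 lookup table.
print(segment_buckets(
    segment=[0, 1], num_segments=[2, 2], segment_rel_offset=[0, 0],
    segment_rel=[5, 6, 7, 8], sample_ids=[0, 0],
    present_segments=[0, 1], present_sample_ids=[0, 0],
))  # [[5, 6], [7, 8]]
```

The source reaches the same result in two steps: it sets the index of cross-sample pairs to 0 so that the gather stays in range, then overwrites those positions with 1.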

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeModel.set_input_embeddings(embeddings, **kwargs)` ¶

This method sets the input embeddings for the CpmBeeModel.

PARAMETER	DESCRIPTION
`embeddings`	The embedding module to install as the model's `input_embedding`. The method assigns it as-is and performs no type checking. TYPE: `object`

RETURNS	DESCRIPTION
	None.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def set_input_embeddings(self, embeddings, **kwargs):
    """
    This method sets the input embeddings for the CpmBeeModel.

    Args:
        embeddings (object): The input embeddings to be set for the model.
            It can be of any type and should contain the necessary information for input embeddings.

    Returns:
        None.

    Raises:
        None.
    """
    self.input_embedding = embeddings

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeOutput` ¶

Bases: Module

CpmBeeOutput represents a neural network cell for processing hidden states, including dense transformation, dropout, and layer normalization.

This class inherits from nn.Module and provides methods for initializing the cell and forwarding the output based on the given input tensors.

ATTRIBUTE	DESCRIPTION
`dense`	A dense layer for transforming the input hidden states. TYPE: `Linear`
`LayerNorm`	A layer normalization module for normalizing the hidden states. TYPE: `LayerNorm`
`dropout`	A dropout module for applying dropout to the hidden states. TYPE: `Dropout`

METHOD	DESCRIPTION
`__init__`	Initializes the CpmBeeOutput cell with the given configuration.
`forward`	Constructs the output based on the input hidden states and input tensor.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

class CpmBeeOutput(nn.Module):

    """
    CpmBeeOutput represents a neural network cell for processing hidden states, including dense transformation, dropout, and layer normalization.

    This class inherits from nn.Module and provides methods for initializing the cell and forwarding the output based on the given input tensors.

    Attributes:
        dense (nn.Linear): A dense layer for transforming the input hidden states.
        LayerNorm (nn.LayerNorm): A layer normalization module for normalizing the hidden states.
        dropout (nn.Dropout): A dropout module for applying dropout to the hidden states.

    Methods:
        __init__: Initializes the CpmBeeOutput cell with the given configuration.
        forward: Constructs the output based on the input hidden states and input tensor.

    """
    def __init__(self, config):
        """
        Initializes a CpmBeeOutput instance.

        Args:
            self (CpmBeeOutput): The instance of the CpmBeeOutput class.
            config (object):
                The configuration object containing parameters for the model.

                - intermediate_size (int): The size of the intermediate layer.
                - hidden_size (int): The size of the hidden layer.
                - layer_norm_eps (float): The epsilon value for LayerNorm.
                - hidden_dropout_prob (float): The dropout probability for the hidden layer.

        Returns:
            None.

        Raises:
            TypeError: If the provided config object is not of the expected type.
            ValueError: If the config object is missing any required parameters.
            AttributeError: If there is an issue with accessing the attributes of the config object.
        """
        super().__init__()
        self.dense = nn.Linear(config.intermediate_size, config.hidden_size)
        self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(p=config.hidden_dropout_prob)

    def forward(self, hidden_states: mindspore.Tensor, input_tensor: mindspore.Tensor) -> mindspore.Tensor:
        """
        Constructs the CpmBeeOutput.

        This method takes three parameters: self, hidden_states, and input_tensor. It returns a mindspore.Tensor object.

        Args:
            self (CpmBeeOutput): An instance of the CpmBeeOutput class.
            hidden_states (mindspore.Tensor): The hidden states tensor.
                This tensor contains the hidden states from the previous layer.
            input_tensor (mindspore.Tensor): The input tensor.
                This tensor represents the input to the current layer.

        Returns:
            mindspore.Tensor: The forwarded tensor.
                This tensor is the result of applying the CpmBeeOutput layer operations.

        Raises:
            None.
        """
        hidden_states = self.dense(hidden_states)
        hidden_states = self.dropout(hidden_states)
        hidden_states = self.LayerNorm(hidden_states + input_tensor)
        return hidden_states

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeOutput.__init__(config)` ¶

Initializes a CpmBeeOutput instance.

PARAMETER	DESCRIPTION
`self`	The instance of the CpmBeeOutput class. TYPE: `CpmBeeOutput`
`config`	The configuration object containing parameters for the model. intermediate_size (int): The size of the intermediate layer. hidden_size (int): The size of the hidden layer. layer_norm_eps (float): The epsilon value for LayerNorm. hidden_dropout_prob (float): The dropout probability for the hidden layer. TYPE: `object`

RETURNS	DESCRIPTION
	None.

RAISES	DESCRIPTION
`TypeError`	If the provided config object is not of the expected type.
`ValueError`	If the config object is missing any required parameters.
`AttributeError`	If the config object lacks one of the attributes read here: `intermediate_size`, `hidden_size`, `layer_norm_eps`, or `hidden_dropout_prob`.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def __init__(self, config):
    """
    Initializes a CpmBeeOutput instance.

    Args:
        self (CpmBeeOutput): The instance of the CpmBeeOutput class.
        config (object):
            The configuration object containing parameters for the model.

            - intermediate_size (int): The size of the intermediate layer.
            - hidden_size (int): The size of the hidden layer.
            - layer_norm_eps (float): The epsilon value for LayerNorm.
            - hidden_dropout_prob (float): The dropout probability for the hidden layer.

    Returns:
        None.

    Raises:
        TypeError: If the provided config object is not of the expected type.
        ValueError: If the config object is missing any required parameters.
        AttributeError: If there is an issue with accessing the attributes of the config object.
    """
    super().__init__()
    self.dense = nn.Linear(config.intermediate_size, config.hidden_size)
    self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
    self.dropout = nn.Dropout(p=config.hidden_dropout_prob)

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeOutput.forward(hidden_states, input_tensor)` ¶

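Applies the CpmBeeOutput layer: a dense projection of `hidden_states`, dropout, a residual addition of `input_tensor`, and layer normalization.

This method takes `hidden_states` and `input_tensor` (in addition to `self`) and returns a mindspore.Tensor object.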

PARAMETER	DESCRIPTION
`self`	An instance of the CpmBeeOutput class. TYPE: `CpmBeeOutput`
`hidden_states`	The hidden states tensor. This tensor contains the hidden states from the previous layer. TYPE: `Tensor`
`input_tensor`	The input tensor. This tensor represents the input to the current layer. TYPE: `Tensor`

RETURNS	DESCRIPTION
`Tensor`	The output tensor, obtained by applying the dense projection, dropout, residual addition, and layer normalization of the CpmBeeOutput layer.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def forward(self, hidden_states: mindspore.Tensor, input_tensor: mindspore.Tensor) -> mindspore.Tensor:
    """
    Constructs the CpmBeeOutput.

    This method takes three parameters: self, hidden_states, and input_tensor. It returns a mindspore.Tensor object.

    Args:
        self (CpmBeeOutput): An instance of the CpmBeeOutput class.
        hidden_states (mindspore.Tensor): The hidden states tensor.
            This tensor contains the hidden states from the previous layer.
        input_tensor (mindspore.Tensor): The input tensor.
            This tensor represents the input to the current layer.

    Returns:
        mindspore.Tensor: The forwarded tensor.
            This tensor is the result of applying the CpmBeeOutput layer operations.

    Raises:
        None.
    """
    hidden_states = self.dense(hidden_states)
    hidden_states = self.dropout(hidden_states)
    hidden_states = self.LayerNorm(hidden_states + input_tensor)
    return hidden_states
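
The order of operations in `forward` (dense projection, dropout, residual addition, layer normalization) can be illustrated with a minimal pure-Python sketch. This is not the MindNLP implementation: the helper names `dense`, `layer_norm`, and `output_forward` and the toy weights are assumptions made for illustration, the layer norm has no learned scale or shift, and dropout is treated as the identity, as it would be at inference time.

```python
import math

def layer_norm(values, eps=1e-12):
    """Normalize a vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def dense(values, weight, bias):
    """Apply y = W x + b, with `weight` given as a list of output rows."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weight, bias)]

def output_forward(hidden_states, input_tensor, weight, bias, eps=1e-12):
    """Dense projection, residual addition with `input_tensor`, then layer norm."""
    projected = dense(hidden_states, weight, bias)
    # Dropout is the identity at inference time, so it is omitted in this sketch.
    summed = [p + r for p, r in zip(projected, input_tensor)]
    return layer_norm(summed, eps)

# Hypothetical toy sizes: intermediate_size=3, hidden_size=2.
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
bias = [0.0, 0.0]
out = output_forward([1.0, 2.0, 3.0], [0.5, 0.5], weight, bias)
```

Note that the residual input is added before normalization, so the layer norm sees the sum of the projected states and `input_tensor`.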

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeePreTrainedModel` ¶

Bases: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

class CpmBeePreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """
    config_class = CpmBeeConfig
    base_model_prefix = "cpmbee"

    def _init_weights(self, cell):
        """Initialize the weights"""
        std = self.config.init_std
        if isinstance(cell, nn.Linear):
            cell.weight.set_data(initializer(Normal(std), cell.weight.shape, cell.weight.dtype))
            if cell.bias is not None:
                cell.bias.set_data(initializer('zeros', cell.bias.shape, cell.bias.dtype))
        # still needed
        elif isinstance(cell, CpmBeeEmbeddingExt):
            cell.weight.set_data(initializer(Normal(std), cell.weight.shape, cell.weight.dtype))
        elif isinstance(cell, nn.LayerNorm):
            cell.weight.set_data(initializer('ones', cell.weight.shape, cell.weight.dtype))
            cell.bias.set_data(initializer('zeros', cell.bias.shape, cell.bias.dtype))
        elif isinstance(cell, CpmBeeLayerNorm):
            cell.weight.set_data(initializer('ones', cell.weight.shape, cell.weight.dtype))
        elif isinstance(cell, CpmBeeBucketPositionBias):
            cell.relative_attention_bias.set_data(initializer(
                Normal(std), cell.relative_attention_bias.shape, cell.relative_attention_bias.dtype))

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeRotaryEmbedding` ¶

Bases: Module

RotaryEmbedding embeds the unk token and special token. It embeds "...<mask>...<mask>...<unk>...<unk>..." as "...<mask_0>...<mask_1>...<unk_0>...<unk_1>..." to help the model distinguish between different special tokens and unk tokens.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

class CpmBeeRotaryEmbedding(nn.Module):
    """
    RotaryEmbedding embeds the unk token and special token. It will embeds the "...<mask>...<mask>...<unk>...<unk>..."
    to "...<mask_0>...<mask_1>...<unk_0>...<unk_1>..."" to help model to specify different special tokens and unk
    tokens.
    """
    def __init__(self, config: CpmBeeConfig):
        '''
        Initializes a new instance of the CpmBeeRotaryEmbedding class.

        Args:
            self: The instance of the CpmBeeRotaryEmbedding class.
            config (CpmBeeConfig):
                An instance of the CpmBeeConfig class containing configuration parameters.

                - Purpose: Represents the configuration for the rotary embedding.
                - Restrictions: Must be a valid instance of the CpmBeeConfig class.

        Returns:
            None.

        Raises:
            None
        '''
        super().__init__()
        inv_freq = 1.0 / (10000 ** (ops.arange(0, config.hidden_size, 2, dtype=mindspore.float32) / config.hidden_size))
        self.distance_scale = config.distance_scale
        self.dtype = config.ms_dtype
        self.inv_freq = inv_freq.to(config.ms_dtype)

    def forward(self, x: mindspore.Tensor, x_pos: mindspore.Tensor):
        """
        Constructs a rotary embedding for a given input tensor.

        Args:
            self (CpmBeeRotaryEmbedding): An instance of the CpmBeeRotaryEmbedding class.
            x (mindspore.Tensor): The input tensor for which the rotary embedding is forwarded.
            x_pos (mindspore.Tensor): The positional encoding tensor.

        Returns:
            mindspore.Tensor: The input `x` with the rotary position embedding applied.

        Raises:
            None
        """
        inv_freq = self.inv_freq.to(dtype=x.dtype)

        x_pos = x_pos * self.distance_scale
        freqs = x_pos[..., None] * inv_freq[None, :]  # (..., dim/2)

        emb = ops.cat((freqs, freqs), dim=-1)  # (..., dim)
        emb_cos = emb.cos()  # (..., dim)
        emb_sin = emb.sin()  # (..., dim)

        rotate_x = ops.cat([-x[..., x.shape[-1] // 2 :], x[..., : x.shape[-1] // 2]], dim=-1)  # (..., dim)

        return x * emb_cos + rotate_x * emb_sin

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeRotaryEmbedding.__init__(config)` ¶

Initializes a new instance of the CpmBeeRotaryEmbedding class.

PARAMETER	DESCRIPTION
`self`	The instance of the CpmBeeRotaryEmbedding class.
`config`	An instance of the CpmBeeConfig class containing configuration parameters. Purpose: Represents the configuration for the rotary embedding. Restrictions: Must be a valid instance of the CpmBeeConfig class. TYPE: `CpmBeeConfig`

RETURNS	DESCRIPTION
	None.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def __init__(self, config: CpmBeeConfig):
    '''
    Initializes a new instance of the CpmBeeRotaryEmbedding class.

    Args:
        self: The instance of the CpmBeeRotaryEmbedding class.
        config (CpmBeeConfig):
            An instance of the CpmBeeConfig class containing configuration parameters.

            - Purpose: Represents the configuration for the rotary embedding.
            - Restrictions: Must be a valid instance of the CpmBeeConfig class.

    Returns:
        None.

    Raises:
        None
    '''
    super().__init__()
    inv_freq = 1.0 / (10000 ** (ops.arange(0, config.hidden_size, 2, dtype=mindspore.float32) / config.hidden_size))
    self.distance_scale = config.distance_scale
    self.dtype = config.ms_dtype
    self.inv_freq = inv_freq.to(config.ms_dtype)

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeRotaryEmbedding.forward(x, x_pos)` ¶

Constructs a rotary embedding for a given input tensor.

PARAMETER	DESCRIPTION
`self`	An instance of the CpmBeeRotaryEmbedding class. TYPE: `CpmBeeRotaryEmbedding`
`x`	The input tensor for which the rotary embedding is forwarded. TYPE: `Tensor`
`x_pos`	The positional encoding tensor. TYPE: `Tensor`

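RETURNS	DESCRIPTION
`Tensor`	The input `x` with the rotary position embedding applied, computed as `x * cos + rotate_x * sin`.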

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def forward(self, x: mindspore.Tensor, x_pos: mindspore.Tensor):
    """
    Constructs a rotary embedding for a given input tensor.

    Args:
        self (CpmBeeRotaryEmbedding): An instance of the CpmBeeRotaryEmbedding class.
        x (mindspore.Tensor): The input tensor for which the rotary embedding is forwarded.
        x_pos (mindspore.Tensor): The positional encoding tensor.

    Returns:
        mindspore.Tensor: The input `x` with the rotary position embedding applied.

    Raises:
        None
    """
    inv_freq = self.inv_freq.to(dtype=x.dtype)

    x_pos = x_pos * self.distance_scale
    freqs = x_pos[..., None] * inv_freq[None, :]  # (..., dim/2)

    emb = ops.cat((freqs, freqs), dim=-1)  # (..., dim)
    emb_cos = emb.cos()  # (..., dim)
    emb_sin = emb.sin()  # (..., dim)

    rotate_x = ops.cat([-x[..., x.shape[-1] // 2 :], x[..., : x.shape[-1] // 2]], dim=-1)  # (..., dim)

    return x * emb_cos + rotate_x * emb_sin
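
The arithmetic in `forward` can be followed on a single vector with a pure-Python sketch. It mirrors the steps in the listing (scale the position, build the angles, duplicate them, rotate the two halves of `x`), but it is only an illustration: the function name `rotary_embed` is hypothetical, and the defaults `distance_scale=1.0` and `base=10000.0` are assumptions for the example rather than values read from `CpmBeeConfig`.

```python
import math

def rotary_embed(x, pos, distance_scale=1.0, base=10000.0):
    """Apply a rotary position embedding to one vector `x` at position `pos`."""
    dim = len(x)
    half = dim // 2
    # One inverse frequency per pair of dimensions: 1 / base^(i / dim), i = 0, 2, 4, ...
    inv_freq = [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]
    angles = [pos * distance_scale * f for f in inv_freq]   # length dim / 2
    emb = angles + angles                                   # length dim
    # "Rotate half": (-second half, first half)
    rotate_x = [-v for v in x[half:]] + list(x[:half])
    return [xi * math.cos(a) + ri * math.sin(a)
            for xi, ri, a in zip(x, rotate_x, emb)]

x = [1.0, 2.0, 3.0, 4.0]
at_origin = rotary_embed(x, pos=0)   # zero angle: the vector is unchanged
shifted = rotary_embed(x, pos=3)     # each (x[i], x[i + dim/2]) pair is rotated
```

Because each pair `(x[i], x[i + dim/2])` is rotated by the same angle, the operation leaves the Euclidean norm of `x` unchanged.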

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeSelfAttentionBlock` ¶

Bases: Module

Represents a self-attention block in the CpmBee model for transformer-based neural network architectures. This class inherits from nn.Module.

PARAMETER	DESCRIPTION
`config`	The configuration for the self-attention block. TYPE: `CpmBeeConfig`

RAISES	DESCRIPTION
`ValueError`	If the configuration is invalid.

ATTRIBUTE	DESCRIPTION
`layernorm_before_attention`	The layer normalization module before the self-attention block. TYPE: `CpmBeeLayerNorm`
`self_attention`	The self-attention module. TYPE: `CpmBeeAttention`
`dropout`	The dropout layer, if configured. TYPE: `Dropout or None`

RAISES	DESCRIPTION
`ValueError`	If the input tensors are of invalid shape or type.

RETURNS	DESCRIPTION
`Tuple[Tensor, Tensor, Tensor]`	Returned by `forward`: the updated hidden states, the attention weights, and the current key-value states.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

class CpmBeeSelfAttentionBlock(nn.Module):

    '''
    Represents a self-attention block in the CpmBee model for transformer-based neural network architectures.
    This class inherits from `nn.Module`.

    Args:
        config (CpmBeeConfig): The configuration for the self-attention block.

    Raises:
        ValueError: If the configuration is invalid.

    Attributes:
        layernorm_before_attention (CpmBeeLayerNorm): The layer normalization module before the self-attention block.
        self_attention (CpmBeeAttention): The self-attention module.
        dropout (nn.Dropout or None): The dropout layer, if configured.

    Raises:
        ValueError: If the input tensors are of invalid shape or type.

    Returns:
        Tuple[mindspore.Tensor, mindspore.Tensor, mindspore.Tensor]:
            The updated hidden states, attention weights, and current key-value states.
    '''
    def __init__(self, config: CpmBeeConfig):
        """
        Initializes a CpmBeeSelfAttentionBlock instance.

        Args:
            self: The CpmBeeSelfAttentionBlock instance itself.
            config (CpmBeeConfig): An instance of CpmBeeConfig containing configuration parameters for the
                self-attention block.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.layernorm_before_attention = CpmBeeLayerNorm(config)
        self.self_attention = CpmBeeAttention(config)
        if config.dropout_p:
            self.dropout = nn.Dropout(p=config.dropout_p)
        else:
            self.dropout = None

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: mindspore.Tensor,
        position_bias: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = False,
        past_key_values: Optional[Tuple[mindspore.Tensor, mindspore.Tensor]] = None,
        use_cache: Optional[bool] = None,
    ):
        """
        Args:
            hidden_states (`mindspore.Tensor` of shape `(batch, len_seq, dim_model)`):
                Input of transformer block(self-attention block). It can be the raw embedding of a batch of sequences.
            attention_mask (`mindspore.Tensor` of shape `(batch, len_seq, len_seq)`):
                Avoid invalid areas to participate in the calculation of self-attention.
            position_bias (`mindspore.Tensor` of shape `(batch, len_seq, len_seq)`):
                Provide positional information to self-attention block.
            output_attentions (`bool`, *optional*):
                Whether or not to return the attentions tensors of all attention layers.
            past_key_values (`Tuple(mindspore.Tensor)`, *optional*):
                Cached past key and value projection states.
            use_cache (`bool`, *optional*):
                If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
                (see `past_key_values`).
        """
        outputs = self.layernorm_before_attention(hidden_states)
        outputs = self.self_attention(
            outputs, outputs, attention_mask, position_bias, output_attentions, past_key_values, use_cache
        )

        outputs, attn_weights, current_key_value = outputs

        if self.dropout is not None:
            outputs = self.dropout(outputs)
        hidden_states = (hidden_states + outputs) / 1.05

        return hidden_states, attn_weights, current_key_value

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeSelfAttentionBlock.__init__(config)` ¶

Initializes a CpmBeeSelfAttentionBlock instance.

PARAMETER	DESCRIPTION
`self`	The CpmBeeSelfAttentionBlock instance itself.
`config`	An instance of CpmBeeConfig containing configuration parameters for the self-attention block. TYPE: `CpmBeeConfig`

RETURNS	DESCRIPTION
	None.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def __init__(self, config: CpmBeeConfig):
    """
    Initializes a CpmBeeSelfAttentionBlock instance.

    Args:
        self: The CpmBeeSelfAttentionBlock instance itself.
        config (CpmBeeConfig): An instance of CpmBeeConfig containing configuration parameters for the
            self-attention block.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.layernorm_before_attention = CpmBeeLayerNorm(config)
    self.self_attention = CpmBeeAttention(config)
    if config.dropout_p:
        self.dropout = nn.Dropout(p=config.dropout_p)
    else:
        self.dropout = None

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeSelfAttentionBlock.forward(hidden_states, attention_mask, position_bias=None, output_attentions=False, past_key_values=None, use_cache=None)` ¶

PARAMETER	DESCRIPTION
`hidden_states`	Input to the self-attention block. It can be the raw embedding of a batch of sequences. TYPE: `mindspore.Tensor` of shape `(batch, len_seq, dim_model)`
`attention_mask`	Mask that keeps invalid positions out of the self-attention computation. TYPE: `mindspore.Tensor` of shape `(batch, len_seq, len_seq)`
`position_bias`	Provides positional information to the self-attention block. TYPE: `mindspore.Tensor` of shape `(batch, len_seq, len_seq)` DEFAULT: `None`
`output_attentions`	Whether or not to return the attention tensors of all attention layers. TYPE: `bool`, optional DEFAULT: `False`
`past_key_values`	Cached past key and value projection states. TYPE: `Tuple(mindspore.Tensor)`, optional DEFAULT: `None`
`use_cache`	If set to `True`, the key-value states are returned and can be used to speed up decoding (see `past_key_values`). TYPE: `bool`, optional DEFAULT: `None`

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def forward(
    self,
    hidden_states: mindspore.Tensor,
    attention_mask: mindspore.Tensor,
    position_bias: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = False,
    past_key_values: Optional[Tuple[mindspore.Tensor, mindspore.Tensor]] = None,
    use_cache: Optional[bool] = None,
):
    """
    Args:
        hidden_states (`mindspore.Tensor` of shape `(batch, len_seq, dim_model)`):
            Input of transformer block(self-attention block). It can be the raw embedding of a batch of sequences.
        attention_mask (`mindspore.Tensor` of shape `(batch, len_seq, len_seq)`):
            Avoid invalid areas to participate in the calculation of self-attention.
        position_bias (`mindspore.Tensor` of shape `(batch, len_seq, len_seq)`):
            Provide positional information to self-attention block.
        output_attentions (`bool`, *optional*):
            Whether or not to return the attentions tensors of all attention layers.
        past_key_values (`Tuple(mindspore.Tensor)`, *optional*):
            Cached past key and value projection states.
        use_cache (`bool`, *optional*):
            If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
            (see `past_key_values`).
    """
    outputs = self.layernorm_before_attention(hidden_states)
    outputs = self.self_attention(
        outputs, outputs, attention_mask, position_bias, output_attentions, past_key_values, use_cache
    )

    outputs, attn_weights, current_key_value = outputs

    if self.dropout is not None:
        outputs = self.dropout(outputs)
    hidden_states = (hidden_states + outputs) / 1.05

    return hidden_states, attn_weights, current_key_value
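
The residual pattern in this `forward` (normalize first, run attention, add the result back to the unnormalized input, then divide by 1.05) can be sketched in pure Python. The function `self_attention_block` and the stand-in callables below are hypothetical; they replace `CpmBeeLayerNorm` and `CpmBeeAttention` with trivial functions so that only the data flow is shown, and dropout is omitted.

```python
def self_attention_block(hidden_states, norm, attention, scale=1.05):
    """Pre-norm residual block: (x + attention(norm(x))) / scale."""
    normed = norm(hidden_states)      # stands in for layernorm_before_attention
    attended = attention(normed)      # stands in for self_attention (output only)
    # The residual uses the original, unnormalized hidden states.
    return [(h + a) / scale for h, a in zip(hidden_states, attended)]

# Trivial stand-ins, assumed purely for illustration.
identity_norm = lambda v: list(v)
double = lambda v: [2.0 * x for x in v]

result = self_attention_block([1.05, 2.1], identity_norm, double)
```

With these stand-ins the block computes `(x + 2x) / 1.05` per element, which makes the effect of the constant divisor easy to check by hand.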

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeTransformerBlock` ¶

Bases: Module

This class represents a transformer block of the CPM-BEE model, which is a neural network architecture used for natural language processing tasks. The CpmBeeTransformerBlock class inherits from nn.Module and contains two sub-blocks: a self-attention block and a feed-forward neural network (FFN) block.

ATTRIBUTE	DESCRIPTION
`config`	The configuration object for the CPM-BEE model. TYPE: `CpmBeeConfig`
`mask_att`	If `True`, the self-attention sub-block is omitted: it is not created in `__init__` and is skipped in `forward`. TYPE: `bool`
`mask_ffn`	If `True`, the feed-forward network (FFN) sub-block is omitted: it is not created in `__init__` and is skipped in `forward`. TYPE: `bool`

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

class CpmBeeTransformerBlock(nn.Module):

    """
    This class represents a transformer block of the CPM-BEE model, which is a neural network architecture used for
    natural language processing tasks. The CpmBeeTransformerBlock class inherits from nn.Module and contains
    two sub-blocks: a self-attention block and a feed-forward neural network (FFN) block.

    Attributes:
        config (CpmBeeConfig): The configuration object for the CPM-BEE model.
        mask_att (bool): A boolean flag indicating whether to apply masking to the self-attention block.
        mask_ffn (bool): A boolean flag indicating whether to apply masking to the feed-forward neural network block.
    """
    def __init__(self, config: CpmBeeConfig, mask_att: bool = False, mask_ffn: bool = False):
        """
        __init__

        Initializes a CpmBeeTransformerBlock instance.

        Args:
            self: The instance of the CpmBeeTransformerBlock class.
            config (CpmBeeConfig): An instance of the CpmBeeConfig class containing configuration parameters.
            mask_att (bool, optional): A boolean indicating whether to mask attention. Defaults to False.
            mask_ffn (bool, optional): A boolean indicating whether to mask feed-forward network. Defaults to False.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.mask_att = mask_att
        self.mask_ffn = mask_ffn

        if not self.mask_att:
            self.self_att = CpmBeeSelfAttentionBlock(config)
        if not self.mask_ffn:
            self.ffn = CpmBeeFFNBlock(config)

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: mindspore.Tensor,
        position_bias: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = False,
        past_key_values: Optional[Tuple[mindspore.Tensor, mindspore.Tensor]] = None,
        use_cache: Optional[bool] = None,
    ):
        """
        Args:
            hidden_states (`mindspore.Tensor`):
                Input to the layer of shape `(batch, seq_len, dim_model)`
            attention_mask (`mindspore.Tensor`):
                Avoid invalid areas to participate in the calculation of shape `(batch, seq_len, seq_len)`
            position_bias (`mindspore.Tensor`):
                Provides position information to attention mechanism of shape `(num_heads, seq_len, seq_len)`
            output_attentions (`bool`, *optional*):
                Whether or not to return the attentions tensors of all attention layers.
            past_key_values (`Tuple[mindspore.Tensor, mindspore.Tensor])`, *optional*):
                Cached past key and value projection states
            use_cache (`bool`, *optional*):
                If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
                (see `past_key_values`).
        """
        if not self.mask_att:
            hidden_states = self.self_att(
                hidden_states,
                attention_mask=attention_mask,
                position_bias=position_bias,
                output_attentions=output_attentions,
                past_key_values=past_key_values,
                use_cache=use_cache,
            )

            hidden_states, attn_weights, current_key_value = hidden_states
        else:
            attn_weights, current_key_value = None, (None, None)

        if not self.mask_ffn:
            hidden_states = self.ffn(hidden_states)

        return hidden_states, attn_weights, current_key_value

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeTransformerBlock.__init__(config, mask_att=False, mask_ffn=False)` ¶

Initializes a CpmBeeTransformerBlock instance.

PARAMETER	DESCRIPTION
`self`	The instance of the CpmBeeTransformerBlock class.
`config`	An instance of the CpmBeeConfig class containing configuration parameters. TYPE: `CpmBeeConfig`
`mask_att`	If `True`, the self-attention sub-block is not created. TYPE: `bool` DEFAULT: `False`
`mask_ffn`	If `True`, the feed-forward network sub-block is not created. TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
	None.

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def __init__(self, config: CpmBeeConfig, mask_att: bool = False, mask_ffn: bool = False):
    """
    __init__

    Initializes a CpmBeeTransformerBlock instance.

    Args:
        self: The instance of the CpmBeeTransformerBlock class.
        config (CpmBeeConfig): An instance of the CpmBeeConfig class containing configuration parameters.
        mask_att (bool, optional): A boolean indicating whether to mask attention. Defaults to False.
        mask_ffn (bool, optional): A boolean indicating whether to mask feed-forward network. Defaults to False.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.mask_att = mask_att
    self.mask_ffn = mask_ffn

    if not self.mask_att:
        self.self_att = CpmBeeSelfAttentionBlock(config)
    if not self.mask_ffn:
        self.ffn = CpmBeeFFNBlock(config)

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeTransformerBlock.forward(hidden_states, attention_mask, position_bias=None, output_attentions=False, past_key_values=None, use_cache=None)` ¶

PARAMETER	DESCRIPTION
`hidden_states`	Input to the layer, of shape `(batch, seq_len, dim_model)`. TYPE: `mindspore.Tensor`
`attention_mask`	Mask that keeps invalid positions out of the attention computation, of shape `(batch, seq_len, seq_len)`. TYPE: `mindspore.Tensor`
`position_bias`	Provides position information to the attention mechanism, of shape `(num_heads, seq_len, seq_len)`. TYPE: `mindspore.Tensor` DEFAULT: `None`
`output_attentions`	Whether or not to return the attention tensors of all attention layers. TYPE: `bool`, optional DEFAULT: `False`
`past_key_values`	Cached past key and value projection states. TYPE: `Tuple[mindspore.Tensor, mindspore.Tensor]`, optional DEFAULT: `None`
`use_cache`	If set to `True`, the key-value states are returned and can be used to speed up decoding (see `past_key_values`). TYPE: `bool`, optional DEFAULT: `None`

Source code in mindnlp\transformers\models\cpmbee\modeling_cpmbee.py

def forward(
    self,
    hidden_states: mindspore.Tensor,
    attention_mask: mindspore.Tensor,
    position_bias: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = False,
    past_key_values: Optional[Tuple[mindspore.Tensor, mindspore.Tensor]] = None,
    use_cache: Optional[bool] = None,
):
    """
    Args:
        hidden_states (`mindspore.Tensor`):
            Input to the layer of shape `(batch, seq_len, dim_model)`
        attention_mask (`mindspore.Tensor`):
            Avoid invalid areas to participate in the calculation of shape `(batch, seq_len, seq_len)`
        position_bias (`mindspore.Tensor`):
            Provides position information to attention mechanism of shape `(num_heads, seq_len, seq_len)`
        output_attentions (`bool`, *optional*):
            Whether or not to return the attentions tensors of all attention layers.
        past_key_values (`Tuple[mindspore.Tensor, mindspore.Tensor])`, *optional*):
            Cached past key and value projection states
        use_cache (`bool`, *optional*):
            If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
            (see `past_key_values`).
    """
    if not self.mask_att:
        hidden_states = self.self_att(
            hidden_states,
            attention_mask=attention_mask,
            position_bias=position_bias,
            output_attentions=output_attentions,
            past_key_values=past_key_values,
            use_cache=use_cache,
        )

        hidden_states, attn_weights, current_key_value = hidden_states
    else:
        attn_weights, current_key_value = None, (None, None)

    if not self.mask_ffn:
        hidden_states = self.ffn(hidden_states)

    return hidden_states, attn_weights, current_key_value
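
The effect of `mask_att` and `mask_ffn` on the control flow can be shown with a small stand-alone sketch. `ToyTransformerBlock` and the two toy sub-blocks are hypothetical names introduced for this example; they take only `hidden_states` and ignore the mask, position bias, and cache arguments of the real block.

```python
class ToyTransformerBlock:
    """Mirrors the branching of the transformer block with plain callables."""

    def __init__(self, self_att, ffn, mask_att=False, mask_ffn=False):
        self.mask_att = mask_att
        self.mask_ffn = mask_ffn
        # A masked sub-block is never created, as in the listing above.
        if not mask_att:
            self.self_att = self_att
        if not mask_ffn:
            self.ffn = ffn

    def forward(self, hidden_states):
        if not self.mask_att:
            hidden_states, attn_weights, current_key_value = self.self_att(hidden_states)
        else:
            # No attention was run, so there are no weights or cache to return.
            attn_weights, current_key_value = None, (None, None)
        if not self.mask_ffn:
            hidden_states = self.ffn(hidden_states)
        return hidden_states, attn_weights, current_key_value

# Toy sub-blocks, assumed for illustration only.
toy_att = lambda h: ([v + 1 for v in h], "weights", ("key", "value"))
toy_ffn = lambda h: [v * 10 for v in h]

full = ToyTransformerBlock(toy_att, toy_ffn).forward([1, 2])
ffn_only = ToyTransformerBlock(toy_att, toy_ffn, mask_att=True).forward([1, 2])
```

With `mask_att=True` the hidden states pass straight to the FFN, and the attention outputs are the placeholders `None` and `(None, None)`.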


`mindnlp.transformers.models.cpmbee.configuration_cpmbee` ¶

`mindnlp.transformers.models.cpmbee.configuration_cpmbee.CpmBeeConfig` ¶

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee` ¶

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer` ¶

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.bod_token_id` `property` ¶

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.eod_token_id` `property` ¶

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.newline_id` `property` ¶

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.vocab_size: int` `property` ¶

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.__call__(text, *args, **kwargs)` ¶

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.__init__(vocab_file, bos_token='<s>', eos_token='</s>', line_token='\n', space_token=' ', unk_token='<unk>', mask_token='<mask>', pad_token='<pad>', padding_side='left', **kwargs)` ¶

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.__len__()` ¶

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.check(token)` ¶

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.convert_data_to_id(data, prev_ext_states=None, shuffle_answer=True, max_depth=8)` ¶

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.convert_tokens_to_string(tokens)` ¶

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.get_piece(text)` ¶

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.get_vocab()` ¶

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.prepare_for_finetune(data_list, max_length=2048)` ¶

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.save_vocabulary(save_directory, filename_prefix=None)` ¶

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.CpmBeeTokenizer.tokenize(text, **kwargs)` ¶

`mindnlp.transformers.models.cpmbee.tokenization_cpmbee.rel_to_bucket(n_up, n_down, max_depth=8)` ¶
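
The signature suggests that a pair of tree distances (`n_up` steps up, `n_down` steps down) is folded into a single bucket index bounded by `max_depth`. The sketch below shows one way such a mapping can work. The formula is an assumption modelled on the upstream CPM-Bee reference code and is not confirmed by this page; the name `rel_to_bucket_sketch` is hypothetical.

```python
def rel_to_bucket_sketch(n_up: int, n_down: int, max_depth: int = 8) -> int:
    """Fold an (n_up, n_down) relative tree position into one bucket index.

    Assumption (not confirmed by this page): bucket 0 means "same node" and
    bucket 1 is kept free for a special relation, so every other pair is
    shifted up by one.
    """
    bucket = n_up * max_depth + n_down
    return bucket if bucket == 0 else bucket + 1


if __name__ == "__main__":
    # Print the bucket for a few (n_up, n_down) pairs.
    for pair in [(0, 0), (0, 1), (1, 0), (2, 3)]:
        print(pair, rel_to_bucket_sketch(*pair))
```

Under this assumed formula, distinct pairs with `n_down < max_depth` map to distinct buckets, which is why `max_depth` acts as the row width of the encoding.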

`mindnlp.transformers.models.cpmbee.modeling_cpmbee` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeAttention` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeAttention.__init__(config)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeAttention.forward(hidden_q, hidden_kv, attention_mask, position_bias, output_attentions=False, past_key_values=None, use_cache=None)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeBeamHypotheses` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeBeamHypotheses.add(hyp, sum_logprobs, beam_indices=None)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeBeamSearchScorer` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeBeamSearchScorer.__init__(batch_size, num_beams, length_penalty=1.0, do_early_stopping=False, num_beam_hyps_to_keep=1, num_beam_groups=1, max_length=None, **model_kwargs)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeBeamSearchScorer.apply_repetition_penalty(logits, batch_size, num_beams, prev_output_tokens, repetition_penalty, start_idx=None, end_idx=None, window_size=None)` `staticmethod` ¶
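
For orientation, here is a minimal sketch of the commonly used repetition-penalty rule that a method with this name typically applies: logits of already generated tokens are pushed down by dividing positive scores and multiplying negative scores by the penalty. This is a generic illustration for a single sequence, not the library's implementation; the function name is hypothetical, and the batching, beam, `start_idx`, `end_idx`, and `window_size` handling in the real signature is deliberately left out.

```python
from typing import List, Sequence


def apply_repetition_penalty_sketch(
    logits: List[float],
    prev_tokens: Sequence[int],
    penalty: float,
) -> List[float]:
    """Return a copy of `logits` with previously seen token ids penalized.

    Generic rule (an assumption about what the library method does):
    positive scores are divided by `penalty`, negative scores are
    multiplied by it, so a penalty > 1 always lowers the score.
    """
    out = list(logits)  # leave the caller's list untouched
    for tok in set(prev_tokens):
        if 0 <= tok < len(out):  # ignore ids outside the vocabulary
            score = out[tok]
            out[tok] = score * penalty if score < 0 else score / penalty
    return out


if __name__ == "__main__":
    print(apply_repetition_penalty_sketch([2.0, -1.0, 0.5], [0, 1], 2.0))
```

A penalty of `1.0` leaves the logits unchanged, which makes it a natural "off" value.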

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeBeamSearchScorer.finalize()` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeBucketPositionBias` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeBucketPositionBias.__init__(config)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeBucketPositionBias.forward(query_pos, key_pos, rel_buckets)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeDenseGatedACT` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeDenseGatedACT.__init__(config)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeDenseGatedACT.forward(hidden_states)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeEmbeddingExt` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeEmbeddingExt.__init__(config)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeEmbeddingExt.forward(ids, ids_sub)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeEmbeddingExt.projection(x, ext_table=None)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeEncoder` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeEncoder.__init__(config)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeEncoder.forward(hidden_states, attention_mask, position_bias, output_attentions=None, output_hidden_states=None, past_key_values=None, use_cache=None)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeFFNBlock` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeFFNBlock.__init__(config)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeFFNBlock.forward(hidden_states)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeFeedForward` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeFeedForward.__init__(config)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeFeedForward.forward(hidden_states)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeForCausalLM` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeForCausalLM.__init__(config)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeForCausalLM.adjust_logits_during_generation(logits, batch_size, beam_size, vocab_size, ext_table_ids, **model_kwargs)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeForCausalLM.generate(data_list, tokenizer, **kwargs)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeForCausalLM.get_input_embeddings()` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeForCausalLM.get_output_embeddings()` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeForCausalLM.prepare_inputs_for_generation(input_ids, batch_size, beam_scorer=None, input_id_subs=None, input_pos=None, segment_ids=None, batch_ext_table_ids=None, batch_ext_table_sub=None, other_info=None, **model_kwargs)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeForCausalLM.set_input_embeddings(embeddings)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeForCausalLM.set_output_embeddings(new_embeddings)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeLayerNorm` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeLayerNorm.__init__(config)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeLayerNorm.forward(hidden_states)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeLinear` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeLinear.__init__(dim_in, dim_out, dtype)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeLinear.forward(x)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeModel` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeModel.__init__(config)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeModel.get_input_embeddings()` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeModel.set_input_embeddings(embeddings, **kwargs)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeOutput` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeOutput.__init__(config)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeOutput.forward(hidden_states, input_tensor)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeePreTrainedModel` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeRotaryEmbedding` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeRotaryEmbedding.__init__(config)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeRotaryEmbedding.forward(x, x_pos)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeSelfAttentionBlock` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeSelfAttentionBlock.__init__(config)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeSelfAttentionBlock.forward(hidden_states, attention_mask, position_bias=None, output_attentions=False, past_key_values=None, use_cache=None)` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeTransformerBlock` ¶

`mindnlp.transformers.models.cpmbee.modeling_cpmbee.CpmBeeTransformerBlock.__init__(config, mask_att=False, mask_ffn=False)` ¶