modeling_utils

`mindnlp.transformers.modeling_utils.PreTrainedModel` ¶

Bases: Module, ModuleUtilsMixin, GenerationMixin, PeftAdapterMixin

Base class for all models.

[PreTrainedModel] takes care of storing the configuration of the models and handles methods for loading, downloading and saving models as well as a few methods common to all models to:

- resize the input embeddings,
- prune heads in the self-attention heads.

Class attributes (overridden by derived classes):

- **config_class** ([`PretrainedConfig`]) -- A subclass of [`PretrainedConfig`] to use as configuration class
  for this model architecture.
- **load_tf_weights** (`Callable`) -- A python *method* for loading a TensorFlow checkpoint in a PyTorch model,
  taking as arguments:

    - **model** ([`PreTrainedModel`]) -- An instance of the model on which to load the TensorFlow checkpoint.
    - **config** ([`PreTrainedConfig`]) -- An instance of the configuration associated to the model.
    - **path** (`str`) -- A path to the TensorFlow checkpoint.

- **base_model_prefix** (`str`) -- A string indicating the attribute associated to the base model in derived
  classes of the same architecture adding modules on top of the base model.
- **is_parallelizable** (`bool`) -- A flag indicating whether this model supports model parallelization.
- **main_input_name** (`str`) -- The name of the principal input to the model (often `input_ids` for NLP
  models, `pixel_values` for vision models and `input_values` for speech models).

Source code in mindnlp\transformers\modeling_utils.py

class PreTrainedModel(nn.Module, ModuleUtilsMixin, GenerationMixin, PeftAdapterMixin):
    r"""
    Base class for all models.

    [`PreTrainedModel`] takes care of storing the configuration of the models and handles methods for loading,
    downloading and saving models as well as a few methods common to all models to:

        - resize the input embeddings,
        - prune heads in the self-attention heads.

    Class attributes (overridden by derived classes):

        - **config_class** ([`PretrainedConfig`]) -- A subclass of [`PretrainedConfig`] to use as configuration class
          for this model architecture.
        - **load_tf_weights** (`Callable`) -- A python *method* for loading a TensorFlow checkpoint in a PyTorch model,
          taking as arguments:

            - **model** ([`PreTrainedModel`]) -- An instance of the model on which to load the TensorFlow checkpoint.
            - **config** ([`PreTrainedConfig`]) -- An instance of the configuration associated to the model.
            - **path** (`str`) -- A path to the TensorFlow checkpoint.

        - **base_model_prefix** (`str`) -- A string indicating the attribute associated to the base model in derived
          classes of the same architecture adding modules on top of the base model.
        - **is_parallelizable** (`bool`) -- A flag indicating whether this model supports model parallelization.
        - **main_input_name** (`str`) -- The name of the principal input to the model (often `input_ids` for NLP
          models, `pixel_values` for vision models and `input_values` for speech models).
    """

    config_class = None
    base_model_prefix = ""
    main_input_name = "input_ids"
    model_tags = None

    _auto_class = None
    _no_split_modules = None
    _skip_keys_device_placement = None
    _keep_in_fp32_modules = None

    # a list of `re` patterns of `state_dict` keys that should be removed from the list of missing
    # keys we find (keys inside the model but not in the checkpoint) and avoid unnecessary warnings.
    _keys_to_ignore_on_load_missing = None
    # a list of `re` patterns of `state_dict` keys that should be removed from the list of
    # unexpected keys we find (keys inside the checkpoint but not the model) and avoid unnecessary
    # warnings.
    _keys_to_ignore_on_load_unexpected = None
    # a list of `state_dict` keys to ignore when saving the model (useful for keys that aren't
    # trained, but which are either deterministic or tied variables)
    _keys_to_ignore_on_save = None
    # a list of `state_dict` keys that are potentially tied to another key in the state_dict.
    _tied_weights_keys = None

    is_parallelizable = False
    supports_gradient_checkpointing = False
    _is_stateful = False

    # Flash Attention 2 support
    _supports_flash_attn_2 = False

    # SDPA support
    _supports_sdpa = False

    # Has support for a `Cache` instance as `past_key_values`? Does it support a `StaticCache`?
    _supports_cache_class = False
    _supports_static_cache = False

    # Has support for a `QuantoQuantizedCache` instance as `past_key_values`
    _supports_quantized_cache = False

    @property
    def dummy_inputs(self) -> Dict[str, mindspore.Tensor]:
        """
        `Dict[str, mindspore.Tensor]`: Dummy inputs to do a forward pass in the network.
        """
        return {"input_ids": mindspore.tensor(DUMMY_INPUTS)}

    @property
    def framework(self) -> str:
        """
        :str: Identifies that this is a PyTorch model.
        """
        return "ms"

    def __init__(self, config: PretrainedConfig, *inputs, **kwargs):
        super().__init__()
        if not isinstance(config, PretrainedConfig):
            raise ValueError(
                f"Parameter config in `{self.__class__.__name__}(config)` should be an instance of class "
                "`PretrainedConfig`. To create a model from a pretrained model use "
                f"`model = {self.__class__.__name__}.from_pretrained(PRETRAINED_MODEL_NAME)`"
            )
        # Save config and origin of the pretrained weights if given in model
        config = self._autoset_attn_implementation(
            config, ms_dtype=get_default_dtype(), check_device_map=False
        )
        self.config = config
        self.name_or_path = config.name_or_path
        self.warnings_issued = {}
        self.generation_config = GenerationConfig.from_model_config(config) if self.can_generate() else None
        # Overwrite the class attribute to make it an instance attribute, so models like
        # `InstructBlipForConditionalGeneration` can dynamically update it without modifying the class attribute
        # when a different component (e.g. language_model) is used.
        self._keep_in_fp32_modules = copy.copy(self.__class__._keep_in_fp32_modules)

    def post_init(self):
        """
        A method executed at the end of each Transformer model initialization, to execute code that needs the model's
        modules properly initialized (such as weight initialization).
        """
        self.init_weights()
        self._backward_compatibility_gradient_checkpointing()

    def dequantize(self):
        """
        Potentially dequantize the model in case it has been quantized by a quantization method that support
        dequantization.
        """
        hf_quantizer = getattr(self, "hf_quantizer", None)

        if hf_quantizer is None:
            raise ValueError("You need to first quantize your model in order to dequantize it")

        return hf_quantizer.dequantize(self)

    def _backward_compatibility_gradient_checkpointing(self):
        if self.supports_gradient_checkpointing and getattr(self.config, "gradient_checkpointing", False):
            self.gradient_checkpointing_enable()
            # Remove the attribute now that is has been consumed, so it's no saved in the config.
            delattr(self.config, "gradient_checkpointing")

    def add_model_tags(self, tags: Union[List[str], str]) -> None:
        r"""
        Add custom tags into the model that gets pushed to the Hugging Face Hub. Will
        not overwrite existing tags in the model.

        Args:
            tags (`Union[List[str], str]`):
                The desired tags to inject in the model

        Examples:

        ```python
        from transformers import AutoModel

        model = AutoModel.from_pretrained("google-bert/bert-base-cased")

        model.add_model_tags(["custom", "custom-bert"])

        # Push the model to your namespace with the name "my-custom-bert".
        model.push_to_hub("my-custom-bert")
        ```
        """
        if isinstance(tags, str):
            tags = [tags]

        if self.model_tags is None:
            self.model_tags = []

        for tag in tags:
            if tag not in self.model_tags:
                self.model_tags.append(tag)

    @classmethod
    def _from_config(cls, config, **kwargs):
        """
        All context managers that the model should be initialized under go here.

        Args:
            ms_dtype (`mindspore.dtype.TensorType`, *optional*):
                Override the default `mindspore.dtype.TensorType` and load the model under this dtype.
        """
        ms_dtype = kwargs.pop("ms_dtype", None)
        use_flash_attention_2 = kwargs.pop("use_flash_attention_2", False)

        # override default dtype if needed
        dtype_orig = None
        if ms_dtype is not None:
            dtype_orig = cls._set_default_ms_dtype(ms_dtype)

        config = copy.deepcopy(config)  # We do not want to modify the config inplace in _from_config.
        config._attn_implementation = kwargs.pop("attn_implementation", None)
        config = cls._autoset_attn_implementation(
            config,
            use_flash_attention_2=use_flash_attention_2,
            check_device_map=False,
            ms_dtype=ms_dtype,
        )

        model = cls(config, **kwargs)

        # restore default dtype if it was modified
        if dtype_orig is not None:
            set_default_dtype(dtype_orig)

        return model

    @classmethod
    def _autoset_attn_implementation(
        cls,
        config,
        use_flash_attention_2: bool = False,
        ms_dtype: Optional[mindspore.dtype.TensorType] = None,
        device_map: Optional[Union[str, Dict[str, int]]] = None,
        check_device_map: bool = True,
    ):
        """
        Automatically checks and dispatches to a default attention implementation. In order of priority:
            1. An implementation specified in `config._attn_implementation` (due for example to the argument attn_implementation="sdpa" in from_pretrained).
            2. DEPRECATED: if use_flash_attention_2 is set to `True` and `flash_attn` is available, flash attention. (`LlamaFlashAttention` for example)
            3. SDPA implementation, if available and supported by the model type. (`LlamaSdpaAttention` for example)
            4. The default model's implementation otherwise (`LlamaAttention` for example) .
        """
        # Here we use config._attn_implementation_internal to check whether the attention implementation was explicitely set by the user.
        # The property `PretrainedConfig._attn_implementation` is never `None`, for backward compatibility (always fall back on "eager").
        # The `hasattr` here is used as some Transformers tests for some reason do not call PretrainedConfig __init__ (e.g. test_no_super_init_config_and_model)
        requested_attn_implementation = None
        if hasattr(config, "_attn_implementation_internal") and config._attn_implementation_internal is not None:
            if config._attn_implementation != "flash_attention_2" and use_flash_attention_2:
                raise ValueError(
                    f'Both attn_implementation="{config._attn_implementation}" and `use_flash_attention_2=True` were used when loading the model, which are not compatible.'
                    ' We recommend to just use `attn_implementation="flash_attention_2"` when loading the model.'
                )

            if config._attn_implementation not in ["eager", "sdpa", "flash_attention_2"]:
                message = f'Specified `attn_implementation="{config._attn_implementation}"` is not supported. The only possible arguments are `attn_implementation="eager"` (manual attention implementation)'
                if cls._supports_flash_attn_2:
                    message += ', `"attn_implementation=flash_attention_2"` (implementation using flash attention 2)'
                if cls._supports_sdpa:
                    message += ', `"attn_implementation=sdpa"` (implementation using nn.functional.scaled_dot_product_attention)'
                raise ValueError(message + ".")

            # If a config is passed with a preset attn_implementation, we skip the automatic dispatch and use the user-provided config, with hard checks that the requested attention implementation is available.
            requested_attn_implementation = config._attn_implementation_internal

        config._attn_implementation = "eager"

        return config

    @classmethod
    def _set_default_ms_dtype(cls, dtype: mindspore.dtype.TensorType) -> mindspore.dtype.TensorType:
        """
        Change the default dtype and return the previous one. This is needed when wanting to instantiate the model
        under specific dtype.

        Args:
            dtype (`mindspore.dtype.TensorType`):
                a floating dtype to set to.

        Returns:
            `mindspore.dtype.TensorType`: the original `dtype` that can be used to restore `torch.set_default_dtype(dtype)` if it was
            modified. If it wasn't, returns `None`.

        Note `set_default_dtype` currently only works with floating-point types and asserts if for example,
        `torch.int64` is passed. So if a non-float `dtype` is passed this functions will throw an exception.
        """
        if not isinstance(dtype, (typing.Float, typing.BFloat)):
            raise ValueError(
                f"Can't instantiate {cls.__name__} model under dtype={dtype} since it is not a floating point dtype"
            )

        logger.info(f"Instantiating {cls.__name__} model under default dtype {dtype}.")
        dtype_orig = get_default_dtype()
        set_default_dtype(dtype)
        return dtype_orig

    @property
    def base_model(self) -> nn.Module:
        """
        `nn.Module`: The main body of the model.
        """
        return getattr(self, self.base_model_prefix, self)

    @classmethod
    def can_generate(cls) -> bool:
        """
        Returns whether this model can generate sequences with `.generate()`.

        Returns:
            `bool`: Whether this model can generate sequences with `.generate()`.
        """
        # Detects whether `prepare_inputs_for_generation` has been overwritten, which is a requirement for generation.
        # Alternativelly, the model can also have a custom `generate` function.
        if "GenerationMixin" in str(cls.prepare_inputs_for_generation) and "GenerationMixin" in str(cls.generate):
            return False
        return True

    def enable_input_require_grads(self):
        """
        Enables the gradients for the input embeddings. This is useful for fine-tuning adapter weights while keeping
        the model weights fixed.
        """

        def make_inputs_require_grads(module, input, output):
            output.requires_grad = True

        self._require_grads_hook = self.get_input_embeddings().register_forward_hook(make_inputs_require_grads)

    def disable_input_require_grads(self):
        """
        Removes the `_require_grads_hook`.
        """
        self._require_grads_hook.remove()

    def get_input_embeddings(self) -> nn.Module:
        """
        Returns the model's input embeddings.

        Returns:
            `nn.Module`: A torch module mapping vocabulary to hidden states.
        """
        base_model = getattr(self, self.base_model_prefix, self)
        if base_model is not self:
            return base_model.get_input_embeddings()
        else:
            raise NotImplementedError

    def set_input_embeddings(self, value: nn.Module):
        """
        Set model's input embeddings.

        Args:
            value (`nn.Module`): A module mapping vocabulary to hidden states.
        """
        base_model = getattr(self, self.base_model_prefix, self)
        if base_model is not self:
            base_model.set_input_embeddings(value)
        else:
            raise NotImplementedError

    def get_output_embeddings(self) -> nn.Module:
        """
        Returns the model's output embeddings.

        Returns:
            `nn.Module`: A torch module mapping hidden states to vocabulary.
        """
        return None  # Overwrite for models with output embeddings

    def _init_weights(self, module):
        """
        Initialize the weights. This method should be overridden by derived class and is
        the only initialization method that will be called when loading a checkpoint
        using `from_pretrained`. Any attempt to initialize outside of this function
        will be useless as the nn.init function are all replaced with skip.
        """

    def _initialize_weights(self, module):
        """
        Initialize the weights if they are not already initialized.
        """
        if getattr(module, "_is_initialized", False):
            return
        self._init_weights(module)
        module._is_initialized = True

    def tie_weights(self):
        """
        Tie the weights between the input embeddings and the output embeddings.

        """
        if getattr(self.config, "tie_word_embeddings", True):
            output_embeddings = self.get_output_embeddings()
            if output_embeddings is not None:
                self._tie_or_clone_weights(output_embeddings, self.get_input_embeddings())

        if getattr(self.config, "is_encoder_decoder", False) and getattr(self.config, "tie_encoder_decoder", False):
            if hasattr(self, self.base_model_prefix):
                self = getattr(self, self.base_model_prefix) # pylint: disable=self-cls-assignment
            tied_weights = self._tie_encoder_decoder_weights(
                self.encoder, self.decoder, self.base_model_prefix, "encoder"
            )
            # Setting a dynamic variable instead of `_tied_weights_keys` because it's a class
            # attributed not an instance member, therefore modifying it will modify the entire class
            # Leading to issues on subsequent calls by different tests or subsequent calls.
            self._dynamic_tied_weights_keys = tied_weights

        for module in self.modules():
            if hasattr(module, "_tie_weights"):
                module._tie_weights()

    @staticmethod
    def _tie_encoder_decoder_weights(
        encoder: nn.Module, decoder: nn.Module, base_model_prefix: str, base_encoder_name: str
    ):
        uninitialized_encoder_weights: List[str] = []
        tied_weights: List[str] = []
        if decoder.__class__ != encoder.__class__:
            logger.info(
                f"{decoder.__class__} and {encoder.__class__} are not equal. In this case make sure that all encoder"
                " weights are correctly initialized."
            )

        def tie_encoder_to_decoder_recursively(
            decoder_pointer: nn.Module,
            encoder_pointer: nn.Module,
            module_name: str,
            base_encoder_name: str,
            uninitialized_encoder_weights: List[str],
            depth=0,
            total_decoder_name="",
            total_encoder_name="",
        ):
            assert isinstance(decoder_pointer, nn.Module) and isinstance(
                encoder_pointer, nn.Module
            ), f"{decoder_pointer} and {encoder_pointer} have to be of type nn.Module"
            if hasattr(decoder_pointer, "weight"):
                assert hasattr(encoder_pointer, "weight")
                encoder_pointer.weight = decoder_pointer.weight
                tied_weights.append(f"{base_encoder_name}{total_encoder_name}.weight")
                if hasattr(decoder_pointer, "bias"):
                    assert hasattr(encoder_pointer, "bias")
                    tied_weights.append(f"{base_encoder_name}{total_encoder_name}.bias")
                    encoder_pointer.bias = decoder_pointer.bias
                return

            encoder_modules = encoder_pointer._modules
            decoder_modules = decoder_pointer._modules
            if len(decoder_modules) > 0:
                assert (
                    len(encoder_modules) > 0
                ), f"Encoder module {encoder_pointer} does not match decoder module {decoder_pointer}"

                all_encoder_weights = {module_name + "/" + sub_name for sub_name in encoder_modules.keys()}
                encoder_layer_pos = 0
                for name, module in decoder_modules.items():
                    if name.isdigit():
                        encoder_name = str(int(name) + encoder_layer_pos)
                        decoder_name = name
                        if not isinstance(decoder_modules[decoder_name], type(encoder_modules[encoder_name])) and len(
                            encoder_modules
                        ) != len(decoder_modules):
                            # this can happen if the name corresponds to the position in a list module list of layers
                            # in this case the decoder has added a cross-attention that the encoder does not have
                            # thus skip this step and subtract one layer pos from encoder
                            encoder_layer_pos -= 1
                            continue
                    elif name not in encoder_modules:
                        continue
                    elif depth > 500:
                        raise ValueError(
                            "Max depth of recursive function `tie_encoder_to_decoder` reached. It seems that there is"
                            " a circular dependency between two or more `nn.Modules` of your model."
                        )
                    else:
                        decoder_name = encoder_name = name
                    tie_encoder_to_decoder_recursively(
                        decoder_modules[decoder_name],
                        encoder_modules[encoder_name],
                        module_name + "/" + name,
                        base_encoder_name,
                        uninitialized_encoder_weights,
                        depth=depth + 1,
                        total_encoder_name=f"{total_encoder_name}.{encoder_name}",
                        total_decoder_name=f"{total_decoder_name}.{decoder_name}",
                    )
                    all_encoder_weights.remove(module_name + "/" + encoder_name)

                uninitialized_encoder_weights += list(all_encoder_weights)

        # tie weights recursively
        tie_encoder_to_decoder_recursively(
            decoder, encoder, base_model_prefix, base_encoder_name, uninitialized_encoder_weights
        )

        if len(uninitialized_encoder_weights) > 0:
            logger.warning(
                f"The following encoder weights were not tied to the decoder {uninitialized_encoder_weights}"
            )
        return tied_weights

    def _tie_or_clone_weights(self, output_embeddings, input_embeddings):
        """Tie or clone module weights"""
        # output_embeddings.weight = nn.Parameter(input_embeddings.weight.clone())
        # # else:
        output_embeddings.weight = input_embeddings.weight
        if getattr(output_embeddings, "bias", None) is not None:
            if output_embeddings.weight.shape[0] - output_embeddings.bias.shape[0] > 0:
                new_bias = nn.functional.pad(
                    output_embeddings.bias,
                    (
                        0,
                        output_embeddings.weight.shape[0] - output_embeddings.bias.shape[0],
                    ),
                    "constant",
                    0,
                )
            else:
                new_bias = output_embeddings.bias[:output_embeddings.weight.shape[0]]
            output_embeddings.bias.assign_value(new_bias)

        if hasattr(output_embeddings, "out_features") and hasattr(input_embeddings, "num_embeddings"):
            output_embeddings.out_features = input_embeddings.num_embeddings

    def _get_no_split_modules(self, device_map: str):
        """
        Get the modules of the model that should not be spit when using device_map. We iterate through the modules to
        get the underlying `_no_split_modules`.

        Args:
            device_map (`str`):
                The device map value. Options are ["auto", "balanced", "balanced_low_0", "sequential"]

        Returns:
            `List[str]`: List of modules that should not be split
        """
        _no_split_modules = set()
        modules_to_check = [self]
        while len(modules_to_check) > 0:
            module = modules_to_check.pop(-1)
            # if the module does not appear in _no_split_modules, we also check the children
            if module.__class__.__name__ not in _no_split_modules:
                if isinstance(module, PreTrainedModel):
                    if module._no_split_modules is None:
                        raise ValueError(
                            f"{module.__class__.__name__} does not support `device_map='{device_map}'`. To implement support, the model "
                            "class needs to implement the `_no_split_modules` attribute."
                        )
                    else:
                        _no_split_modules = _no_split_modules | set(module._no_split_modules)
                modules_to_check += list(module.children())
        return list(_no_split_modules)

    def resize_token_embeddings(
        self, new_num_tokens: Optional[int] = None, pad_to_multiple_of: Optional[int] = None
    ) -> nn.Embedding:
        """
        Resizes input token embeddings matrix of the model if `new_num_tokens != config.vocab_size`.

        Takes care of tying weights embeddings afterwards if the model class has a `tie_weights()` method.

        Arguments:
            new_num_tokens (`int`, *optional*):
                The new number of tokens in the embedding matrix. Increasing the size will add newly initialized
                vectors at the end. Reducing the size will remove vectors from the end. If not provided or `None`, just
                returns a pointer to the input tokens `nn.Embedding` module of the model without doing anything.
            pad_to_multiple_of (`int`, *optional*):
                If set will pad the embedding matrix to a multiple of the provided value.If `new_num_tokens` is set to
                `None` will just pad the embedding to a multiple of `pad_to_multiple_of`.

                This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability
                `>= 7.5` (Volta), or on TPUs which benefit from having sequence lengths be a multiple of 128. For more
                details about this, or help on choosing the correct value for resizing, refer to this guide:
                https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc

        Return:
            `nn.Embedding`: Pointer to the input tokens Embeddings Module of the model.
        """
        model_embeds = self._resize_token_embeddings(new_num_tokens, pad_to_multiple_of)
        if new_num_tokens is None and pad_to_multiple_of is None:
            return model_embeds

        # Since we are basically resuing the same old embeddings with new weight values, gathering is required
        vocab_size = model_embeds.weight.shape[0]

        # Update base model and current model config
        if hasattr(self.config, "text_config"):
            self.config.text_config.vocab_size = vocab_size
        else:
            self.config.vocab_size = vocab_size
        self.vocab_size = vocab_size

        # Tie weights again if needed
        self.tie_weights()

        return model_embeds

    def _resize_token_embeddings(self, new_num_tokens, pad_to_multiple_of=None):
        old_embeddings = self.get_input_embeddings()
        new_embeddings = self._get_resized_embeddings(old_embeddings, new_num_tokens, pad_to_multiple_of)
        # if hasattr(old_embeddings, "_hf_hook"):
        #     hook = old_embeddings._hf_hook
        #     add_hook_to_module(new_embeddings, hook)
        old_embeddings_requires_grad = old_embeddings.weight.requires_grad
        new_embeddings.requires_grad_(old_embeddings_requires_grad)
        self.set_input_embeddings(new_embeddings)

        # Update new_num_tokens with the actual size of new_embeddings
        if pad_to_multiple_of is not None:
            new_num_tokens = new_embeddings.weight.shape[0]

        # if word embeddings are not tied, make sure that lm head is resized as well
        if self.get_output_embeddings() is not None and not self.config.tie_word_embeddings:
            old_lm_head = self.get_output_embeddings()
            if isinstance(old_lm_head, nn.Embedding):
                new_lm_head = self._get_resized_embeddings(old_lm_head, new_num_tokens)
            else:
                new_lm_head = self._get_resized_lm_head(old_lm_head, new_num_tokens)
            # if hasattr(old_lm_head, "_hf_hook"):
            #     hook = old_lm_head._hf_hook
            #     add_hook_to_module(new_lm_head, hook)
            old_lm_head_requires_grad = old_lm_head.weight.requires_grad
            new_lm_head.requires_grad_(old_lm_head_requires_grad)
            self.set_output_embeddings(new_lm_head)

        return self.get_input_embeddings()

    def _get_resized_embeddings(
        self,
        old_embeddings: nn.Embedding,
        new_num_tokens: Optional[int] = None,
        pad_to_multiple_of: Optional[int] = None,
    ) -> nn.Embedding:
        """
        Build a resized Embedding Module from a provided token Embedding Module. Increasing the size will add newly
        initialized vectors at the end. Reducing the size will remove vectors from the end

        Args:
            old_embeddings (`nn.Embedding`):
                Old embeddings to be resized.
            new_num_tokens (`int`, *optional*):
                New number of tokens in the embedding matrix.

                Increasing the size will add newly initialized vectors at the end. Reducing the size will remove
                vectors from the end. If not provided or `None`, just returns a pointer to the input tokens
                `nn.Embedding` module of the model without doing anything.
            pad_to_multiple_of (`int`, *optional*):
                If set will pad the embedding matrix to a multiple of the provided value. If `new_num_tokens` is set to
                `None` will just pad the embedding to a multiple of `pad_to_multiple_of`.

                This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability
                `>= 7.5` (Volta), or on TPUs which benefit from having sequence lengths be a multiple of 128. For more
                details about this, or help on choosing the correct value for resizing, refer to this guide:
                https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc


        Return:
            `nn.Embedding`: Pointer to the resized Embedding Module or the old Embedding Module if
            `new_num_tokens` is `None`
        """

        if pad_to_multiple_of is not None:
            if not isinstance(pad_to_multiple_of, int):
                raise ValueError(
                    f"Asking to pad the embedding matrix to a multiple of `{pad_to_multiple_of}`, which is not and integer. Please make sure to pass an integer"
                )
            if new_num_tokens is None:
                new_num_tokens = old_embeddings.weight.shape[0]
            new_num_tokens = ((new_num_tokens + pad_to_multiple_of - 1) // pad_to_multiple_of) * pad_to_multiple_of
        else:
            logger.info(
                "You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embedding"
                f" dimension will be {new_num_tokens}. This might induce some performance reduction as *Tensor Cores* will not be available."
                " For more details about this, or help on choosing the correct value for resizing, refer to this guide:"
                " https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc"
            )

        if new_num_tokens is None:
            return old_embeddings

        old_num_tokens, old_embedding_dim = old_embeddings.weight.shape

        if old_num_tokens == new_num_tokens:
            return old_embeddings

        if not isinstance(old_embeddings, nn.Embedding):
            raise TypeError(
                f"Old embeddings are of type {type(old_embeddings)}, which is not an instance of {nn.Embedding}. You"
                " should either use a different resize function or make sure that `old_embeddings` are an instance of"
                f" {nn.Embedding}."
            )

        # Build new embeddings

        # When using DeepSpeed ZeRO-3, we shouldn't create new embeddings with DeepSpeed init
        # because the shape of the new embedding layer is used across various modeling files
        # as well as to update config vocab size. Shape will be 0 when using DeepSpeed init leading
        # to errors when training.
        new_embeddings = nn.Embedding(
            new_num_tokens,
            old_embedding_dim,
            dtype=old_embeddings.weight.dtype,
        )

        # initialize all new embeddings (in particular added tokens)
        self._init_weights(new_embeddings)

        # Copy token embeddings from the previous weights

        # numbers of tokens to copy
        n = min(old_num_tokens, new_num_tokens)

        new_embeddings.weight[:n, :] = old_embeddings.weight[:n, :]

        # Replace weights in old_embeddings and return to maintain the same embedding type.
        # This ensures correct functionality when a Custom Embedding class is passed as input.
        # The input and output embedding types remain consistent. (c.f. https://github.com/huggingface/transformers/pull/31979)

        old_embeddings.weight = new_embeddings.weight
        old_embeddings.num_embeddings = new_embeddings.weight.shape[0]
        if old_embeddings.padding_idx is not None and (new_num_tokens - 1) < old_embeddings.padding_idx:
            old_embeddings.padding_idx = None

        return old_embeddings

    def _get_resized_lm_head(
        self, old_lm_head: nn.Linear, new_num_tokens: Optional[int] = None, transposed: Optional[bool] = False
    ) -> nn.Linear:
        """
        Build a resized Linear Module from a provided old Linear Module. Increasing the size will add newly initialized
        vectors at the end. Reducing the size will remove vectors from the end

        Args:
            old_lm_head (`nn.Linear`):
                Old lm head liner layer to be resized.
            new_num_tokens (`int`, *optional*):
                New number of tokens in the linear matrix.

                Increasing the size will add newly initialized vectors at the end. Reducing the size will remove
                vectors from the end. If not provided or `None`, just returns a pointer to the input tokens
                `nn.Linear` module of the model without doing anything. transposed (`bool`, *optional*, defaults
                to `False`): Whether `old_lm_head` is transposed or not. If True `old_lm_head.shape` is `lm_head_dim,
                vocab_size` else `vocab_size, lm_head_dim`.

        Return:
            `nn.Linear`: Pointer to the resized Linear Module or the old Linear Module if `new_num_tokens` is
            `None`
        """
        if new_num_tokens is None:
            return old_lm_head

        is_quantized = hasattr(self, "hf_quantizer") and self.hf_quantizer is not None

        old_num_tokens, old_lm_head_dim = (
            old_lm_head.weight.shape if not transposed else old_lm_head.weight.t().shape
        )

        if old_num_tokens == new_num_tokens:
            return old_lm_head

        if not isinstance(old_lm_head, nn.Linear):
            raise TypeError(
                f"Old language model head is of type {type(old_lm_head)}, which is not an instance of {nn.Linear}. You"
                " should either use a different resize function or make sure that `old_lm_head` are an instance of"
                f" {nn.Linear}."
            )

        # Build new lm head
        new_lm_head_shape = (old_lm_head_dim, new_num_tokens) if not transposed else (new_num_tokens, old_lm_head_dim)
        has_new_lm_head_bias = old_lm_head.bias is not None

        # When using DeepSpeed ZeRO-3, we shouldn't create new embeddings with DeepSpeed init
        # because the shape of the new embedding layer is used across various modeling files
        # as well as to update config vocab size. Shape will be 0 when using DeepSpeed init leading
        # to errors when training.
        new_lm_head = nn.Linear(
            *new_lm_head_shape,
            bias=has_new_lm_head_bias,
            dtype=old_lm_head.weight.dtype,
        )

        # initialize new lm head (in particular added tokens)
        self._init_weights(new_lm_head)

        num_tokens_to_copy = min(old_num_tokens, new_num_tokens)

        self._copy_lm_head_original_to_resized(
            new_lm_head, old_lm_head, num_tokens_to_copy, transposed, has_new_lm_head_bias
        )

        return new_lm_head

    def _copy_lm_head_original_to_resized(
        self, new_lm_head, old_lm_head, num_tokens_to_copy, transposed, has_new_lm_head_bias
    ):
        # Copy old lm head weights to new lm head
        if not transposed:
            new_lm_head.weight[:num_tokens_to_copy, :] = old_lm_head.weight[:num_tokens_to_copy, :]
        else:
            new_lm_head.weight[:, :num_tokens_to_copy] = old_lm_head.weight[:, :num_tokens_to_copy]

        # Copy bias weights to new lm head
        if has_new_lm_head_bias:
            new_lm_head.bias[:num_tokens_to_copy] = old_lm_head.bias[:num_tokens_to_copy]

    def resize_position_embeddings(self, new_num_position_embeddings: int):
        raise NotImplementedError(
            f"`resize_position_embeddings` is not implemented for {self.__class__}`. To implement it, you should "
            f"overwrite this method in the class {self.__class__} in `modeling_{self.__class__.__module__}.py`"
        )

    def get_position_embeddings(self) -> Union[nn.Embedding, Tuple[nn.Embedding]]:
        raise NotImplementedError(
            f"`get_position_embeddings` is not implemented for {self.__class__}`. To implement it, you should "
            f"overwrite this method in the class {self.__class__} in `modeling_{self.__class__.__module__}.py`"
        )

    def init_weights(self):
        """
        If needed prunes and maybe initializes weights. If using a custom `PreTrainedModel`, you need to implement any
        initialization logic in `_init_weights`.
        """
        # Prune heads if needed
        if self.config.pruned_heads:
            self.prune_heads(self.config.pruned_heads)

        if _init_weights:
            # Initialize weights
            self.apply(self._initialize_weights)

            # Tie weights should be skipped when not initializing all weights
            # since from_pretrained(...) calls tie weights anyways
            self.tie_weights()

    def prune_heads(self, heads_to_prune: Dict[int, List[int]]):
        """
        Prunes heads of the base model.

        Arguments:
            heads_to_prune (`Dict[int, List[int]]`):
                Dictionary with keys being selected layer indices (`int`) and associated values being the list of heads
                to prune in said layer (list of `int`). For instance {1: [0, 2], 2: [2, 3]} will prune heads 0 and 2 on
                layer 1 and heads 2 and 3 on layer 2.
        """
        # save new sets of pruned heads as union of previously stored pruned heads and newly pruned heads
        for layer, heads in heads_to_prune.items():
            union_heads = set(self.config.pruned_heads.get(layer, [])) | set(heads)
            self.config.pruned_heads[layer] = list(union_heads)  # Unfortunately we have to store it as list for JSON

        self.base_model._prune_heads(heads_to_prune)

    def gradient_checkpointing_enable(self, gradient_checkpointing_kwargs=None):
        """
        Activates gradient checkpointing for the current model.

        Note that in other frameworks this feature can be referred to as "activation checkpointing" or "checkpoint
        activations".

        We pass the `__call__` method of the modules instead of `forward` because `__call__` attaches all the hooks of
        the module. https://discuss.pytorch.org/t/any-different-between-model-input-and-model-forward-input/3690/2

        Args:
            gradient_checkpointing_kwargs (dict, *optional*):
                Additional keyword arguments passed along to the `torch.utils.checkpoint.checkpoint` function.
        """
        if not self.supports_gradient_checkpointing:
            raise ValueError(f"{self.__class__.__name__} does not support gradient checkpointing.")

        if gradient_checkpointing_kwargs is None:
            gradient_checkpointing_kwargs = {"use_reentrant": True}

        if GENERATOR_SEED:
            gradient_checkpointing_func = mindspore.recompute
        else:
            def function(fn, *args, **kwargs):
                return fn(*args, **kwargs)

            gradient_checkpointing_func = function

        # For old GC format (transformers < 4.35.0) for models that live on the Hub
        # we will fall back to the overwritten `_set_gradient_checkpointing` method
        _is_using_old_format = "value" in inspect.signature(self._set_gradient_checkpointing).parameters

        if not _is_using_old_format:
            self._set_gradient_checkpointing(enable=True, gradient_checkpointing_func=gradient_checkpointing_func)
        else:
            self.apply(partial(self._set_gradient_checkpointing, value=True))
            logger.warning(
                "You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it)."
                "Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model."
            )

        if getattr(self, "_hf_peft_config_loaded", False):
            # When using PEFT + gradient checkpointing + Trainer we need to make sure the input has requires_grad=True
            # we do it also on PEFT: https://github.com/huggingface/peft/blob/85013987aa82aa1af3da1236b6902556ce3e483e/src/peft/peft_model.py#L334
            # When training with PEFT, only LoRA layers will have requires grad set to True, but the output of frozen layers need to propagate
            # the gradients to make sure the gradient flows.
            self.enable_input_require_grads()

    def _set_gradient_checkpointing(self, enable: bool = True, gradient_checkpointing_func: Callable = None):
        is_gradient_checkpointing_set = False

        # Apply it on the top-level module in case the top-level modules supports it
        # for example, LongT5Stack inherits from `PreTrainedModel`.
        if hasattr(self, "gradient_checkpointing"):
            self._gradient_checkpointing_func = gradient_checkpointing_func
            self.gradient_checkpointing = enable
            is_gradient_checkpointing_set = True

        for module in self.modules():
            if hasattr(module, "gradient_checkpointing"):
                module._gradient_checkpointing_func = gradient_checkpointing_func
                module.gradient_checkpointing = enable
                is_gradient_checkpointing_set = True

        if not is_gradient_checkpointing_set:
            raise ValueError(
                f"{self.__class__.__name__} is not compatible with gradient checkpointing. Make sure all the architecture support it by setting a boolean attribute"
                " `gradient_checkpointing` to modules of the model that uses checkpointing."
            )

    def gradient_checkpointing_disable(self):
        """
        Deactivates gradient checkpointing for the current model.

        Note that in other frameworks this feature can be referred to as "activation checkpointing" or "checkpoint
        activations".
        """
        if self.supports_gradient_checkpointing:
            # For old GC format (transformers < 4.35.0) for models that live on the Hub
            # we will fall back to the overwritten `_set_gradient_checkpointing` methid
            _is_using_old_format = "value" in inspect.signature(self._set_gradient_checkpointing).parameters
            if not _is_using_old_format:
                self._set_gradient_checkpointing(enable=False)
            else:
                logger.warning(
                    "You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it)."
                    "Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model."
                )
                self.apply(partial(self._set_gradient_checkpointing, value=False))

        if getattr(self, "_hf_peft_config_loaded", False):
            self.disable_input_require_grads()

    @property
    def is_gradient_checkpointing(self) -> bool:
        """
        Whether gradient checkpointing is activated for this model or not.

        Note that in other frameworks this feature can be referred to as "activation checkpointing" or "checkpoint
        activations".
        """
        return any(hasattr(m, "gradient_checkpointing") and m.gradient_checkpointing for m in self.modules())

    def save_pretrained(
        self,
        save_directory: Union[str, os.PathLike],
        is_main_process: bool = True,
        state_dict: Optional[dict] = None,
        save_function: Callable = save_checkpoint,
        push_to_hub: bool = False,
        max_shard_size: Union[int, str] = "5GB",
        safe_serialization: bool = True,
        variant: Optional[str] = None,
        token: Optional[Union[str, bool]] = None,
        save_peft_format: bool = True,
        **kwargs,
    ):
        """
        Save a model and its configuration file to a directory, so that it can be re-loaded using the
        [`~PreTrainedModel.from_pretrained`] class method.

        Arguments:
            save_directory (`str` or `os.PathLike`):
                Directory to which to save. Will be created if it doesn't exist.
            is_main_process (`bool`, *optional*, defaults to `True`):
                Whether the process calling this is the main process or not. Useful when in distributed training like
                TPUs and need to call this function on all processes. In this case, set `is_main_process=True` only on
                the main process to avoid race conditions.
            state_dict (nested dictionary of `mindspore.Tensor`):
                The state dictionary of the model to save. Will default to `self.state_dict()`, but can be used to only
                save parts of the model or if special precautions need to be taken when recovering the state dictionary
                of a model (like when using model parallelism).
            save_function (`Callable`):
                The function to use to save the state dictionary. Useful on distributed training like TPUs when one
                need to replace `torch.save` by another method.
            push_to_hub (`bool`, *optional*, defaults to `False`):
                Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the
                repository you want to push to with `repo_id` (will default to the name of `save_directory` in your
                namespace).
            max_shard_size (`int` or `str`, *optional*, defaults to `"5GB"`):
                The maximum size for a checkpoint before being sharded. Checkpoints shard will then be each of size
                lower than this size. If expressed as a string, needs to be digits followed by a unit (like `"5MB"`).
                We default it to 5GB in order for models to be able to run easily on free-tier google colab instances
                without CPU OOM issues.

                <Tip warning={true}>

                If a single weight of the model is bigger than `max_shard_size`, it will be in its own checkpoint shard
                which will be bigger than `max_shard_size`.

                </Tip>

            safe_serialization (`bool`, *optional*, defaults to `True`):
                Whether to save the model using `safetensors` or the traditional PyTorch way (that uses `pickle`).
            variant (`str`, *optional*):
                If specified, weights are saved in the format pytorch_model.<variant>.bin.
            token (`str` or `bool`, *optional*):
                The token to use as HTTP bearer authorization for remote files. If `True`, or not specified, will use
                the token generated when running `huggingface-cli login` (stored in `~/.huggingface`).
            save_peft_format (`bool`, *optional*, defaults to `True`):
                For backward compatibility with PEFT library, in case adapter weights are attached to the model, all
                keys of the state dict of adapters needs to be pre-pended with `base_model.model`. Advanced users can
                disable this behaviours by setting `save_peft_format` to `False`.
            kwargs (`Dict[str, Any]`, *optional*):
                Additional key word arguments passed along to the [`~utils.PushToHubMixin.push_to_hub`] method.
        """
        use_auth_token = kwargs.pop("use_auth_token", None)
        ignore_metadata_errors = kwargs.pop("ignore_metadata_errors", False)

        if use_auth_token is not None:
            warnings.warn(
                "The `use_auth_token` argument is deprecated. Please use `token` instead.",
                FutureWarning,
            )
            if token is not None:
                raise ValueError(
                    "`token` and `use_auth_token` are both specified. Please set only the argument `token`."
                )
            token = use_auth_token

        if token is not None:
            kwargs["token"] = token

        _hf_peft_config_loaded = getattr(self, "_hf_peft_config_loaded", False)

        if "save_config" in kwargs:
            warnings.warn(
                "`save_config` is deprecated. Use `is_main_process` instead."
            )
            is_main_process = kwargs.pop("save_config")
        if safe_serialization and not is_safetensors_available():
            raise ImportError("`safe_serialization` requires the `safetensors library: `pip install safetensors`.")

        if os.path.isfile(save_directory):
            logger.error(f"Provided path ({save_directory}) should be a directory, not a file")
            return

        os.makedirs(save_directory, exist_ok=True)

        if push_to_hub:
            commit_message = kwargs.pop("commit_message", None)
            repo_id = kwargs.pop("repo_id", save_directory.split(os.path.sep)[-1])
            repo_id = self._create_repo(repo_id, **kwargs)
            files_timestamps = self._get_files_timestamps(save_directory)

        # Only save the model itself if we are using distributed training
        model_to_save = unwrap_model(self)

        # save the string version of dtype to the config, e.g. convert mindspore.float32 => "float32"
        # we currently don't use this setting automatically, but may start to use with v5
        dtype = get_parameter_dtype(model_to_save)
        model_to_save.config.ms_dtype = str(dtype).lower()

        # Attach architecture to the config
        model_to_save.config.architectures = [model_to_save.__class__.__name__]

        # Save the config
        if is_main_process:
            if not _hf_peft_config_loaded:
                model_to_save.config.save_pretrained(save_directory)
            if self.can_generate():
                # generation config built from the model config + the model config holds generation kwargs -> generate
                # may revert to legacy behavior if the two don't match
                if (
                    model_to_save.generation_config._from_model_config
                    and model_to_save.config._has_non_default_generation_parameters()
                ):
                    new_generation_config = GenerationConfig.from_model_config(model_to_save.config)
                    if new_generation_config != model_to_save.generation_config:
                        logger.warning(
                            "Your generation config was originally created from the model config, but the model "
                            "config has changed since then. Unless you pass the `generation_config` argument to this "
                            "model's `generate` calls, they will revert to the legacy behavior where the base "
                            "`generate` parameterization is loaded from the model config instead. "
                            "To avoid this behavior and this warning, we recommend you to overwrite the generation "
                            "config model attribute before calling the model's `save_pretrained`, preferably also "
                            "removing any generation kwargs from the model config. This warning will be raised to an "
                            "exception in v4.41."
                        )
                model_to_save.generation_config.save_pretrained(save_directory)

            if _hf_peft_config_loaded:
                logger.info(
                    "Detected adapters on the model, saving the model in the PEFT format, only adapter weights will be saved."
                )
                state_dict = model_to_save.get_adapter_state_dict()

                if save_peft_format:
                    logger.info(
                        "To match the expected format of the PEFT library, all keys of the state dict of adapters will be pre-pended with `base_model.model`."
                    )
                    peft_state_dict = {}
                    for key, value in state_dict.items():
                        peft_state_dict[f"base_model.model.{key}"] = value
                    state_dict = peft_state_dict

                active_adapter = self.active_adapters()

                if len(active_adapter) > 1:
                    raise ValueError(
                        "Multiple active adapters detected, saving multiple active adapters is not supported yet. You can save adapters separately one by one "
                        "by iteratively calling `model.set_adapter(adapter_name)` then `model.save_pretrained(...)`"
                    )
                active_adapter = active_adapter[0]

                current_peft_config = self.peft_config[active_adapter]
                current_peft_config.save_pretrained(save_directory)

        # for offloaded modules
        module_map = {}

        # Save the model
        if state_dict is None:
            # if any model parameters are offloaded, make module map
            if (
                hasattr(self, "hf_device_map")
                and len(set(self.hf_device_map.values())) > 1
                and ("cpu" in self.hf_device_map.values() or "disk" in self.hf_device_map.values())
            ):
                warnings.warn(
                    "Attempting to save a model with offloaded modules. Ensure that unallocated cpu memory exceeds the `shard_size` (5GB default)"
                )
                for name, module in model_to_save.named_modules():
                    if name == "":
                        continue
                    module_state_dict = module.state_dict()

                    for key in module_state_dict:
                        module_map[name + f".{key}"] = module
            state_dict = model_to_save.state_dict()

        # Handle the case where some state_dict keys shouldn't be saved
        if self._keys_to_ignore_on_save is not None:
            for ignore_key in self._keys_to_ignore_on_save:
                if ignore_key in state_dict.keys():
                    del state_dict[ignore_key]

        # Shard the model if it is too big.
        if not _hf_peft_config_loaded:
            weights_name = SAFE_WEIGHTS_NAME if safe_serialization else WEIGHTS_NAME
            weights_name = _add_variant(weights_name, variant)
        else:
            weights_name = ADAPTER_SAFE_WEIGHTS_NAME if safe_serialization else ADAPTER_WEIGHTS_NAME

        filename_pattern = weights_name.replace(".bin", "{suffix}.bin").replace(".safetensors", "{suffix}.safetensors")
        state_dict_split = split_state_dict_into_shards(
            state_dict, filename_pattern=filename_pattern, max_shard_size=max_shard_size
        )
        # Save index if sharded
        index = None
        if state_dict_split.is_sharded:
            index = {
                "metadata": state_dict_split.metadata,
                "weight_map": state_dict_split.tensor_to_filename,
            }

        # Clean the folder from a previous save
        for filename in os.listdir(save_directory):
            full_filename = os.path.join(save_directory, filename)
            # If we have a shard file that is not going to be replaced, we delete it, but only from the main process
            # in distributed settings to avoid race conditions.
            weights_no_suffix = weights_name.replace(".bin", "").replace(".safetensors", "")

            # make sure that file to be deleted matches format of sharded file, e.g. pytorch_model-00001-of-00005
            filename_no_suffix = filename.replace(".bin", "").replace(".safetensors", "")
            reg = re.compile(r"(.*?)-\d{5}-of-\d{5}")

            if (
                filename.startswith(weights_no_suffix)
                and os.path.isfile(full_filename)
                and filename not in state_dict_split.filename_to_tensors.keys()
                and is_main_process
                and reg.fullmatch(filename_no_suffix) is not None
            ):
                os.remove(full_filename)
        # Save the model
        filename_to_tensors = state_dict_split.filename_to_tensors.items()
        if module_map:
            filename_to_tensors = logging.tqdm(filename_to_tensors, desc="Saving checkpoint shards")
        for shard_file, tensors in filename_to_tensors:
            shard = {tensor: state_dict[tensor] for tensor in tensors}
            # remake shard with onloaded parameters if necessary
            if module_map:
                # init state_dict for this shard
                shard_state_dict = {name: "" for name in shard}
                for module_name in shard:
                    module = module_map[module_name]
                    # update state dict with onloaded parameters
                    # shard_state_dict = get_state_dict_from_offload(module, module_name, shard_state_dict)

                # assign shard to be the completed state dict
                shard = shard_state_dict
                del shard_state_dict
                gc.collect()

            if safe_serialization:
                # At some point we will need to deal better with save_function (used for TPU and other distributed
                # joyfulness), but for now this enough.
                safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "np"})
            else:
                save_function(shard, os.path.join(save_directory, shard_file))

        if index is None:
            path_to_weights = os.path.join(save_directory, weights_name)
            logger.info(f"Model weights saved in {path_to_weights}")
        else:
            save_index_file = SAFE_WEIGHTS_INDEX_NAME if safe_serialization else WEIGHTS_INDEX_NAME
            save_index_file = os.path.join(save_directory, _add_variant(save_index_file, variant))
            # Save the index as well
            with open(save_index_file, "w", encoding="utf-8") as f:
                content = json.dumps(index, indent=2, sort_keys=True) + "\n"
                f.write(content)
            logger.info(
                f"The model is bigger than the maximum size per checkpoint ({max_shard_size}) and is going to be "
                f"split in {len(state_dict_split.filename_to_tensors)} checkpoint shards. You can find where each parameters has been saved in the "
                f"index located at {save_index_file}."
            )

    def get_memory_footprint(self, return_buffers=True):
        r"""
        Get the memory footprint of a model. This will return the memory footprint of the current model in bytes.
        Useful to benchmark the memory footprint of the current model and design some tests. Solution inspired from the
        PyTorch discussions: https://discuss.pytorch.org/t/gpu-memory-that-model-uses/56822/2

        Arguments:
            return_buffers (`bool`, *optional*, defaults to `True`):
                Whether to return the size of the buffer tensors in the computation of the memory footprint. Buffers
                are tensors that do not require gradients and not registered as parameters. E.g. mean and std in batch
                norm layers. Please see: https://discuss.pytorch.org/t/what-pytorch-means-by-buffers/120266/2
        """
        mem = sum(param.nelement() * param.itemsize for param in self.parameters())
        if return_buffers:
            mem_bufs = sum(buf.nelement() * buf.itemsize for buf in self.buffers())
            mem = mem + mem_bufs
        return mem

    def half(self, *args):
        # Checks if the model is quantized
        if getattr(self, "is_quantized", False):
            raise ValueError(
                "`.half()` is not supported for quantized model. Please use the model as it is, since the"
                " model has already been casted to the correct `dtype`."
            )
        else:
            return super().half(*args)

    def float(self, *args):
        # Checks if the model is quantized
        if getattr(self, "is_quantized", False):
            raise ValueError(
                "`.float()` is not supported for quantized model. Please use the model as it is, since the"
                " model has already been casted to the correct `dtype`."
            )
        else:
            return super().float(*args)

    @classmethod
    def from_pretrained(
        cls,
        pretrained_model_name_or_path: Optional[Union[str, os.PathLike]],
        *model_args,
        config: Optional[Union[PretrainedConfig, str, os.PathLike]] = None,
        cache_dir: Optional[Union[str, os.PathLike]] = None,
        ignore_mismatched_sizes: bool = False,
        force_download: bool = False,
        local_files_only: bool = False,
        token: Optional[Union[str, bool]] = None,
        revision: str = "main",
        use_safetensors: bool = None,
        **kwargs,
    ) -> "PreTrainedModel":
        r"""
        Instantiate a pretrained pytorch model from a pre-trained model configuration.

        The model is set in evaluation mode by default using `model.eval()` (Dropout modules are deactivated). To train
        the model, you should first set it back in training mode with `model.train()`.

        The warning *Weights from XXX not initialized from pretrained model* means that the weights of XXX do not come
        pretrained with the rest of the model. It is up to you to train those weights with a downstream fine-tuning
        task.

        The warning *Weights from XXX not used in YYY* means that the layer XXX is not used by YYY, therefore those
        weights are discarded.

        If model weights are the same precision as the base model (and is a supported model), weights will be lazily loaded
        in using the `meta` device and brought into memory once an input is passed through that layer regardless of
        `low_cpu_mem_usage`.

        Parameters:
            pretrained_model_name_or_path (`str` or `os.PathLike`, *optional*):
                Can be either:

                    - A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
                    - A path to a *directory* containing model weights saved using
                      [`~PreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
                    - A path or url to a *tensorflow index checkpoint file* (e.g, `./tf_model/model.ckpt.index`). In
                      this case, `from_tf` should be set to `True` and a configuration object should be provided as
                      `config` argument. This loading path is slower than converting the TensorFlow checkpoint in a
                      PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
                    - A path or url to a model folder containing a *flax checkpoint file* in *.msgpack* format (e.g,
                      `./flax_model/` containing `flax_model.msgpack`). In this case, `from_flax` should be set to
                      `True`.
                    - `None` if you are both providing the configuration and state dictionary (resp. with keyword
                      arguments `config` and `state_dict`).
            model_args (sequence of positional arguments, *optional*):
                All remaining positional arguments will be passed to the underlying model's `__init__` method.
            config (`Union[PretrainedConfig, str, os.PathLike]`, *optional*):
                Can be either:

                    - an instance of a class derived from [`PretrainedConfig`],
                    - a string or path valid as input to [`~PretrainedConfig.from_pretrained`].

                Configuration for the model to use instead of an automatically loaded configuration. Configuration can
                be automatically loaded when:

                    - The model is a model provided by the library (loaded with the *model id* string of a pretrained
                      model).
                    - The model was saved using [`~PreTrainedModel.save_pretrained`] and is reloaded by supplying the
                      save directory.
                    - The model is loaded by supplying a local directory as `pretrained_model_name_or_path` and a
                      configuration JSON file named *config.json* is found in the directory.
            state_dict (`Dict[str, mindspore.Tensor]`, *optional*):
                A state dictionary to use instead of a state dictionary loaded from saved weights file.

                This option can be used if you want to create a model from a pretrained configuration but load your own
                weights. In this case though, you should check if using [`~PreTrainedModel.save_pretrained`] and
                [`~PreTrainedModel.from_pretrained`] is not a simpler option.
            cache_dir (`Union[str, os.PathLike]`, *optional*):
                Path to a directory in which a downloaded pretrained model configuration should be cached if the
                standard cache should not be used.
            from_tf (`bool`, *optional*, defaults to `False`):
                Load the model weights from a TensorFlow checkpoint save file (see docstring of
                `pretrained_model_name_or_path` argument).
            from_flax (`bool`, *optional*, defaults to `False`):
                Load the model weights from a Flax checkpoint save file (see docstring of
                `pretrained_model_name_or_path` argument).
            ignore_mismatched_sizes (`bool`, *optional*, defaults to `False`):
                Whether or not to raise an error if some of the weights from the checkpoint do not have the same size
                as the weights of the model (if for instance, you are instantiating a model with 10 labels from a
                checkpoint with 3 labels).
            force_download (`bool`, *optional*, defaults to `False`):
                Whether or not to force the (re-)download of the model weights and configuration files, overriding the
                cached versions if they exist.
            resume_download:
                Deprecated and ignored. All downloads are now resumed by default when possible.
                Will be removed in v5 of Transformers.
            proxies (`Dict[str, str]`, *optional*):
                A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128',
                'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request.
            output_loading_info(`bool`, *optional*, defaults to `False`):
                Whether ot not to also return a dictionary containing missing keys, unexpected keys and error messages.
            local_files_only(`bool`, *optional*, defaults to `False`):
                Whether or not to only look at local files (i.e., do not try to download the model).
            token (`str` or `bool`, *optional*):
                The token to use as HTTP bearer authorization for remote files. If `True`, or not specified, will use
                the token generated when running `huggingface-cli login` (stored in `~/.huggingface`).
            revision (`str`, *optional*, defaults to `"main"`):
                The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a
                git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any
                identifier allowed by git.

                <Tip>

                To test a pull request you made on the Hub, you can pass `revision="refs/pr/<pr_number>".

                </Tip>

            mirror (`str`, *optional*):
                Mirror source to accelerate downloads in China. If you are from China and have an accessibility
                problem, you can set this option to resolve it. Note that we do not guarantee the timeliness or safety.
                Please refer to the mirror site for more information.
            _fast_init(`bool`, *optional*, defaults to `True`):
                Whether or not to disable fast initialization.

                <Tip warning={true}>

                One should only disable *_fast_init* to ensure backwards compatibility with `transformers.__version__ <
                4.6.0` for seeded model initialization. This argument will be removed at the next major version. See
                [pull request 11471](https://github.com/huggingface/transformers/pull/11471) for more information.

                </Tip>
            attn_implementation (`str`, *optional*):
                The attention implementation to use in the model (if relevant). The default is otherwise the manual `"eager"` implementation.

            > Parameters for big model inference

            low_cpu_mem_usage(`bool`, *optional*):
                Tries to not use more than 1x model size in CPU memory (including peak memory) while loading the model.
                Generally should be combined with a `device_map` (such as `"auto"`) for best results.
                This is an experimental feature and a subject to change at any moment.
                </Tip>
                    If the model weights are in the same precision as the model loaded in, `low_cpu_mem_usage` (without
                    `device_map`) is redundant and will not provide any benefit in regards to CPU memory usage. However,
                    this should still be enabled if you are passing in a `device_map`.
                </Tip>
            ms_dtype (`str` or `mindspore.dtype.TensorType`, *optional*):
                Override the default `mindspore.dtype.TensorType` and load the model under a specific `dtype`. The different options
                are:

                1. `torch.float16` or `torch.bfloat16` or `torch.float`: load in a specified
                  `dtype`, ignoring the model's `config.ms_dtype` if one exists. If not specified
                  - the model will get loaded in `torch.float` (fp32).

                2. `"auto"` - A `ms_dtype` entry in the `config.json` file of the model will be
                  attempted to be used. If this entry isn't found then next check the `dtype` of the first weight in
                  the checkpoint that's of a floating point type and use that as `dtype`. This will load the model
                  using the `dtype` it was saved in at the end of the training. It can't be used as an indicator of how
                  the model was trained. Since it could be trained in one of half precision dtypes, but saved in fp32.

                3. A string that is a valid `mindspore.dtype.TensorType`. E.g. "float32" loads the model in `mindspore.float32`, "float16" loads in `torch.float16` etc.

                <Tip>

                For some models the `dtype` they were trained in is unknown - you may try to check the model's paper or
                reach out to the authors and ask them to add this information to the model's card and to insert the
                `ms_dtype` entry in `config.json` on the hub.

                </Tip>

            device_map (`str` or `Dict[str, Union[int, str, torch.device]]` or `int` or `torch.device`, *optional*):
                A map that specifies where each submodule should go. It doesn't need to be refined to each
                parameter/buffer name, once a given module name is inside, every submodule of it will be sent to the
                same device. If we only pass the device (*e.g.*, `"cpu"`, `"cuda:1"`, `"mps"`, or a GPU ordinal rank
                like `1`) on which the model will be allocated, the device map will map the entire model to this
                device. Passing `device_map = 0` means put the whole model on GPU 0.

                To have Accelerate compute the most optimized `device_map` automatically, set `device_map="auto"`. For
                more information about each option see [designing a device
                map](https://hf.co/docs/accelerate/main/en/usage_guides/big_modeling#designing-a-device-map).
            max_memory (`Dict`, *optional*):
                A dictionary device identifier to maximum memory. Will default to the maximum memory available for each
                GPU and the available CPU RAM if unset.
            offload_folder (`str` or `os.PathLike`, *optional*):
                If the `device_map` contains any value `"disk"`, the folder where we will offload weights.
            offload_state_dict (`bool`, *optional*):
                If `True`, will temporarily offload the CPU state dict to the hard drive to avoid getting out of CPU
                RAM if the weight of the CPU state dict + the biggest shard of the checkpoint does not fit. Defaults to
                `True` when there is some disk offload.
            offload_buffers (`bool`, *optional*):
                Whether or not to offload the buffers with the model parameters.
            quantization_config (`Union[QuantizationConfigMixin,Dict]`, *optional*):
                A dictionary of configuration parameters or a QuantizationConfigMixin object for quantization (e.g
                bitsandbytes, gptq). There may be other quantization-related kwargs, including `load_in_4bit` and
                `load_in_8bit`, which are parsed by QuantizationConfigParser. Supported only for bitsandbytes
                quantizations and not preferred. consider inserting all such arguments into quantization_config
                instead.
            subfolder (`str`, *optional*, defaults to `""`):
                In case the relevant files are located inside a subfolder of the model repo on huggingface.co, you can
                specify the folder name here.
            variant (`str`, *optional*):
                If specified load weights from `variant` filename, *e.g.* pytorch_model.<variant>.bin. `variant` is
                ignored when using `from_tf` or `from_flax`.
            use_safetensors (`bool`, *optional*, defaults to `None`):
                Whether or not to use `safetensors` checkpoints. Defaults to `None`. If not specified and `safetensors`
                is not installed, it will be set to `False`.

            kwargs (remaining dictionary of keyword arguments, *optional*):
                Can be used to update the configuration object (after it being loaded) and initiate the model (e.g.,
                `output_attentions=True`). Behaves differently depending on whether a `config` is provided or
                automatically loaded:

                    - If a configuration is provided with `config`, `**kwargs` will be directly passed to the
                      underlying model's `__init__` method (we assume all relevant updates to the configuration have
                      already been done)
                    - If a configuration is not provided, `kwargs` will be first passed to the configuration class
                      initialization function ([`~PretrainedConfig.from_pretrained`]). Each key of `kwargs` that
                      corresponds to a configuration attribute will be used to override said attribute with the
                      supplied `kwargs` value. Remaining keys that do not correspond to any configuration attribute
                      will be passed to the underlying model's `__init__` function.

        <Tip>

        Activate the special ["offline-mode"](https://huggingface.co/transformers/installation.html#offline-mode) to
        use this method in a firewalled environment.

        </Tip>

        Examples:

        ```python
        >>> from transformers import BertConfig, BertModel

        >>> # Download model and configuration from huggingface.co and cache.
        >>> model = BertModel.from_pretrained("google-bert/bert-base-uncased")
        >>> # Model was saved using *save_pretrained('./test/saved_model/')* (for example purposes, not runnable).
        >>> model = BertModel.from_pretrained("./test/saved_model/")
        >>> # Update configuration during loading.
        >>> model = BertModel.from_pretrained("google-bert/bert-base-uncased", output_attentions=True)
        >>> assert model.config.output_attentions == True
        >>> # Loading from a TF checkpoint file instead of a PyTorch model (slower, for example purposes, not runnable).
        >>> config = BertConfig.from_json_file("./tf_model/my_tf_model_config.json")
        >>> model = BertModel.from_pretrained("./tf_model/my_tf_checkpoint.ckpt.index", from_tf=True, config=config)
        >>> # Loading from a Flax checkpoint file instead of a PyTorch model (slower)
        >>> model = BertModel.from_pretrained("google-bert/bert-base-uncased", from_flax=True)
        ```

        * `low_cpu_mem_usage` algorithm:

        This is an experimental function that loads the model using ~1x model size CPU memory

        Here is how it works:

        1. save which state_dict keys we have
        2. drop state_dict before the model is created, since the latter takes 1x model size CPU memory
        3. after the model has been instantiated switch to the meta device all params/buffers that
        are going to be replaced from the loaded state_dict
        4. load state_dict 2nd time
        5. replace the params/buffers from the state_dict

        Currently, it can't handle deepspeed ZeRO stage 3 and ignores loading errors

        """
        state_dict = kwargs.pop("state_dict", None)
        from_pt = kwargs.pop("from_pt", True)
        resume_download = kwargs.pop("resume_download", None)
        proxies = kwargs.pop("proxies", None)
        output_loading_info = kwargs.pop("output_loading_info", False)
        from_pipeline = kwargs.pop("_from_pipeline", None)
        from_auto_class = kwargs.pop("_from_auto", False)
        _fast_init = kwargs.pop("_fast_init", True)
        ms_dtype = kwargs.pop("ms_dtype", None)
        low_cpu_mem_usage = kwargs.pop("low_cpu_mem_usage", None)
        device_map = kwargs.pop("device_map", None)
        max_memory = kwargs.pop("max_memory", None)
        offload_folder = kwargs.pop("offload_folder", None)
        offload_state_dict = kwargs.pop("offload_state_dict", False)
        offload_buffers = kwargs.pop("offload_buffers", False)
        load_in_8bit = kwargs.pop("load_in_8bit", False)
        load_in_4bit = kwargs.pop("load_in_4bit", False)
        quantization_config = kwargs.pop("quantization_config", None)
        mirror = kwargs.pop('mirror', 'huggingface')
        subfolder = kwargs.pop("subfolder", "")
        commit_hash = kwargs.pop("_commit_hash", None)
        variant = kwargs.pop("variant", None)
        adapter_kwargs = kwargs.pop("adapter_kwargs", {})
        adapter_name = kwargs.pop("adapter_name", "default")
        use_flash_attention_2 = kwargs.pop("use_flash_attention_2", False)

        gguf_file = kwargs.pop("gguf_file", None)

        if token is not None and adapter_kwargs is not None and "token" not in adapter_kwargs:
            adapter_kwargs["token"] = token

        if use_safetensors is None and not is_safetensors_available():
            use_safetensors = False

        if gguf_file is not None:
            raise ValueError("MindNLP not support GGUF file.")

        if commit_hash is None:
            if not isinstance(config, PretrainedConfig):
                # We make a call to the config file first (which may be absent) to get the commit hash as soon as possible
                resolved_config_file = cached_file(
                    pretrained_model_name_or_path,
                    CONFIG_NAME,
                    cache_dir=cache_dir,
                    force_download=force_download,
                    resume_download=resume_download,
                    proxies=proxies,
                    local_files_only=local_files_only,
                    token=token,
                    revision=revision,
                    subfolder=subfolder,
                    _raise_exceptions_for_gated_repo=False,
                    _raise_exceptions_for_missing_entries=False,
                    _raise_exceptions_for_connection_errors=False,
                )
                commit_hash = extract_commit_hash(resolved_config_file, commit_hash)
            else:
                commit_hash = getattr(config, "_commit_hash", None)

        _adapter_model_path = adapter_kwargs.pop("_adapter_model_path", None)

        if _adapter_model_path is None:
            _adapter_model_path = find_adapter_config_file(
                pretrained_model_name_or_path,
                cache_dir=cache_dir,
                force_download=force_download,
                resume_download=resume_download,
                proxies=proxies,
                local_files_only=local_files_only,
                _commit_hash=commit_hash,
                mirror=mirror,
                **adapter_kwargs,
            )
        if _adapter_model_path is not None and os.path.isfile(_adapter_model_path):
            with open(_adapter_model_path, "r", encoding="utf-8") as f:
                _adapter_model_path = pretrained_model_name_or_path
                pretrained_model_name_or_path = json.load(f)["base_model_name_or_path"]

        # change device_map into a map if we passed an int, a str or a torch.device
        if isinstance(device_map, str) and device_map not in ["auto", "balanced", "balanced_low_0", "sequential"]:
            raise ValueError(
                "When passing device_map as a string, the value needs to be "
                f"'auto', 'balanced', 'balanced_low_0', 'sequential' but found {device_map}."
            )
        elif isinstance(device_map, int):
            if device_map < 0:
                raise ValueError(
                    "You can't pass device_map as a negative int. If you want to put the model on the cpu, pass device_map = 'cpu' "
                )
            else:
                device_map = {"": device_map}

        if device_map is not None:
            if low_cpu_mem_usage is None:
                low_cpu_mem_usage = True
            elif not low_cpu_mem_usage:
                raise ValueError("Passing along a `device_map` requires `low_cpu_mem_usage=True`")

        # handling bnb config from kwargs, remove after `load_in_{4/8}bit` deprecation.
        if load_in_4bit or load_in_8bit:
            logger.warning("MindNLP do not support 4bit or 8bit for now.")

        user_agent = {"file_type": "model", "framework": "pytorch", "from_auto_class": from_auto_class}
        if from_pipeline is not None:
            user_agent["using_pipeline"] = from_pipeline

        if is_offline_mode() and not local_files_only:
            logger.info("Offline mode: forcing local_files_only=True")
            local_files_only = True

        # Load config if we don't provide a configuration
        if not isinstance(config, PretrainedConfig):
            config_path = config if config is not None else pretrained_model_name_or_path
            config, model_kwargs = cls.config_class.from_pretrained(
                config_path,
                cache_dir=cache_dir,
                return_unused_kwargs=True,
                force_download=force_download,
                resume_download=resume_download,
                proxies=proxies,
                local_files_only=local_files_only,
                token=token,
                revision=revision,
                subfolder=subfolder,
                mirror=mirror,
                _from_auto=from_auto_class,
                _from_pipeline=from_pipeline,
                **kwargs,
            )
        else:
            # In case one passes a config to `from_pretrained` + "attn_implementation"
            # override the `_attn_implementation` attribute to `attn_implementation` of the kwargs
            # Please see: https://github.com/huggingface/transformers/issues/28038

            # Overwrite `config._attn_implementation` by the one from the kwargs --> in auto-factory
            # we pop attn_implementation from the kwargs but this handles the case where users
            # passes manually the config to `from_pretrained`.
            config = copy.deepcopy(config)

            kwarg_attn_imp = kwargs.pop("attn_implementation", None)
            if kwarg_attn_imp is not None:
                config._attn_implementation = kwarg_attn_imp

            model_kwargs = kwargs

        # This variable will flag if we're loading a sharded checkpoint. In this case the archive file is just the
        # index of the files.
        is_sharded = False
        sharded_metadata = None
        # Load model
        loading_info = None

        # Keep in fp32 modules
        keep_in_fp32_modules = None
        use_keep_in_fp32_modules = False

        if pretrained_model_name_or_path is not None and gguf_file is None:
            pretrained_model_name_or_path = str(pretrained_model_name_or_path)
            is_local = os.path.isdir(pretrained_model_name_or_path)
            if is_local:
                if use_safetensors is not False and os.path.isfile(
                    os.path.join(pretrained_model_name_or_path, subfolder, _add_variant(SAFE_WEIGHTS_NAME, variant))
                ):
                    # Load from a safetensors checkpoint
                    archive_file = os.path.join(
                        pretrained_model_name_or_path, subfolder, _add_variant(SAFE_WEIGHTS_NAME, variant)
                    )
                elif use_safetensors is not False and os.path.isfile(
                    os.path.join(
                        pretrained_model_name_or_path, subfolder, _add_variant(SAFE_WEIGHTS_INDEX_NAME, variant)
                    )
                ):
                    # Load from a sharded safetensors checkpoint
                    archive_file = os.path.join(
                        pretrained_model_name_or_path, subfolder, _add_variant(SAFE_WEIGHTS_INDEX_NAME, variant)
                    )
                    is_sharded = True
                elif not use_safetensors and os.path.isfile(
                    os.path.join(pretrained_model_name_or_path, subfolder, _add_variant(WEIGHTS_NAME, variant))
                ):
                    # Load from a MindSpore checkpoint
                    archive_file = os.path.join(
                        pretrained_model_name_or_path, subfolder, _add_variant(WEIGHTS_NAME, variant)
                    )
                elif not use_safetensors and os.path.isfile(
                    os.path.join(pretrained_model_name_or_path, subfolder, _add_variant(WEIGHTS_INDEX_NAME, variant))
                ):
                    # Load from a sharded MindSpore checkpoint
                    archive_file = os.path.join(
                        pretrained_model_name_or_path, subfolder, _add_variant(WEIGHTS_INDEX_NAME, variant)
                    )
                    is_sharded = True
                elif not use_safetensors and os.path.isfile(
                    os.path.join(pretrained_model_name_or_path, subfolder, _add_variant(PT_WEIGHTS_NAME, variant))
                ):
                    # Load from a PyTorch checkpoint
                    archive_file = os.path.join(
                        pretrained_model_name_or_path, subfolder, _add_variant(PT_WEIGHTS_NAME, variant)
                    )
                elif not use_safetensors and os.path.isfile(
                    os.path.join(pretrained_model_name_or_path, subfolder, _add_variant(PT_WEIGHTS_INDEX_NAME, variant))
                ):
                    # Load from a sharded PyTorch checkpoint
                    archive_file = os.path.join(
                        pretrained_model_name_or_path, subfolder, _add_variant(PT_WEIGHTS_INDEX_NAME, variant)
                    )
                    is_sharded = True
                elif use_safetensors:
                    raise EnvironmentError(
                        f"Error no file named {_add_variant(SAFE_WEIGHTS_NAME, variant)} found in directory"
                        f" {pretrained_model_name_or_path}."
                    )
                else:
                    raise EnvironmentError(
                        f"Error no file named {_add_variant(WEIGHTS_NAME, variant)}, {_add_variant(PT_WEIGHTS_NAME, variant)}, {_add_variant(SAFE_WEIGHTS_NAME, variant)},"
                        f" found in directory {pretrained_model_name_or_path}."
                    )
            elif os.path.isfile(os.path.join(subfolder, pretrained_model_name_or_path)):
                archive_file = pretrained_model_name_or_path
                is_local = True
            elif is_remote_url(pretrained_model_name_or_path):
                filename = pretrained_model_name_or_path
                resolved_archive_file = download_url(pretrained_model_name_or_path)
            else:
                # set correct filename
                if use_safetensors is not False:
                    filename = _add_variant(SAFE_WEIGHTS_NAME, variant)
                elif from_pt:
                    filename = _add_variant(PT_WEIGHTS_NAME, variant)
                else:
                    filename = _add_variant(WEIGHTS_NAME, variant)

                try:
                    # Load from URL or cache if already cached
                    cached_file_kwargs = {
                        "cache_dir": cache_dir,
                        "force_download": force_download,
                        "proxies": proxies,
                        "resume_download": resume_download,
                        "local_files_only": local_files_only,
                        "token": token,
                        "user_agent": user_agent,
                        "revision": revision,
                        "subfolder": subfolder,
                        "_raise_exceptions_for_gated_repo": False,
                        "_raise_exceptions_for_missing_entries": False,
                        # "_commit_hash": commit_hash,
                        "mirror": mirror
                    }
                    resolved_archive_file = cached_file(pretrained_model_name_or_path, filename, **cached_file_kwargs)

                    # Since we set _raise_exceptions_for_missing_entries=False, we don't get an exception but a None
                    # result when internet is up, the repo and revision exist, but the file does not.
                    if resolved_archive_file is None and filename == _add_variant(SAFE_WEIGHTS_NAME, variant):
                        # Maybe the checkpoint is sharded, we try to grab the index name in this case.
                        resolved_archive_file = cached_file(
                            pretrained_model_name_or_path,
                            _add_variant(SAFE_WEIGHTS_INDEX_NAME, variant),
                            **cached_file_kwargs,
                        )
                        if resolved_archive_file is not None:
                            is_sharded = True
                        elif use_safetensors:
                            if revision == "main":
                                resolved_archive_file, revision, is_sharded = auto_conversion(
                                    pretrained_model_name_or_path, **cached_file_kwargs
                                )
                            cached_file_kwargs["revision"] = revision
                            if resolved_archive_file is None:
                                raise EnvironmentError(
                                    f"{pretrained_model_name_or_path} does not appear to have a file named"
                                    f" {_add_variant(SAFE_WEIGHTS_NAME, variant)} or {_add_variant(SAFE_WEIGHTS_INDEX_NAME, variant)} "
                                    "and thus cannot be loaded with `safetensors`. Please make sure that the model has "
                                    "been saved with `safe_serialization=True` or do not set `use_safetensors=True`."
                                )
                        elif from_pt:
                            filename = _add_variant(PT_WEIGHTS_NAME, variant)
                            resolved_archive_file = cached_file(
                                pretrained_model_name_or_path, filename, **cached_file_kwargs
                            )
                        else:
                            # This repo has no safetensors file of any kind, we switch to PyTorch.
                            filename = _add_variant(WEIGHTS_NAME, variant)
                            resolved_archive_file = cached_file(
                                pretrained_model_name_or_path, filename, **cached_file_kwargs
                            )

                    if resolved_archive_file is None and filename == _add_variant(WEIGHTS_NAME, variant):
                        # Maybe the checkpoint is sharded, we try to grab the index name in this case.
                        resolved_archive_file = cached_file(
                            pretrained_model_name_or_path,
                            _add_variant(WEIGHTS_INDEX_NAME, variant),
                            **cached_file_kwargs,
                        )
                        if resolved_archive_file is not None:
                            is_sharded = True

                    if resolved_archive_file is None and filename == _add_variant(PT_WEIGHTS_NAME, variant):
                        # Maybe the checkpoint is sharded, we try to grab the index name in this case.
                        resolved_archive_file = cached_file(
                            pretrained_model_name_or_path,
                            _add_variant(PT_WEIGHTS_INDEX_NAME, variant),
                            **cached_file_kwargs,
                        )
                        if resolved_archive_file is not None:
                            is_sharded = True

                    if not local_files_only and not is_offline_mode():
                        if resolved_archive_file is not None:
                            if filename in [WEIGHTS_NAME, WEIGHTS_INDEX_NAME, PT_WEIGHTS_NAME, PT_WEIGHTS_INDEX_NAME]:
                                # If the PyTorch file was found, check if there is a safetensors file on the repository
                                # If there is no safetensors file on the repositories, start an auto conversion
                                safe_weights_name = SAFE_WEIGHTS_INDEX_NAME if is_sharded else SAFE_WEIGHTS_NAME
                                has_file_kwargs = {
                                    "revision": revision,
                                    "proxies": proxies,
                                    "token": token,
                                    "cache_dir": cache_dir,
                                    "local_files_only": local_files_only,
                                    "mirror": mirror,
                                }
                                cached_file_kwargs = {
                                    "cache_dir": cache_dir,
                                    "force_download": force_download,
                                    "resume_download": resume_download,
                                    "local_files_only": local_files_only,
                                    "user_agent": user_agent,
                                    "subfolder": subfolder,
                                    "_raise_exceptions_for_gated_repo": False,
                                    "_raise_exceptions_for_missing_entries": False,
                                    # "_commit_hash": commit_hash,
                                    **has_file_kwargs,
                                }

                                if not has_file(pretrained_model_name_or_path, safe_weights_name, **has_file_kwargs):
                                    Thread(
                                        target=auto_conversion,
                                        args=(pretrained_model_name_or_path,),
                                        kwargs={"ignore_errors_during_conversion": True, **cached_file_kwargs},
                                        name="Thread-autoconversion",
                                    ).start()
                        else:
                            # Otherwise, no PyTorch file was found, maybe there is a TF or Flax model file.
                            # We try those to give a helpful error message.
                            has_file_kwargs = {
                                "mirror": mirror,
                                "revision": revision,
                                "proxies": proxies,
                                "token": token,
                                "cache_dir": cache_dir,
                                "local_files_only": local_files_only,
                            }
                            if variant is not None and has_file(
                                pretrained_model_name_or_path, WEIGHTS_NAME, **has_file_kwargs
                            ):
                                raise EnvironmentError(
                                    f"{pretrained_model_name_or_path} does not appear to have a file named"
                                    f" {_add_variant(WEIGHTS_NAME, variant)} but there is a file without the variant"
                                    f" {variant}. Use `variant=None` to load this model from those weights."
                                )
                            else:
                                raise EnvironmentError(
                                    f"{pretrained_model_name_or_path} does not appear to have a file named"
                                    f" {_add_variant(WEIGHTS_NAME, variant)}, {_add_variant(SAFE_WEIGHTS_NAME, variant)}."
                                )

                except EnvironmentError:
                    # Raise any environment error raise by `cached_file`. It will have a helpful error message adapted
                    # to the original exception.
                    raise
                except Exception as e:
                    # For any other exception, we throw a generic error.
                    raise EnvironmentError(
                        f"Can't load the model for '{pretrained_model_name_or_path}'. If you were trying to load it"
                        " from 'https://huggingface.co/models', make sure you don't have a local directory with the"
                        f" same name. Otherwise, make sure '{pretrained_model_name_or_path}' is the correct path to a"
                        f" directory containing a file named {_add_variant(WEIGHTS_NAME, variant)}."
                    ) from e

            if is_local:
                logger.info(f"loading weights file {archive_file}")
                resolved_archive_file = archive_file
            else:
                logger.info(f"loading weights file {filename} from cache at {resolved_archive_file}")

        else:
            resolved_archive_file = None

        # We'll need to download and cache each checkpoint shard if the checkpoint is sharded.
        if is_sharded:
            # resolved_archive_file becomes a list of files that point to the different checkpoint shards in this case.
            resolved_archive_file, sharded_metadata = get_checkpoint_shard_files(
                pretrained_model_name_or_path,
                resolved_archive_file,
                cache_dir=cache_dir,
                force_download=force_download,
                proxies=proxies,
                resume_download=resume_download,
                local_files_only=local_files_only,
                token=token,
                user_agent=user_agent,
                revision=revision,
                subfolder=subfolder,
                # _commit_hash=commit_hash,
                mirror=mirror
            )

        if (
            is_safetensors_available()
            and isinstance(resolved_archive_file, str)
            and resolved_archive_file.endswith(".safetensors")
        ):
            with safe_open(resolved_archive_file, framework="np") as f:
                metadata = f.metadata()

            if metadata.get("format") in ['pt', 'tf', 'flax', 'mlx', 'np']:
                pass
            else:
                raise ValueError(
                    f"Incompatible safetensors file. File metadata is not ['np', 'pt', 'tf', 'flax', 'mlx'] but {metadata.get('format')}"
                )

        # load pt weights early so that we know which dtype to init the model under
        if from_pt:
            if not is_sharded and state_dict is None:
                # Time to load the checkpoint
                state_dict = load_state_dict(resolved_archive_file)

            # set dtype to instantiate the model under:
            # 1. If ms_dtype is not None, we use that dtype
            # 2. If ms_dtype is "auto", we auto-detect dtype from the loaded state_dict, by checking its first
            #    weights entry that is of a floating type - we assume all floating dtype weights are of the same dtype
            # we also may have config.ms_dtype available, but we won't rely on it till v5
            dtype_orig = None
            if ms_dtype is not None:
                if isinstance(ms_dtype, str):
                    if ms_dtype == "auto":
                        if hasattr(config, "ms_dtype") and config.ms_dtype is not None:
                            ms_dtype = config.ms_dtype
                            logger.info(f"Will use ms_dtype={ms_dtype} as defined in model's config object")
                        else:
                            if is_sharded and "dtype" in sharded_metadata:
                                ms_dtype = sharded_metadata["dtype"]
                            elif not is_sharded:
                                ms_dtype = get_state_dict_dtype(state_dict)
                            else:
                                one_state_dict = load_state_dict(resolved_archive_file[0])
                                ms_dtype = get_state_dict_dtype(one_state_dict)
                                del one_state_dict  # free CPU memory
                            logger.info(
                                "Since the `ms_dtype` attribute can't be found in model's config object, "
                                "will use ms_dtype={ms_dtype} as derived from model's weights"
                            )
                    elif hasattr(mindspore, ms_dtype):
                        ms_dtype = getattr(mindspore, ms_dtype)
                    else:
                        raise ValueError(
                            f'`ms_dtype` can be one of: `mindspore.dtype.TensorType`, `"auto"` or a string of a valid `mindspore.dtype.TensorType`, but received {ms_dtype}'
                        )
                dtype_orig = cls._set_default_ms_dtype(ms_dtype)

            # Check if `_keep_in_fp32_modules` is not None
            use_keep_in_fp32_modules = (cls._keep_in_fp32_modules is not None) and (ms_dtype == mindspore.float16)

            if is_sharded:
                loaded_state_dict_keys = sharded_metadata["all_checkpoint_keys"]
            else:
                loaded_state_dict_keys = list(state_dict.keys())

        config.name_or_path = pretrained_model_name_or_path

        # Instantiate model.
        init_contexts = [no_init_weights(_enable=_fast_init)]

        config = copy.deepcopy(config)  # We do not want to modify the config inplace in from_pretrained.
        config = cls._autoset_attn_implementation(
            config, use_flash_attention_2=use_flash_attention_2, ms_dtype=ms_dtype, device_map=device_map
        )

        model_kwargs.pop('mirror', None)
        with ContextManagers(init_contexts):
            # Let's make sure we don't run the init function of buffer modules
            model = cls(config, *model_args, **model_kwargs)
        # make sure we use the model's config since the __init__ call might have copied it
        config = model.config

        # Check first if we are `from_pt`
        if use_keep_in_fp32_modules:
            keep_in_fp32_modules = model._keep_in_fp32_modules
        else:
            keep_in_fp32_modules = []

        try:
            group_size = get_group_size()
        except:
            group_size = 1

        if isinstance(device_map, str):
            special_dtypes = {}

            special_dtypes.update(
                {
                    name: mindspore.float32
                    for name, _ in model.named_parameters()
                    if any(m in name for m in keep_in_fp32_modules)
                }
            )

            target_dtype = ms_dtype

            no_split_modules = model._get_no_split_modules(device_map)
            if device_map not in ["auto", "balanced", "balanced_low_0", "sequential"]:
                raise ValueError(
                    "If passing a string for `device_map`, please choose 'auto', 'balanced', 'balanced_low_0' or "
                    "'sequential'."
                )

            device_map_kwargs = {"no_split_module_classes": no_split_modules}
            if "special_dtypes" in inspect.signature(infer_auto_device_map).parameters:
                device_map_kwargs["special_dtypes"] = special_dtypes
            elif len(special_dtypes) > 0:
                logger.warning(
                    "This model has some weights that should be kept in higher precision, you need to upgrade "
                    "`accelerate` to properly deal with them (`pip install --upgrade accelerate`)."
                )
            if device_map != "sequential":
                max_memory = get_balanced_memory(
                    model,
                    dtype=target_dtype,
                    low_zero=(device_map == "balanced_low_0"),
                    max_memory=max_memory,
                    **device_map_kwargs,
                )
            else:
                max_memory = get_max_memory(max_memory)

            device_map_kwargs["max_memory"] = max_memory

            # Make sure tied weights are tied before creating the device map.
            model.tie_weights()
            device_map = infer_auto_device_map(model, dtype=target_dtype, **device_map_kwargs)

        elif device_map is not None:
            model.tie_weights()
            tied_params = find_tied_parameters(model)
            # check if we don't have tied param in different devices
            check_tied_parameters_on_same_device(tied_params, device_map)

        # replace unnessasery modules to nn.Identity and insert send/recv for pipeline inference.
        if device_map is not None and group_size != 1:
            model = modify_model_for_pp_infer(model, device_map, model._no_split_modules)

        if from_pt:
            # restore default dtype
            if dtype_orig is not None:
                set_default_dtype(dtype_orig)

            (
                model,
                missing_keys,
                unexpected_keys,
                mismatched_keys,
                offload_index,
                error_msgs,
            ) = cls._load_pretrained_model(
                model,
                state_dict,
                loaded_state_dict_keys,
                resolved_archive_file,
                pretrained_model_name_or_path,
                ignore_mismatched_sizes=ignore_mismatched_sizes,
                sharded_metadata=sharded_metadata,
                _fast_init=_fast_init,
                low_cpu_mem_usage=low_cpu_mem_usage,
                device_map=device_map,
                offload_folder=offload_folder,
                offload_state_dict=offload_state_dict,
                dtype=ms_dtype,
                keep_in_fp32_modules=keep_in_fp32_modules,
            )

        # make sure token embedding weights are still tied if needed
        model.tie_weights()

        # Set model in evaluation mode to deactivate DropOut modules by default
        model.eval()

        # If it is a model with generation capabilities, attempt to load the generation config
        if model.can_generate() and pretrained_model_name_or_path is not None:
            try:
                model.generation_config = GenerationConfig.from_pretrained(
                    pretrained_model_name_or_path,
                    cache_dir=cache_dir,
                    force_download=force_download,
                    resume_download=resume_download,
                    proxies=proxies,
                    local_files_only=local_files_only,
                    token=token,
                    revision=revision,
                    subfolder=subfolder,
                    mirror=mirror,
                    _from_auto=from_auto_class,
                    _from_pipeline=from_pipeline,
                    **kwargs,
                )
            except OSError:
                logger.info(
                    "Generation config file not found, using a generation config created from the model config."
                )

        if _adapter_model_path is not None:
            model.load_adapter(
                _adapter_model_path,
                adapter_name=adapter_name,
                token=token,
                adapter_kwargs=adapter_kwargs,
            )

        if output_loading_info:
            if loading_info is None:
                loading_info = {
                    "missing_keys": missing_keys,
                    "unexpected_keys": unexpected_keys,
                    "mismatched_keys": mismatched_keys,
                    "error_msgs": error_msgs,
                }
            return model, loading_info

        return model

    @classmethod
    def _load_pretrained_model(
        cls,
        model,
        state_dict,
        loaded_keys,
        resolved_archive_file,
        pretrained_model_name_or_path,
        ignore_mismatched_sizes=False,
        sharded_metadata=None,
        _fast_init=True,
        low_cpu_mem_usage=False,
        device_map=None,
        offload_folder=None,
        offload_state_dict=None,
        dtype=None,
        hf_quantizer=None,
        keep_in_fp32_modules=None,
        gguf_path=None,
    ):
        is_safetensors = False
        is_quantized = hf_quantizer is not None
        state_dict_folder = None
        state_dict_index = None

        if device_map is not None and "disk" in device_map.values():
            archive_file = (
                resolved_archive_file[0] if isinstance(resolved_archive_file, (list, tuple)) else resolved_archive_file
            )
            is_safetensors = archive_file.endswith(".safetensors")
            if offload_folder is None and not is_safetensors:
                raise ValueError(
                    "The current `device_map` had weights offloaded to the disk. Please provide an `offload_folder`"
                    " for them. Alternatively, make sure you have `safetensors` installed if the model you are using"
                    " offers the weights in this format."
                )
            if offload_folder is not None:
                os.makedirs(offload_folder, exist_ok=True)
            if offload_state_dict is None:
                offload_state_dict = True

        is_sharded_safetensors = is_safetensors and sharded_metadata is not None

        # tie the model weights before retrieving the state_dict
        model.tie_weights()

        # Retrieve missing & unexpected_keys
        model_state_dict = model.state_dict()
        expected_keys = list(model_state_dict.keys())
        prefix = model.base_model_prefix

        def _fix_key(key):
            if "beta" in key:
                return key.replace("beta", "bias")
            if "gamma" in key:
                return key.replace("gamma", "weight")
            return key

        original_loaded_keys = loaded_keys
        loaded_keys = [_fix_key(key) for key in loaded_keys]

        if len(prefix) > 0:
            has_prefix_module = any(s.startswith(prefix) for s in loaded_keys)
            expects_prefix_module = any(s.startswith(prefix) for s in expected_keys)
        else:
            has_prefix_module = False
            expects_prefix_module = False

        # key re-naming operations are never done on the keys
        # that are loaded, but always on the keys of the newly initialized model
        remove_prefix_from_model = not has_prefix_module and expects_prefix_module
        add_prefix_to_model = has_prefix_module and not expects_prefix_module

        if remove_prefix_from_model:
            _prefix = f"{prefix}."
            expected_keys_not_prefixed = [s for s in expected_keys if not s.startswith(_prefix)]
            expected_keys = [s[len(_prefix) :] if s.startswith(_prefix) else s for s in expected_keys]
        elif add_prefix_to_model:
            expected_keys = [".".join([prefix, s]) for s in expected_keys]

        missing_keys = sorted(set(expected_keys) - set(loaded_keys))
        unexpected_keys = set(loaded_keys) - set(expected_keys)

        # Remove nonpersistent buffers from unexpected keys: they are not in the state dict but will be in the model
        # buffers
        model_buffers = {n for n, _ in model.named_buffers()}
        if remove_prefix_from_model:
            model_buffers = {key[len(_prefix) :] if key.startswith(_prefix) else key for key in model_buffers}
        elif add_prefix_to_model:
            model_buffers = {".".join([prefix, key]) for key in model_buffers}
        unexpected_keys = sorted(unexpected_keys - model_buffers)

        model.tie_weights()
        if device_map is None:
            ptrs = collections.defaultdict(list)
            for name, tensor in model.state_dict().items():
                id_tensor = id(tensor)
                ptrs[id_tensor].append(name)

            # These are all the pointers of shared tensors.
            tied_params = [names for _, names in ptrs.items() if len(names) > 1]
        else:
            # id function doesn't work for meta tensor so we need this function
            tied_params = find_tied_parameters(model)

        for group in tied_params:
            if remove_prefix_from_model:
                group = [key[len(_prefix) :] if key.startswith(_prefix) else key for key in group]
            elif add_prefix_to_model:
                group = [".".join([prefix, key]) for key in group]
            missing_in_group = [k for k in missing_keys if k in group]
            if len(missing_in_group) > 0 and len(missing_in_group) < len(group):
                missing_keys = [k for k in missing_keys if k not in missing_in_group]

        # Some models may have keys that are not in the state by design, removing them before needlessly warning
        # the user.
        if cls._keys_to_ignore_on_load_missing is not None:
            for pat in cls._keys_to_ignore_on_load_missing:
                missing_keys = [k for k in missing_keys if re.search(pat, k) is None]

        if cls._keys_to_ignore_on_load_unexpected is not None:
            for pat in cls._keys_to_ignore_on_load_unexpected:
                unexpected_keys = [k for k in unexpected_keys if re.search(pat, k) is None]

        # retrieve uninitialized modules and initialize before maybe overriding that with the pretrained weights.
        if _fast_init:
            if not ignore_mismatched_sizes:
                if remove_prefix_from_model:
                    _loaded_keys = [f"{prefix}.{k}" for k in loaded_keys]
                elif add_prefix_to_model:
                    _loaded_keys = [k[len(prefix) + 1 :] for k in loaded_keys]
                else:
                    _loaded_keys = loaded_keys
                not_initialized_submodules = set_initialized_submodules(model, _loaded_keys)
                # If we're about to tie the output embeds to the input embeds we don't need to init them
                if hasattr(model.config, "tie_word_embeddings") and model.config.tie_word_embeddings:
                    output_embeddings = model.get_output_embeddings()
                    if output_embeddings is not None:
                        # Still need to initialize if there is a bias term since biases are not tied.
                        if not hasattr(output_embeddings, "bias") or output_embeddings.bias is None:
                            output_embeddings._is_initialized = True
            else:
                not_initialized_submodules = dict(model.named_modules())
            # This will only initialize submodules that are not marked as initialized by the line above.
            model.apply(model._initialize_weights)

        # Set some modules to fp32 if any
        if keep_in_fp32_modules is not None:
            for name, param in model.named_parameters():
                if any(module_to_keep_in_fp32 in name.split(".") for module_to_keep_in_fp32 in keep_in_fp32_modules):
                    # param = param.to(mindspore.float32) does not work here as only in the local scope.
                    # param.data = param.data.to(mindspore.float32)
                    param.set_dtype(mindspore.float32)

        # Make sure we are able to load base models as well as derived models (with heads)
        start_prefix = ""
        model_to_load = model
        if len(cls.base_model_prefix) > 0 and not hasattr(model, cls.base_model_prefix) and has_prefix_module:
            start_prefix = cls.base_model_prefix + "."
        if len(cls.base_model_prefix) > 0 and hasattr(model, cls.base_model_prefix) and not has_prefix_module:
            model_to_load = getattr(model, cls.base_model_prefix)
            base_model_expected_keys = list(model_to_load.state_dict().keys())
            if any(key in expected_keys_not_prefixed and key not in base_model_expected_keys for key in loaded_keys):
                raise ValueError(
                    "The state dictionary of the model you are trying to load is corrupted. Are you sure it was "
                    "properly saved?"
                )
            if device_map is not None:
                device_map = {k.replace(f"{cls.base_model_prefix}.", ""): v for k, v in device_map.items()}

        def _find_mismatched_keys(
            state_dict,
            model_state_dict,
            loaded_keys,
            add_prefix_to_model,
            remove_prefix_from_model,
            ignore_mismatched_sizes,
        ):
            mismatched_keys = []
            if ignore_mismatched_sizes:
                for checkpoint_key in loaded_keys:
                    # If the checkpoint is sharded, we may not have the key here.
                    if checkpoint_key not in state_dict:
                        continue
                    model_key = checkpoint_key
                    if remove_prefix_from_model:
                        # The model key starts with `prefix` but `checkpoint_key` doesn't so we add it.
                        model_key = f"{prefix}.{checkpoint_key}"
                    elif add_prefix_to_model:
                        # The model key doesn't start with `prefix` but `checkpoint_key` does so we remove it.
                        model_key = ".".join(checkpoint_key.split(".")[1:])

                    if (
                        model_key in model_state_dict
                        and state_dict[checkpoint_key].shape != model_state_dict[model_key].shape
                    ):
                        if (
                            state_dict[checkpoint_key].shape[-1] == 1
                            and state_dict[checkpoint_key].numel() * 2 == model_state_dict[model_key].numel()
                        ):
                            # This skips size mismatches for 4-bit weights. Two 4-bit values share an 8-bit container, causing size differences.
                            # Without matching with module type or paramter type it seems like a practical way to detect valid 4bit weights.
                            pass
                        else:
                            mismatched_keys.append(
                                (checkpoint_key, state_dict[checkpoint_key].shape, model_state_dict[model_key].shape)
                            )
                            del state_dict[checkpoint_key]
            return mismatched_keys

        if resolved_archive_file is not None:
            folder = os.path.sep.join(resolved_archive_file[0].split(os.path.sep)[:-1])
        else:
            folder = None
        if device_map is not None and is_safetensors:
            raise ValueError("MindNLP do not support device_map for now.")
            # param_device_map = expand_device_map(device_map, original_loaded_keys, start_prefix)
            # str_dtype = str(dtype).replace("torch.", "") if dtype is not None else "float32"
            # if sharded_metadata is None:
            #     archive_file = (
            #         resolved_archive_file[0]
            #         if isinstance(resolved_archive_file, (list, tuple))
            #         else resolved_archive_file
            #     )
            #     weight_map = {p: archive_file for p in original_loaded_keys}
            # else:
            #     weight_map = {p: os.path.join(folder, f) for p, f in sharded_metadata["weight_map"].items()}
            # offload_index = {
            #     p[len(start_prefix) :]: {"safetensors_file": f, "weight_name": p, "dtype": str_dtype}
            #     for p, f in weight_map.items()
            #     if p.startswith(start_prefix) and param_device_map[p[len(start_prefix) :]] == "disk"
            # }
        else:
            offload_index = None

        if state_dict is not None:
            # Whole checkpoint
            mismatched_keys = _find_mismatched_keys(
                state_dict,
                model_state_dict,
                original_loaded_keys,
                add_prefix_to_model,
                remove_prefix_from_model,
                ignore_mismatched_sizes,
            )

            # For GGUF models `state_dict` is never set to None as the state dict is always small
            if gguf_path:
                raise ValueError('MindNLP do not support gguf for now')
            else:
                # Sharded checkpoint or whole but low_cpu_mem_usage==True
                assign_to_params_buffers = check_support_param_buffer_assignment(
                    model_to_load, state_dict, start_prefix
                )
                error_msgs = _load_state_dict_into_model(
                    model_to_load, state_dict, start_prefix, assign_to_params_buffers
                )

        else:
            # This should always be a list but, just to be sure.
            if not isinstance(resolved_archive_file, list):
                resolved_archive_file = [resolved_archive_file]

            error_msgs = []
            mismatched_keys = []
            if not is_safetensors:
                offload_index = {} if device_map is not None and "disk" in device_map.values() else None
            if offload_state_dict:
                state_dict_folder = tempfile.mkdtemp()
                state_dict_index = {}
            else:
                state_dict_folder = None
                state_dict_index = None

            if is_sharded_safetensors:
                disk_only_shard_files = get_disk_only_shard_files(
                    device_map, sharded_metadata=sharded_metadata, start_prefix=start_prefix
                )
                disk_only_shard_files = [os.path.join(folder, f) for f in disk_only_shard_files]
            else:
                disk_only_shard_files = []

            if len(resolved_archive_file) > 1:
                resolved_archive_file = logging.tqdm(resolved_archive_file, desc="Loading checkpoint shards")
            assign_to_params_buffers = None
            for shard_file in resolved_archive_file:
                # Skip the load for shards that only contain disk-offloaded weights when using safetensors for the offload.
                if shard_file in disk_only_shard_files:
                    continue
                state_dict = load_state_dict(shard_file, is_quantized=is_quantized)

                # Mistmatched keys contains tuples key/shape1/shape2 of weights in the checkpoint that have a shape not
                # matching the weights in the model.
                mismatched_keys += _find_mismatched_keys(
                    state_dict,
                    model_state_dict,
                    original_loaded_keys,
                    add_prefix_to_model,
                    remove_prefix_from_model,
                    ignore_mismatched_sizes,
                )
                if low_cpu_mem_usage:
                    logger.warning('low_cpu_mem usage is not avaliable.')

                # Sharded checkpoint or whole but low_cpu_mem_usage==True
                if assign_to_params_buffers is None:
                    assign_to_params_buffers = check_support_param_buffer_assignment(
                        model_to_load, state_dict, start_prefix
                    )
                error_msgs += _load_state_dict_into_model(
                    model_to_load, state_dict, start_prefix, assign_to_params_buffers
                )

                # force memory release
                del state_dict
                gc.collect()

        if len(error_msgs) > 0:
            error_msg = "\n\t".join(error_msgs)
            if "size mismatch" in error_msg:
                error_msg += (
                    "\n\tYou may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method."
                )
            raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")

        if len(unexpected_keys) > 0:
            archs = [] if model.config.architectures is None else model.config.architectures
            warner = logger.warning if model.__class__.__name__ in archs else logger.info
            warner(
                f"Some weights of the model checkpoint at {pretrained_model_name_or_path} were not used when"
                f" initializing {model.__class__.__name__}: {unexpected_keys}\n- This IS expected if you are"
                f" initializing {model.__class__.__name__} from the checkpoint of a model trained on another task or"
                " with another architecture (e.g. initializing a BertForSequenceClassification model from a"
                " BertForPreTraining model).\n- This IS NOT expected if you are initializing"
                f" {model.__class__.__name__} from the checkpoint of a model that you expect to be exactly identical"
                " (initializing a BertForSequenceClassification model from a BertForSequenceClassification model)."
            )
        else:
            logger.info(f"All model checkpoint weights were used when initializing {model.__class__.__name__}.\n")
        if len(missing_keys) > 0:
            logger.warning(
                f"Some weights of {model.__class__.__name__} were not initialized from the model checkpoint at"
                f" {pretrained_model_name_or_path} and are newly initialized: {missing_keys}\nYou should probably"
                " TRAIN this model on a down-stream task to be able to use it for predictions and inference."
            )
        elif len(mismatched_keys) == 0:
            logger.info(
                f"All the weights of {model.__class__.__name__} were initialized from the model checkpoint at"
                f" {pretrained_model_name_or_path}.\nIf your task is similar to the task the model of the checkpoint"
                f" was trained on, you can already use {model.__class__.__name__} for predictions without further"
                " training."
            )
        if len(mismatched_keys) > 0:
            mismatched_warning = "\n".join(
                [
                    f"- {key}: found shape {shape1} in the checkpoint and {shape2} in the model instantiated"
                    for key, shape1, shape2 in mismatched_keys
                ]
            )
            logger.warning(
                f"Some weights of {model.__class__.__name__} were not initialized from the model checkpoint at"
                f" {pretrained_model_name_or_path} and are newly initialized because the shapes did not"
                f" match:\n{mismatched_warning}\nYou should probably TRAIN this model on a down-stream task to be able"
                " to use it for predictions and inference."
            )

        return model, missing_keys, unexpected_keys, mismatched_keys, offload_index, error_msgs

    def retrieve_modules_from_names(self, names, add_prefix=False, remove_prefix=False):
        module_keys = {".".join(key.split(".")[:-1]) for key in names}

        # nn.ParameterList is a special case where two parameter keywords
        # are appended to the module name, *e.g.* bert.special_embeddings.0
        module_keys = module_keys.union(
            {".".join(key.split(".")[:-2]) for key in names if len(key) > 0 and key[-1].isdigit()}
        )

        retrieved_modules = []
        # retrieve all modules that has at least one missing weight name
        for name, module in self.named_modules():
            if remove_prefix:
                _prefix = f"{self.base_model_prefix}."
                name = name[len(_prefix) :] if name.startswith(_prefix) else name
            elif add_prefix:
                name = ".".join([self.base_model_prefix, name]) if len(name) > 0 else self.base_model_prefix

            if name in module_keys:
                retrieved_modules.append(module)

        return retrieved_modules

    @classmethod
    def register_for_auto_class(cls, auto_class="AutoModel"):
        """
        Register this class with a given auto class. This should only be used for custom models as the ones in the
        library are already mapped with an auto class.

        <Tip warning={true}>

        This API is experimental and may have some slight breaking changes in the next releases.

        </Tip>

        Args:
            auto_class (`str` or `type`, *optional*, defaults to `"AutoModel"`):
                The auto class to register this new model with.
        """
        if not isinstance(auto_class, str):
            auto_class = auto_class.__name__

        import mindnlp.transformers.models.auto as auto_module

        if not hasattr(auto_module, auto_class):
            raise ValueError(f"{auto_class} is not a valid auto class.")

        cls._auto_class = auto_class

    def warn_if_padding_and_no_attention_mask(self, input_ids, attention_mask):
        """
        Shows a one-time warning if the input_ids appear to contain padding and no attention mask was given.
        """

        if (attention_mask is not None) or (self.config.pad_token_id is None) or ON_ORANGE_PI:
            return

        # Check only the first and last input IDs to reduce overhead.
        if self.config.pad_token_id in input_ids.asnumpy()[:, [-1, 0]]:
            warn_string = (
                "We strongly recommend passing in an `attention_mask` since your input_ids may be padded. See "
                "https://huggingface.co/docs/transformers/troubleshooting"
                "#incorrect-output-when-padding-tokens-arent-masked."
            )

            # If the pad token is equal to either BOS, EOS, or SEP, we do not know whether the user should use an
            # attention_mask or not. In this case, we should still show a warning because this is a rare case.
            if (
                (self.config.bos_token_id is not None and self.config.bos_token_id == self.config.pad_token_id)
                or (self.config.eos_token_id is not None and self.config.eos_token_id == self.config.pad_token_id)
                or (self.config.sep_token_id is not None and self.config.sep_token_id == self.config.pad_token_id)
            ):
                warn_string += (
                    f"\nYou may ignore this warning if your `pad_token_id` ({self.config.pad_token_id}) is identical "
                    f"to the `bos_token_id` ({self.config.bos_token_id}), `eos_token_id` ({self.config.eos_token_id}), "
                    f"or the `sep_token_id` ({self.config.sep_token_id}), and your input is not padded."
                )
            logger.warning_once(warn_string)

`mindnlp.transformers.modeling_utils.PreTrainedModel.base_model: nn.Module` `property` ¶

nn.Module: The main body of the model.

`mindnlp.transformers.modeling_utils.PreTrainedModel.dummy_inputs: Dict[str, mindspore.Tensor]` `property` ¶

Dict[str, mindspore.Tensor]: Dummy inputs to do a forward pass in the network.

`mindnlp.transformers.modeling_utils.PreTrainedModel.framework: str` `property` ¶

:str: Identifies that this is a PyTorch model.

`mindnlp.transformers.modeling_utils.PreTrainedModel.is_gradient_checkpointing: bool` `property` ¶

Whether gradient checkpointing is activated for this model or not.

Note that in other frameworks this feature can be referred to as "activation checkpointing" or "checkpoint activations".

`mindnlp.transformers.modeling_utils.PreTrainedModel.add_model_tags(tags)` ¶

Add custom tags into the model that gets pushed to the Hugging Face Hub. Will not overwrite existing tags in the model.

PARAMETER	DESCRIPTION
`tags`	The desired tags to inject in the model TYPE: `Union[List[str], str]`

from transformers import AutoModel

model = AutoModel.from_pretrained("google-bert/bert-base-cased")

model.add_model_tags(["custom", "custom-bert"])

# Push the model to your namespace with the name "my-custom-bert".
model.push_to_hub("my-custom-bert")

Source code in mindnlp\transformers\modeling_utils.py

def add_model_tags(self, tags: Union[List[str], str]) -> None:
    r"""
    Add custom tags into the model that gets pushed to the Hugging Face Hub. Will
    not overwrite existing tags in the model.

    Args:
        tags (`Union[List[str], str]`):
            The desired tags to inject in the model

    Examples:

    ```python
    from transformers import AutoModel

    model = AutoModel.from_pretrained("google-bert/bert-base-cased")

    model.add_model_tags(["custom", "custom-bert"])

    # Push the model to your namespace with the name "my-custom-bert".
    model.push_to_hub("my-custom-bert")
    ```
    """
    if isinstance(tags, str):
        tags = [tags]

    if self.model_tags is None:
        self.model_tags = []

    for tag in tags:
        if tag not in self.model_tags:
            self.model_tags.append(tag)

`mindnlp.transformers.modeling_utils.PreTrainedModel.can_generate()` `classmethod` ¶

Returns whether this model can generate sequences with .generate().

RETURNS	DESCRIPTION
`bool`	`bool`: Whether this model can generate sequences with `.generate()`.

Source code in mindnlp\transformers\modeling_utils.py

@classmethod
def can_generate(cls) -> bool:
    """
    Returns whether this model can generate sequences with `.generate()`.

    Returns:
        `bool`: Whether this model can generate sequences with `.generate()`.
    """
    # Detects whether `prepare_inputs_for_generation` has been overwritten, which is a requirement for generation.
    # Alternativelly, the model can also have a custom `generate` function.
    if "GenerationMixin" in str(cls.prepare_inputs_for_generation) and "GenerationMixin" in str(cls.generate):
        return False
    return True

`mindnlp.transformers.modeling_utils.PreTrainedModel.dequantize()` ¶

Potentially dequantize the model in case it has been quantized by a quantization method that support dequantization.

Source code in mindnlp\transformers\modeling_utils.py

def dequantize(self):
    """
    Potentially dequantize the model in case it has been quantized by a quantization method that support
    dequantization.
    """
    hf_quantizer = getattr(self, "hf_quantizer", None)

    if hf_quantizer is None:
        raise ValueError("You need to first quantize your model in order to dequantize it")

    return hf_quantizer.dequantize(self)

`mindnlp.transformers.modeling_utils.PreTrainedModel.disable_input_require_grads()` ¶

Removes the _require_grads_hook.

Source code in mindnlp\transformers\modeling_utils.py

def disable_input_require_grads(self):
    """
    Removes the `_require_grads_hook`.
    """
    self._require_grads_hook.remove()

`mindnlp.transformers.modeling_utils.PreTrainedModel.enable_input_require_grads()` ¶

Enables the gradients for the input embeddings. This is useful for fine-tuning adapter weights while keeping the model weights fixed.

Source code in mindnlp\transformers\modeling_utils.py

def enable_input_require_grads(self):
    """
    Enables the gradients for the input embeddings. This is useful for fine-tuning adapter weights while keeping
    the model weights fixed.
    """

    def make_inputs_require_grads(module, input, output):
        output.requires_grad = True

    self._require_grads_hook = self.get_input_embeddings().register_forward_hook(make_inputs_require_grads)

`mindnlp.transformers.modeling_utils.PreTrainedModel.from_pretrained(pretrained_model_name_or_path, *model_args, config=None, cache_dir=None, ignore_mismatched_sizes=False, force_download=False, local_files_only=False, token=None, revision='main', use_safetensors=None, **kwargs)` `classmethod` ¶

Instantiate a pretrained pytorch model from a pre-trained model configuration.

The model is set in evaluation mode by default using model.eval() (Dropout modules are deactivated). To train the model, you should first set it back in training mode with model.train().

The warning Weights from XXX not initialized from pretrained model means that the weights of XXX do not come pretrained with the rest of the model. It is up to you to train those weights with a downstream fine-tuning task.

The warning Weights from XXX not used in YYY means that the layer XXX is not used by YYY, therefore those weights are discarded.

If model weights are the same precision as the base model (and is a supported model), weights will be lazily loaded in using the meta device and brought into memory once an input is passed through that layer regardless of low_cpu_mem_usage.

PARAMETER	DESCRIPTION
`pretrained_model_name_or_path`	Can be either: - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co. - A path to a directory containing model weights saved using [`~PreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`. - A path or url to a tensorflow index checkpoint file (e.g, `./tf_model/model.ckpt.index`). In this case, `from_tf` should be set to `True` and a configuration object should be provided as `config` argument. This loading path is slower than converting the TensorFlow checkpoint in a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards. - A path or url to a model folder containing a flax checkpoint file in .msgpack format (e.g, `./flax_model/` containing `flax_model.msgpack`). In this case, `from_flax` should be set to `True`. - `None` if you are both providing the configuration and state dictionary (resp. with keyword arguments `config` and `state_dict`). TYPE: `str` or `os.PathLike`, optional
`model_args`	All remaining positional arguments will be passed to the underlying model's `__init__` method. TYPE: `sequence of positional arguments, optional` DEFAULT: `()`
`config`	Can be either: - an instance of a class derived from [`PretrainedConfig`], - a string or path valid as input to [`~PretrainedConfig.from_pretrained`]. Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when: - The model is a model provided by the library (loaded with the model id string of a pretrained model). - The model was saved using [`~PreTrainedModel.save_pretrained`] and is reloaded by supplying the save directory. - The model is loaded by supplying a local directory as `pretrained_model_name_or_path` and a configuration JSON file named config.json is found in the directory. TYPE: `Union[PretrainedConfig, str, os.PathLike]`, optional DEFAULT: `None`
`state_dict`	A state dictionary to use instead of a state dictionary loaded from saved weights file. This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using [`~PreTrainedModel.save_pretrained`] and [`~PreTrainedModel.from_pretrained`] is not a simpler option. TYPE: `Dict[str, mindspore.Tensor]`, optional
`cache_dir`	Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. TYPE: `Union[str, os.PathLike]`, optional DEFAULT: `None`
`from_tf`	Load the model weights from a TensorFlow checkpoint save file (see docstring of `pretrained_model_name_or_path` argument). TYPE: `bool`, optional, defaults to `False`
`from_flax`	Load the model weights from a Flax checkpoint save file (see docstring of `pretrained_model_name_or_path` argument). TYPE: `bool`, optional, defaults to `False`
`ignore_mismatched_sizes`	Whether or not to raise an error if some of the weights from the checkpoint do not have the same size as the weights of the model (if for instance, you are instantiating a model with 10 labels from a checkpoint with 3 labels). TYPE: `bool`, optional, defaults to `False` DEFAULT: `False`
`force_download`	Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. TYPE: `bool`, optional, defaults to `False` DEFAULT: `False`
`resume_download`	Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
`proxies`	A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request. TYPE: `Dict[str, str]`, optional
output_loading_info(`bool`,	Whether ot not to also return a dictionary containing missing keys, unexpected keys and error messages. TYPE: optional, defaults to `False`
local_files_only(`bool`,	Whether or not to only look at local files (i.e., do not try to download the model). TYPE: optional, defaults to `False`
`token`	The token to use as HTTP bearer authorization for remote files. If `True`, or not specified, will use the token generated when running `huggingface-cli login` (stored in `~/.huggingface`). TYPE: `str` or `bool`, optional DEFAULT: `None`
`revision`	The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any identifier allowed by git. To test a pull request you made on the Hub, you can pass `revision="refs/pr/". TYPE: `str`, optional, defaults to `"main"` DEFAULT: `'main'`
`mirror`	Mirror source to accelerate downloads in China. If you are from China and have an accessibility problem, you can set this option to resolve it. Note that we do not guarantee the timeliness or safety. Please refer to the mirror site for more information. TYPE: `str`, optional
_fast_init(`bool`,	Whether or not to disable fast initialization. One should only disable _fast_init to ensure backwards compatibility with `transformers.__version__ < 4.6.0` for seeded model initialization. This argument will be removed at the next major version. See pull request 11471 for more information. TYPE: optional, defaults to `True`
`attn_implementation`	The attention implementation to use in the model (if relevant). The default is otherwise the manual `"eager"` implementation. TYPE: `str`, optional
low_cpu_mem_usage(`bool`,	Tries to not use more than 1x model size in CPU memory (including peak memory) while loading the model. Generally should be combined with a `device_map` (such as `"auto"`) for best results. This is an experimental feature and a subject to change at any moment. If the model weights are in the same precision as the model loaded in, `low_cpu_mem_usage` (without `device_map`) is redundant and will not provide any benefit in regards to CPU memory usage. However, this should still be enabled if you are passing in a `device_map`. TYPE: `optional`
`ms_dtype`	Override the default `mindspore.dtype.TensorType` and load the model under a specific `dtype`. The different options are: `torch.float16` or `torch.bfloat16` or `torch.float`: load in a specified `dtype`, ignoring the model's `config.ms_dtype` if one exists. If not specified - the model will get loaded in `torch.float` (fp32). `"auto"` - A `ms_dtype` entry in the `config.json` file of the model will be attempted to be used. If this entry isn't found then next check the `dtype` of the first weight in the checkpoint that's of a floating point type and use that as `dtype`. This will load the model using the `dtype` it was saved in at the end of the training. It can't be used as an indicator of how the model was trained. Since it could be trained in one of half precision dtypes, but saved in fp32. A string that is a valid `mindspore.dtype.TensorType`. E.g. "float32" loads the model in `mindspore.float32`, "float16" loads in `torch.float16` etc. For some models the `dtype` they were trained in is unknown - you may try to check the model's paper or reach out to the authors and ask them to add this information to the model's card and to insert the `ms_dtype` entry in `config.json` on the hub. TYPE: `str` or `mindspore.dtype.TensorType`, optional
`device_map`	A map that specifies where each submodule should go. It doesn't need to be refined to each parameter/buffer name, once a given module name is inside, every submodule of it will be sent to the same device. If we only pass the device (e.g., `"cpu"`, `"cuda:1"`, `"mps"`, or a GPU ordinal rank like `1`) on which the model will be allocated, the device map will map the entire model to this device. Passing `device_map = 0` means put the whole model on GPU 0. To have Accelerate compute the most optimized `device_map` automatically, set `device_map="auto"`. For more information about each option see designing a device map. TYPE: `str` or `Dict[str, Union[int, str, torch.device]]` or `int` or `torch.device`, optional
`max_memory`	A dictionary device identifier to maximum memory. Will default to the maximum memory available for each GPU and the available CPU RAM if unset. TYPE: `Dict`, optional
`offload_folder`	If the `device_map` contains any value `"disk"`, the folder where we will offload weights. TYPE: `str` or `os.PathLike`, optional
`offload_state_dict`	If `True`, will temporarily offload the CPU state dict to the hard drive to avoid getting out of CPU RAM if the weight of the CPU state dict + the biggest shard of the checkpoint does not fit. Defaults to `True` when there is some disk offload. TYPE: `bool`, optional
`offload_buffers`	Whether or not to offload the buffers with the model parameters. TYPE: `bool`, optional
`quantization_config`	A dictionary of configuration parameters or a QuantizationConfigMixin object for quantization (e.g bitsandbytes, gptq). There may be other quantization-related kwargs, including `load_in_4bit` and `load_in_8bit`, which are parsed by QuantizationConfigParser. Supported only for bitsandbytes quantizations and not preferred. consider inserting all such arguments into quantization_config instead. TYPE: `Union[QuantizationConfigMixin,Dict]`, optional
`subfolder`	In case the relevant files are located inside a subfolder of the model repo on huggingface.co, you can specify the folder name here. TYPE: `str`, optional, defaults to `""`
`variant`	If specified load weights from `variant` filename, e.g. pytorch_model..bin. `variant` is ignored when using `from_tf` or `from_flax`. TYPE: `str`, optional
`use_safetensors`	Whether or not to use `safetensors` checkpoints. Defaults to `None`. If not specified and `safetensors` is not installed, it will be set to `False`. TYPE: `bool`, optional, defaults to `None` DEFAULT: `None`
`kwargs`	Can be used to update the configuration object (after it being loaded) and initiate the model (e.g., `output_attentions=True`). Behaves differently depending on whether a `config` is provided or automatically loaded: - If a configuration is provided with `config`, `kwargs` will be directly passed to the underlying model's `__init__` method (we assume all relevant updates to the configuration have already been done) - If a configuration is not provided, `kwargs` will be first passed to the configuration class initialization function ([`~PretrainedConfig.from_pretrained`]). Each key of `kwargs` that corresponds to a configuration attribute will be used to override said attribute with the supplied `kwargs` value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's `__init__` function. TYPE:** `remaining dictionary of keyword arguments, optional` DEFAULT: `{}`

Activate the special "offline-mode" to use this method in a firewalled environment.

Examples:

>>> from transformers import BertConfig, BertModel

>>> # Download model and configuration from huggingface.co and cache.
>>> model = BertModel.from_pretrained("google-bert/bert-base-uncased")
>>> # Model was saved using *save_pretrained('./test/saved_model/')* (for example purposes, not runnable).
>>> model = BertModel.from_pretrained("./test/saved_model/")
>>> # Update configuration during loading.
>>> model = BertModel.from_pretrained("google-bert/bert-base-uncased", output_attentions=True)
>>> assert model.config.output_attentions == True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower, for example purposes, not runnable).
>>> config = BertConfig.from_json_file("./tf_model/my_tf_model_config.json")
>>> model = BertModel.from_pretrained("./tf_model/my_tf_checkpoint.ckpt.index", from_tf=True, config=config)
>>> # Loading from a Flax checkpoint file instead of a PyTorch model (slower)
>>> model = BertModel.from_pretrained("google-bert/bert-base-uncased", from_flax=True)

low_cpu_mem_usage algorithm:

This is an experimental function that loads the model using ~1x model size CPU memory

Here is how it works:

save which state_dict keys we have
drop state_dict before the model is created, since the latter takes 1x model size CPU memory
after the model has been instantiated switch to the meta device all params/buffers that are going to be replaced from the loaded state_dict
load state_dict 2^nd time
replace the params/buffers from the state_dict

Currently, it can't handle deepspeed ZeRO stage 3 and ignores loading errors

Source code in mindnlp\transformers\modeling_utils.py

@classmethod
def from_pretrained(
    cls,
    pretrained_model_name_or_path: Optional[Union[str, os.PathLike]],
    *model_args,
    config: Optional[Union[PretrainedConfig, str, os.PathLike]] = None,
    cache_dir: Optional[Union[str, os.PathLike]] = None,
    ignore_mismatched_sizes: bool = False,
    force_download: bool = False,
    local_files_only: bool = False,
    token: Optional[Union[str, bool]] = None,
    revision: str = "main",
    use_safetensors: bool = None,
    **kwargs,
) -> "PreTrainedModel":
    r"""
    Instantiate a pretrained pytorch model from a pre-trained model configuration.

    The model is set in evaluation mode by default using `model.eval()` (Dropout modules are deactivated). To train
    the model, you should first set it back in training mode with `model.train()`.

    The warning *Weights from XXX not initialized from pretrained model* means that the weights of XXX do not come
    pretrained with the rest of the model. It is up to you to train those weights with a downstream fine-tuning
    task.

    The warning *Weights from XXX not used in YYY* means that the layer XXX is not used by YYY, therefore those
    weights are discarded.

    If model weights are the same precision as the base model (and is a supported model), weights will be lazily loaded
    in using the `meta` device and brought into memory once an input is passed through that layer regardless of
    `low_cpu_mem_usage`.

    Parameters:
        pretrained_model_name_or_path (`str` or `os.PathLike`, *optional*):
            Can be either:

                - A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
                - A path to a *directory* containing model weights saved using
                  [`~PreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
                - A path or url to a *tensorflow index checkpoint file* (e.g, `./tf_model/model.ckpt.index`). In
                  this case, `from_tf` should be set to `True` and a configuration object should be provided as
                  `config` argument. This loading path is slower than converting the TensorFlow checkpoint in a
                  PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
                - A path or url to a model folder containing a *flax checkpoint file* in *.msgpack* format (e.g,
                  `./flax_model/` containing `flax_model.msgpack`). In this case, `from_flax` should be set to
                  `True`.
                - `None` if you are both providing the configuration and state dictionary (resp. with keyword
                  arguments `config` and `state_dict`).
        model_args (sequence of positional arguments, *optional*):
            All remaining positional arguments will be passed to the underlying model's `__init__` method.
        config (`Union[PretrainedConfig, str, os.PathLike]`, *optional*):
            Can be either:

                - an instance of a class derived from [`PretrainedConfig`],
                - a string or path valid as input to [`~PretrainedConfig.from_pretrained`].

            Configuration for the model to use instead of an automatically loaded configuration. Configuration can
            be automatically loaded when:

                - The model is a model provided by the library (loaded with the *model id* string of a pretrained
                  model).
                - The model was saved using [`~PreTrainedModel.save_pretrained`] and is reloaded by supplying the
                  save directory.
                - The model is loaded by supplying a local directory as `pretrained_model_name_or_path` and a
                  configuration JSON file named *config.json* is found in the directory.
        state_dict (`Dict[str, mindspore.Tensor]`, *optional*):
            A state dictionary to use instead of a state dictionary loaded from saved weights file.

            This option can be used if you want to create a model from a pretrained configuration but load your own
            weights. In this case though, you should check if using [`~PreTrainedModel.save_pretrained`] and
            [`~PreTrainedModel.from_pretrained`] is not a simpler option.
        cache_dir (`Union[str, os.PathLike]`, *optional*):
            Path to a directory in which a downloaded pretrained model configuration should be cached if the
            standard cache should not be used.
        from_tf (`bool`, *optional*, defaults to `False`):
            Load the model weights from a TensorFlow checkpoint save file (see docstring of
            `pretrained_model_name_or_path` argument).
        from_flax (`bool`, *optional*, defaults to `False`):
            Load the model weights from a Flax checkpoint save file (see docstring of
            `pretrained_model_name_or_path` argument).
        ignore_mismatched_sizes (`bool`, *optional*, defaults to `False`):
            Whether or not to raise an error if some of the weights from the checkpoint do not have the same size
            as the weights of the model (if for instance, you are instantiating a model with 10 labels from a
            checkpoint with 3 labels).
        force_download (`bool`, *optional*, defaults to `False`):
            Whether or not to force the (re-)download of the model weights and configuration files, overriding the
            cached versions if they exist.
        resume_download:
            Deprecated and ignored. All downloads are now resumed by default when possible.
            Will be removed in v5 of Transformers.
        proxies (`Dict[str, str]`, *optional*):
            A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128',
            'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request.
        output_loading_info(`bool`, *optional*, defaults to `False`):
            Whether ot not to also return a dictionary containing missing keys, unexpected keys and error messages.
        local_files_only(`bool`, *optional*, defaults to `False`):
            Whether or not to only look at local files (i.e., do not try to download the model).
        token (`str` or `bool`, *optional*):
            The token to use as HTTP bearer authorization for remote files. If `True`, or not specified, will use
            the token generated when running `huggingface-cli login` (stored in `~/.huggingface`).
        revision (`str`, *optional*, defaults to `"main"`):
            The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a
            git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any
            identifier allowed by git.

            <Tip>

            To test a pull request you made on the Hub, you can pass `revision="refs/pr/<pr_number>".

            </Tip>

        mirror (`str`, *optional*):
            Mirror source to accelerate downloads in China. If you are from China and have an accessibility
            problem, you can set this option to resolve it. Note that we do not guarantee the timeliness or safety.
            Please refer to the mirror site for more information.
        _fast_init(`bool`, *optional*, defaults to `True`):
            Whether or not to disable fast initialization.

            <Tip warning={true}>

            One should only disable *_fast_init* to ensure backwards compatibility with `transformers.__version__ <
            4.6.0` for seeded model initialization. This argument will be removed at the next major version. See
            [pull request 11471](https://github.com/huggingface/transformers/pull/11471) for more information.

            </Tip>
        attn_implementation (`str`, *optional*):
            The attention implementation to use in the model (if relevant). The default is otherwise the manual `"eager"` implementation.

        > Parameters for big model inference

        low_cpu_mem_usage(`bool`, *optional*):
            Tries to not use more than 1x model size in CPU memory (including peak memory) while loading the model.
            Generally should be combined with a `device_map` (such as `"auto"`) for best results.
            This is an experimental feature and a subject to change at any moment.
            </Tip>
                If the model weights are in the same precision as the model loaded in, `low_cpu_mem_usage` (without
                `device_map`) is redundant and will not provide any benefit in regards to CPU memory usage. However,
                this should still be enabled if you are passing in a `device_map`.
            </Tip>
        ms_dtype (`str` or `mindspore.dtype.TensorType`, *optional*):
            Override the default `mindspore.dtype.TensorType` and load the model under a specific `dtype`. The different options
            are:

            1. `torch.float16` or `torch.bfloat16` or `torch.float`: load in a specified
              `dtype`, ignoring the model's `config.ms_dtype` if one exists. If not specified
              - the model will get loaded in `torch.float` (fp32).

            2. `"auto"` - A `ms_dtype` entry in the `config.json` file of the model will be
              attempted to be used. If this entry isn't found then next check the `dtype` of the first weight in
              the checkpoint that's of a floating point type and use that as `dtype`. This will load the model
              using the `dtype` it was saved in at the end of the training. It can't be used as an indicator of how
              the model was trained. Since it could be trained in one of half precision dtypes, but saved in fp32.

            3. A string that is a valid `mindspore.dtype.TensorType`. E.g. "float32" loads the model in `mindspore.float32`, "float16" loads in `torch.float16` etc.

            <Tip>

            For some models the `dtype` they were trained in is unknown - you may try to check the model's paper or
            reach out to the authors and ask them to add this information to the model's card and to insert the
            `ms_dtype` entry in `config.json` on the hub.

            </Tip>

        device_map (`str` or `Dict[str, Union[int, str, torch.device]]` or `int` or `torch.device`, *optional*):
            A map that specifies where each submodule should go. It doesn't need to be refined to each
            parameter/buffer name, once a given module name is inside, every submodule of it will be sent to the
            same device. If we only pass the device (*e.g.*, `"cpu"`, `"cuda:1"`, `"mps"`, or a GPU ordinal rank
            like `1`) on which the model will be allocated, the device map will map the entire model to this
            device. Passing `device_map = 0` means put the whole model on GPU 0.

            To have Accelerate compute the most optimized `device_map` automatically, set `device_map="auto"`. For
            more information about each option see [designing a device
            map](https://hf.co/docs/accelerate/main/en/usage_guides/big_modeling#designing-a-device-map).
        max_memory (`Dict`, *optional*):
            A dictionary device identifier to maximum memory. Will default to the maximum memory available for each
            GPU and the available CPU RAM if unset.
        offload_folder (`str` or `os.PathLike`, *optional*):
            If the `device_map` contains any value `"disk"`, the folder where we will offload weights.
        offload_state_dict (`bool`, *optional*):
            If `True`, will temporarily offload the CPU state dict to the hard drive to avoid getting out of CPU
            RAM if the weight of the CPU state dict + the biggest shard of the checkpoint does not fit. Defaults to
            `True` when there is some disk offload.
        offload_buffers (`bool`, *optional*):
            Whether or not to offload the buffers with the model parameters.
        quantization_config (`Union[QuantizationConfigMixin,Dict]`, *optional*):
            A dictionary of configuration parameters or a QuantizationConfigMixin object for quantization (e.g
            bitsandbytes, gptq). There may be other quantization-related kwargs, including `load_in_4bit` and
            `load_in_8bit`, which are parsed by QuantizationConfigParser. Supported only for bitsandbytes
            quantizations and not preferred. consider inserting all such arguments into quantization_config
            instead.
        subfolder (`str`, *optional*, defaults to `""`):
            In case the relevant files are located inside a subfolder of the model repo on huggingface.co, you can
            specify the folder name here.
        variant (`str`, *optional*):
            If specified load weights from `variant` filename, *e.g.* pytorch_model.<variant>.bin. `variant` is
            ignored when using `from_tf` or `from_flax`.
        use_safetensors (`bool`, *optional*, defaults to `None`):
            Whether or not to use `safetensors` checkpoints. Defaults to `None`. If not specified and `safetensors`
            is not installed, it will be set to `False`.

        kwargs (remaining dictionary of keyword arguments, *optional*):
            Can be used to update the configuration object (after it being loaded) and initiate the model (e.g.,
            `output_attentions=True`). Behaves differently depending on whether a `config` is provided or
            automatically loaded:

                - If a configuration is provided with `config`, `**kwargs` will be directly passed to the
                  underlying model's `__init__` method (we assume all relevant updates to the configuration have
                  already been done)
                - If a configuration is not provided, `kwargs` will be first passed to the configuration class
                  initialization function ([`~PretrainedConfig.from_pretrained`]). Each key of `kwargs` that
                  corresponds to a configuration attribute will be used to override said attribute with the
                  supplied `kwargs` value. Remaining keys that do not correspond to any configuration attribute
                  will be passed to the underlying model's `__init__` function.

    <Tip>

    Activate the special ["offline-mode"](https://huggingface.co/transformers/installation.html#offline-mode) to
    use this method in a firewalled environment.

    </Tip>

    Examples:

    ```python
    >>> from transformers import BertConfig, BertModel

    >>> # Download model and configuration from huggingface.co and cache.
    >>> model = BertModel.from_pretrained("google-bert/bert-base-uncased")
    >>> # Model was saved using *save_pretrained('./test/saved_model/')* (for example purposes, not runnable).
    >>> model = BertModel.from_pretrained("./test/saved_model/")
    >>> # Update configuration during loading.
    >>> model = BertModel.from_pretrained("google-bert/bert-base-uncased", output_attentions=True)
    >>> assert model.config.output_attentions == True
    >>> # Loading from a TF checkpoint file instead of a PyTorch model (slower, for example purposes, not runnable).
    >>> config = BertConfig.from_json_file("./tf_model/my_tf_model_config.json")
    >>> model = BertModel.from_pretrained("./tf_model/my_tf_checkpoint.ckpt.index", from_tf=True, config=config)
    >>> # Loading from a Flax checkpoint file instead of a PyTorch model (slower)
    >>> model = BertModel.from_pretrained("google-bert/bert-base-uncased", from_flax=True)
    ```

    * `low_cpu_mem_usage` algorithm:

    This is an experimental function that loads the model using ~1x model size CPU memory

    Here is how it works:

    1. save which state_dict keys we have
    2. drop state_dict before the model is created, since the latter takes 1x model size CPU memory
    3. after the model has been instantiated switch to the meta device all params/buffers that
    are going to be replaced from the loaded state_dict
    4. load state_dict 2nd time
    5. replace the params/buffers from the state_dict

    Currently, it can't handle deepspeed ZeRO stage 3 and ignores loading errors

    """
    state_dict = kwargs.pop("state_dict", None)
    from_pt = kwargs.pop("from_pt", True)
    resume_download = kwargs.pop("resume_download", None)
    proxies = kwargs.pop("proxies", None)
    output_loading_info = kwargs.pop("output_loading_info", False)
    from_pipeline = kwargs.pop("_from_pipeline", None)
    from_auto_class = kwargs.pop("_from_auto", False)
    _fast_init = kwargs.pop("_fast_init", True)
    ms_dtype = kwargs.pop("ms_dtype", None)
    low_cpu_mem_usage = kwargs.pop("low_cpu_mem_usage", None)
    device_map = kwargs.pop("device_map", None)
    max_memory = kwargs.pop("max_memory", None)
    offload_folder = kwargs.pop("offload_folder", None)
    offload_state_dict = kwargs.pop("offload_state_dict", False)
    offload_buffers = kwargs.pop("offload_buffers", False)
    load_in_8bit = kwargs.pop("load_in_8bit", False)
    load_in_4bit = kwargs.pop("load_in_4bit", False)
    quantization_config = kwargs.pop("quantization_config", None)
    mirror = kwargs.pop('mirror', 'huggingface')
    subfolder = kwargs.pop("subfolder", "")
    commit_hash = kwargs.pop("_commit_hash", None)
    variant = kwargs.pop("variant", None)
    adapter_kwargs = kwargs.pop("adapter_kwargs", {})
    adapter_name = kwargs.pop("adapter_name", "default")
    use_flash_attention_2 = kwargs.pop("use_flash_attention_2", False)

    gguf_file = kwargs.pop("gguf_file", None)

    if token is not None and adapter_kwargs is not None and "token" not in adapter_kwargs:
        adapter_kwargs["token"] = token

    if use_safetensors is None and not is_safetensors_available():
        use_safetensors = False

    if gguf_file is not None:
        raise ValueError("MindNLP not support GGUF file.")

    if commit_hash is None:
        if not isinstance(config, PretrainedConfig):
            # We make a call to the config file first (which may be absent) to get the commit hash as soon as possible
            resolved_config_file = cached_file(
                pretrained_model_name_or_path,
                CONFIG_NAME,
                cache_dir=cache_dir,
                force_download=force_download,
                resume_download=resume_download,
                proxies=proxies,
                local_files_only=local_files_only,
                token=token,
                revision=revision,
                subfolder=subfolder,
                _raise_exceptions_for_gated_repo=False,
                _raise_exceptions_for_missing_entries=False,
                _raise_exceptions_for_connection_errors=False,
            )
            commit_hash = extract_commit_hash(resolved_config_file, commit_hash)
        else:
            commit_hash = getattr(config, "_commit_hash", None)

    _adapter_model_path = adapter_kwargs.pop("_adapter_model_path", None)

    if _adapter_model_path is None:
        _adapter_model_path = find_adapter_config_file(
            pretrained_model_name_or_path,
            cache_dir=cache_dir,
            force_download=force_download,
            resume_download=resume_download,
            proxies=proxies,
            local_files_only=local_files_only,
            _commit_hash=commit_hash,
            mirror=mirror,
            **adapter_kwargs,
        )
    if _adapter_model_path is not None and os.path.isfile(_adapter_model_path):
        with open(_adapter_model_path, "r", encoding="utf-8") as f:
            _adapter_model_path = pretrained_model_name_or_path
            pretrained_model_name_or_path = json.load(f)["base_model_name_or_path"]

    # change device_map into a map if we passed an int, a str or a torch.device
    if isinstance(device_map, str) and device_map not in ["auto", "balanced", "balanced_low_0", "sequential"]:
        raise ValueError(
            "When passing device_map as a string, the value needs to be "
            f"'auto', 'balanced', 'balanced_low_0', 'sequential' but found {device_map}."
        )
    elif isinstance(device_map, int):
        if device_map < 0:
            raise ValueError(
                "You can't pass device_map as a negative int. If you want to put the model on the cpu, pass device_map = 'cpu' "
            )
        else:
            device_map = {"": device_map}

    if device_map is not None:
        if low_cpu_mem_usage is None:
            low_cpu_mem_usage = True
        elif not low_cpu_mem_usage:
            raise ValueError("Passing along a `device_map` requires `low_cpu_mem_usage=True`")

    # handling bnb config from kwargs, remove after `load_in_{4/8}bit` deprecation.
    if load_in_4bit or load_in_8bit:
        logger.warning("MindNLP do not support 4bit or 8bit for now.")

    user_agent = {"file_type": "model", "framework": "pytorch", "from_auto_class": from_auto_class}
    if from_pipeline is not None:
        user_agent["using_pipeline"] = from_pipeline

    if is_offline_mode() and not local_files_only:
        logger.info("Offline mode: forcing local_files_only=True")
        local_files_only = True

    # Load config if we don't provide a configuration
    if not isinstance(config, PretrainedConfig):
        config_path = config if config is not None else pretrained_model_name_or_path
        config, model_kwargs = cls.config_class.from_pretrained(
            config_path,
            cache_dir=cache_dir,
            return_unused_kwargs=True,
            force_download=force_download,
            resume_download=resume_download,
            proxies=proxies,
            local_files_only=local_files_only,
            token=token,
            revision=revision,
            subfolder=subfolder,
            mirror=mirror,
            _from_auto=from_auto_class,
            _from_pipeline=from_pipeline,
            **kwargs,
        )
    else:
        # In case one passes a config to `from_pretrained` + "attn_implementation"
        # override the `_attn_implementation` attribute to `attn_implementation` of the kwargs
        # Please see: https://github.com/huggingface/transformers/issues/28038

        # Overwrite `config._attn_implementation` by the one from the kwargs --> in auto-factory
        # we pop attn_implementation from the kwargs but this handles the case where users
        # passes manually the config to `from_pretrained`.
        config = copy.deepcopy(config)

        kwarg_attn_imp = kwargs.pop("attn_implementation", None)
        if kwarg_attn_imp is not None:
            config._attn_implementation = kwarg_attn_imp

        model_kwargs = kwargs

    # This variable will flag if we're loading a sharded checkpoint. In this case the archive file is just the
    # index of the files.
    is_sharded = False
    sharded_metadata = None
    # Load model
    loading_info = None

    # Keep in fp32 modules
    keep_in_fp32_modules = None
    use_keep_in_fp32_modules = False

    if pretrained_model_name_or_path is not None and gguf_file is None:
        pretrained_model_name_or_path = str(pretrained_model_name_or_path)
        is_local = os.path.isdir(pretrained_model_name_or_path)
        if is_local:
            if use_safetensors is not False and os.path.isfile(
                os.path.join(pretrained_model_name_or_path, subfolder, _add_variant(SAFE_WEIGHTS_NAME, variant))
            ):
                # Load from a safetensors checkpoint
                archive_file = os.path.join(
                    pretrained_model_name_or_path, subfolder, _add_variant(SAFE_WEIGHTS_NAME, variant)
                )
            elif use_safetensors is not False and os.path.isfile(
                os.path.join(
                    pretrained_model_name_or_path, subfolder, _add_variant(SAFE_WEIGHTS_INDEX_NAME, variant)
                )
            ):
                # Load from a sharded safetensors checkpoint
                archive_file = os.path.join(
                    pretrained_model_name_or_path, subfolder, _add_variant(SAFE_WEIGHTS_INDEX_NAME, variant)
                )
                is_sharded = True
            elif not use_safetensors and os.path.isfile(
                os.path.join(pretrained_model_name_or_path, subfolder, _add_variant(WEIGHTS_NAME, variant))
            ):
                # Load from a MindSpore checkpoint
                archive_file = os.path.join(
                    pretrained_model_name_or_path, subfolder, _add_variant(WEIGHTS_NAME, variant)
                )
            elif not use_safetensors and os.path.isfile(
                os.path.join(pretrained_model_name_or_path, subfolder, _add_variant(WEIGHTS_INDEX_NAME, variant))
            ):
                # Load from a sharded MindSpore checkpoint
                archive_file = os.path.join(
                    pretrained_model_name_or_path, subfolder, _add_variant(WEIGHTS_INDEX_NAME, variant)
                )
                is_sharded = True
            elif not use_safetensors and os.path.isfile(
                os.path.join(pretrained_model_name_or_path, subfolder, _add_variant(PT_WEIGHTS_NAME, variant))
            ):
                # Load from a PyTorch checkpoint
                archive_file = os.path.join(
                    pretrained_model_name_or_path, subfolder, _add_variant(PT_WEIGHTS_NAME, variant)
                )
            elif not use_safetensors and os.path.isfile(
                os.path.join(pretrained_model_name_or_path, subfolder, _add_variant(PT_WEIGHTS_INDEX_NAME, variant))
            ):
                # Load from a sharded PyTorch checkpoint
                archive_file = os.path.join(
                    pretrained_model_name_or_path, subfolder, _add_variant(PT_WEIGHTS_INDEX_NAME, variant)
                )
                is_sharded = True
            elif use_safetensors:
                raise EnvironmentError(
                    f"Error no file named {_add_variant(SAFE_WEIGHTS_NAME, variant)} found in directory"
                    f" {pretrained_model_name_or_path}."
                )
            else:
                raise EnvironmentError(
                    f"Error no file named {_add_variant(WEIGHTS_NAME, variant)}, {_add_variant(PT_WEIGHTS_NAME, variant)}, {_add_variant(SAFE_WEIGHTS_NAME, variant)},"
                    f" found in directory {pretrained_model_name_or_path}."
                )
        elif os.path.isfile(os.path.join(subfolder, pretrained_model_name_or_path)):
            archive_file = pretrained_model_name_or_path
            is_local = True
        elif is_remote_url(pretrained_model_name_or_path):
            filename = pretrained_model_name_or_path
            resolved_archive_file = download_url(pretrained_model_name_or_path)
        else:
            # set correct filename
            if use_safetensors is not False:
                filename = _add_variant(SAFE_WEIGHTS_NAME, variant)
            elif from_pt:
                filename = _add_variant(PT_WEIGHTS_NAME, variant)
            else:
                filename = _add_variant(WEIGHTS_NAME, variant)

            try:
                # Load from URL or cache if already cached
                cached_file_kwargs = {
                    "cache_dir": cache_dir,
                    "force_download": force_download,
                    "proxies": proxies,
                    "resume_download": resume_download,
                    "local_files_only": local_files_only,
                    "token": token,
                    "user_agent": user_agent,
                    "revision": revision,
                    "subfolder": subfolder,
                    "_raise_exceptions_for_gated_repo": False,
                    "_raise_exceptions_for_missing_entries": False,
                    # "_commit_hash": commit_hash,
                    "mirror": mirror
                }
                resolved_archive_file = cached_file(pretrained_model_name_or_path, filename, **cached_file_kwargs)

                # Since we set _raise_exceptions_for_missing_entries=False, we don't get an exception but a None
                # result when internet is up, the repo and revision exist, but the file does not.
                if resolved_archive_file is None and filename == _add_variant(SAFE_WEIGHTS_NAME, variant):
                    # Maybe the checkpoint is sharded, we try to grab the index name in this case.
                    resolved_archive_file = cached_file(
                        pretrained_model_name_or_path,
                        _add_variant(SAFE_WEIGHTS_INDEX_NAME, variant),
                        **cached_file_kwargs,
                    )
                    if resolved_archive_file is not None:
                        is_sharded = True
                    elif use_safetensors:
                        if revision == "main":
                            resolved_archive_file, revision, is_sharded = auto_conversion(
                                pretrained_model_name_or_path, **cached_file_kwargs
                            )
                        cached_file_kwargs["revision"] = revision
                        if resolved_archive_file is None:
                            raise EnvironmentError(
                                f"{pretrained_model_name_or_path} does not appear to have a file named"
                                f" {_add_variant(SAFE_WEIGHTS_NAME, variant)} or {_add_variant(SAFE_WEIGHTS_INDEX_NAME, variant)} "
                                "and thus cannot be loaded with `safetensors`. Please make sure that the model has "
                                "been saved with `safe_serialization=True` or do not set `use_safetensors=True`."
                            )
                    elif from_pt:
                        filename = _add_variant(PT_WEIGHTS_NAME, variant)
                        resolved_archive_file = cached_file(
                            pretrained_model_name_or_path, filename, **cached_file_kwargs
                        )
                    else:
                        # This repo has no safetensors file of any kind, we switch to PyTorch.
                        filename = _add_variant(WEIGHTS_NAME, variant)
                        resolved_archive_file = cached_file(
                            pretrained_model_name_or_path, filename, **cached_file_kwargs
                        )

                if resolved_archive_file is None and filename == _add_variant(WEIGHTS_NAME, variant):
                    # Maybe the checkpoint is sharded, we try to grab the index name in this case.
                    resolved_archive_file = cached_file(
                        pretrained_model_name_or_path,
                        _add_variant(WEIGHTS_INDEX_NAME, variant),
                        **cached_file_kwargs,
                    )
                    if resolved_archive_file is not None:
                        is_sharded = True

                if resolved_archive_file is None and filename == _add_variant(PT_WEIGHTS_NAME, variant):
                    # Maybe the checkpoint is sharded, we try to grab the index name in this case.
                    resolved_archive_file = cached_file(
                        pretrained_model_name_or_path,
                        _add_variant(PT_WEIGHTS_INDEX_NAME, variant),
                        **cached_file_kwargs,
                    )
                    if resolved_archive_file is not None:
                        is_sharded = True

                if not local_files_only and not is_offline_mode():
                    if resolved_archive_file is not None:
                        if filename in [WEIGHTS_NAME, WEIGHTS_INDEX_NAME, PT_WEIGHTS_NAME, PT_WEIGHTS_INDEX_NAME]:
                            # If the PyTorch file was found, check if there is a safetensors file on the repository
                            # If there is no safetensors file on the repositories, start an auto conversion
                            safe_weights_name = SAFE_WEIGHTS_INDEX_NAME if is_sharded else SAFE_WEIGHTS_NAME
                            has_file_kwargs = {
                                "revision": revision,
                                "proxies": proxies,
                                "token": token,
                                "cache_dir": cache_dir,
                                "local_files_only": local_files_only,
                                "mirror": mirror,
                            }
                            cached_file_kwargs = {
                                "cache_dir": cache_dir,
                                "force_download": force_download,
                                "resume_download": resume_download,
                                "local_files_only": local_files_only,
                                "user_agent": user_agent,
                                "subfolder": subfolder,
                                "_raise_exceptions_for_gated_repo": False,
                                "_raise_exceptions_for_missing_entries": False,
                                # "_commit_hash": commit_hash,
                                **has_file_kwargs,
                            }

                            if not has_file(pretrained_model_name_or_path, safe_weights_name, **has_file_kwargs):
                                Thread(
                                    target=auto_conversion,
                                    args=(pretrained_model_name_or_path,),
                                    kwargs={"ignore_errors_during_conversion": True, **cached_file_kwargs},
                                    name="Thread-autoconversion",
                                ).start()
                    else:
                        # Otherwise, no PyTorch file was found, maybe there is a TF or Flax model file.
                        # We try those to give a helpful error message.
                        has_file_kwargs = {
                            "mirror": mirror,
                            "revision": revision,
                            "proxies": proxies,
                            "token": token,
                            "cache_dir": cache_dir,
                            "local_files_only": local_files_only,
                        }
                        if variant is not None and has_file(
                            pretrained_model_name_or_path, WEIGHTS_NAME, **has_file_kwargs
                        ):
                            raise EnvironmentError(
                                f"{pretrained_model_name_or_path} does not appear to have a file named"
                                f" {_add_variant(WEIGHTS_NAME, variant)} but there is a file without the variant"
                                f" {variant}. Use `variant=None` to load this model from those weights."
                            )
                        else:
                            raise EnvironmentError(
                                f"{pretrained_model_name_or_path} does not appear to have a file named"
                                f" {_add_variant(WEIGHTS_NAME, variant)}, {_add_variant(SAFE_WEIGHTS_NAME, variant)}."
                            )

            except EnvironmentError:
                # Raise any environment error raise by `cached_file`. It will have a helpful error message adapted
                # to the original exception.
                raise
            except Exception as e:
                # For any other exception, we throw a generic error.
                raise EnvironmentError(
                    f"Can't load the model for '{pretrained_model_name_or_path}'. If you were trying to load it"
                    " from 'https://huggingface.co/models', make sure you don't have a local directory with the"
                    f" same name. Otherwise, make sure '{pretrained_model_name_or_path}' is the correct path to a"
                    f" directory containing a file named {_add_variant(WEIGHTS_NAME, variant)}."
                ) from e

        if is_local:
            logger.info(f"loading weights file {archive_file}")
            resolved_archive_file = archive_file
        else:
            logger.info(f"loading weights file {filename} from cache at {resolved_archive_file}")

    else:
        resolved_archive_file = None

    # We'll need to download and cache each checkpoint shard if the checkpoint is sharded.
    if is_sharded:
        # resolved_archive_file becomes a list of files that point to the different checkpoint shards in this case.
        resolved_archive_file, sharded_metadata = get_checkpoint_shard_files(
            pretrained_model_name_or_path,
            resolved_archive_file,
            cache_dir=cache_dir,
            force_download=force_download,
            proxies=proxies,
            resume_download=resume_download,
            local_files_only=local_files_only,
            token=token,
            user_agent=user_agent,
            revision=revision,
            subfolder=subfolder,
            # _commit_hash=commit_hash,
            mirror=mirror
        )

    if (
        is_safetensors_available()
        and isinstance(resolved_archive_file, str)
        and resolved_archive_file.endswith(".safetensors")
    ):
        with safe_open(resolved_archive_file, framework="np") as f:
            metadata = f.metadata()

        if metadata.get("format") in ['pt', 'tf', 'flax', 'mlx', 'np']:
            pass
        else:
            raise ValueError(
                f"Incompatible safetensors file. File metadata is not ['np', 'pt', 'tf', 'flax', 'mlx'] but {metadata.get('format')}"
            )

    # load pt weights early so that we know which dtype to init the model under
    if from_pt:
        if not is_sharded and state_dict is None:
            # Time to load the checkpoint
            state_dict = load_state_dict(resolved_archive_file)

        # set dtype to instantiate the model under:
        # 1. If ms_dtype is not None, we use that dtype
        # 2. If ms_dtype is "auto", we auto-detect dtype from the loaded state_dict, by checking its first
        #    weights entry that is of a floating type - we assume all floating dtype weights are of the same dtype
        # we also may have config.ms_dtype available, but we won't rely on it till v5
        dtype_orig = None
        if ms_dtype is not None:
            if isinstance(ms_dtype, str):
                if ms_dtype == "auto":
                    if hasattr(config, "ms_dtype") and config.ms_dtype is not None:
                        ms_dtype = config.ms_dtype
                        logger.info(f"Will use ms_dtype={ms_dtype} as defined in model's config object")
                    else:
                        if is_sharded and "dtype" in sharded_metadata:
                            ms_dtype = sharded_metadata["dtype"]
                        elif not is_sharded:
                            ms_dtype = get_state_dict_dtype(state_dict)
                        else:
                            one_state_dict = load_state_dict(resolved_archive_file[0])
                            ms_dtype = get_state_dict_dtype(one_state_dict)
                            del one_state_dict  # free CPU memory
                        logger.info(
                            "Since the `ms_dtype` attribute can't be found in model's config object, "
                            "will use ms_dtype={ms_dtype} as derived from model's weights"
                        )
                elif hasattr(mindspore, ms_dtype):
                    ms_dtype = getattr(mindspore, ms_dtype)
                else:
                    raise ValueError(
                        f'`ms_dtype` can be one of: `mindspore.dtype.TensorType`, `"auto"` or a string of a valid `mindspore.dtype.TensorType`, but received {ms_dtype}'
                    )
            dtype_orig = cls._set_default_ms_dtype(ms_dtype)

        # Check if `_keep_in_fp32_modules` is not None
        use_keep_in_fp32_modules = (cls._keep_in_fp32_modules is not None) and (ms_dtype == mindspore.float16)

        if is_sharded:
            loaded_state_dict_keys = sharded_metadata["all_checkpoint_keys"]
        else:
            loaded_state_dict_keys = list(state_dict.keys())

    config.name_or_path = pretrained_model_name_or_path

    # Instantiate model.
    init_contexts = [no_init_weights(_enable=_fast_init)]

    config = copy.deepcopy(config)  # We do not want to modify the config inplace in from_pretrained.
    config = cls._autoset_attn_implementation(
        config, use_flash_attention_2=use_flash_attention_2, ms_dtype=ms_dtype, device_map=device_map
    )

    model_kwargs.pop('mirror', None)
    with ContextManagers(init_contexts):
        # Let's make sure we don't run the init function of buffer modules
        model = cls(config, *model_args, **model_kwargs)
    # make sure we use the model's config since the __init__ call might have copied it
    config = model.config

    # Check first if we are `from_pt`
    if use_keep_in_fp32_modules:
        keep_in_fp32_modules = model._keep_in_fp32_modules
    else:
        keep_in_fp32_modules = []

    try:
        group_size = get_group_size()
    except:
        group_size = 1

    if isinstance(device_map, str):
        special_dtypes = {}

        special_dtypes.update(
            {
                name: mindspore.float32
                for name, _ in model.named_parameters()
                if any(m in name for m in keep_in_fp32_modules)
            }
        )

        target_dtype = ms_dtype

        no_split_modules = model._get_no_split_modules(device_map)
        if device_map not in ["auto", "balanced", "balanced_low_0", "sequential"]:
            raise ValueError(
                "If passing a string for `device_map`, please choose 'auto', 'balanced', 'balanced_low_0' or "
                "'sequential'."
            )

        device_map_kwargs = {"no_split_module_classes": no_split_modules}
        if "special_dtypes" in inspect.signature(infer_auto_device_map).parameters:
            device_map_kwargs["special_dtypes"] = special_dtypes
        elif len(special_dtypes) > 0:
            logger.warning(
                "This model has some weights that should be kept in higher precision, you need to upgrade "
                "`accelerate` to properly deal with them (`pip install --upgrade accelerate`)."
            )
        if device_map != "sequential":
            max_memory = get_balanced_memory(
                model,
                dtype=target_dtype,
                low_zero=(device_map == "balanced_low_0"),
                max_memory=max_memory,
                **device_map_kwargs,
            )
        else:
            max_memory = get_max_memory(max_memory)

        device_map_kwargs["max_memory"] = max_memory

        # Make sure tied weights are tied before creating the device map.
        model.tie_weights()
        device_map = infer_auto_device_map(model, dtype=target_dtype, **device_map_kwargs)

    elif device_map is not None:
        model.tie_weights()
        tied_params = find_tied_parameters(model)
        # check if we don't have tied param in different devices
        check_tied_parameters_on_same_device(tied_params, device_map)

    # replace unnessasery modules to nn.Identity and insert send/recv for pipeline inference.
    if device_map is not None and group_size != 1:
        model = modify_model_for_pp_infer(model, device_map, model._no_split_modules)

    if from_pt:
        # restore default dtype
        if dtype_orig is not None:
            set_default_dtype(dtype_orig)

        (
            model,
            missing_keys,
            unexpected_keys,
            mismatched_keys,
            offload_index,
            error_msgs,
        ) = cls._load_pretrained_model(
            model,
            state_dict,
            loaded_state_dict_keys,
            resolved_archive_file,
            pretrained_model_name_or_path,
            ignore_mismatched_sizes=ignore_mismatched_sizes,
            sharded_metadata=sharded_metadata,
            _fast_init=_fast_init,
            low_cpu_mem_usage=low_cpu_mem_usage,
            device_map=device_map,
            offload_folder=offload_folder,
            offload_state_dict=offload_state_dict,
            dtype=ms_dtype,
            keep_in_fp32_modules=keep_in_fp32_modules,
        )

    # make sure token embedding weights are still tied if needed
    model.tie_weights()

    # Set model in evaluation mode to deactivate DropOut modules by default
    model.eval()

    # If it is a model with generation capabilities, attempt to load the generation config
    if model.can_generate() and pretrained_model_name_or_path is not None:
        try:
            model.generation_config = GenerationConfig.from_pretrained(
                pretrained_model_name_or_path,
                cache_dir=cache_dir,
                force_download=force_download,
                resume_download=resume_download,
                proxies=proxies,
                local_files_only=local_files_only,
                token=token,
                revision=revision,
                subfolder=subfolder,
                mirror=mirror,
                _from_auto=from_auto_class,
                _from_pipeline=from_pipeline,
                **kwargs,
            )
        except OSError:
            logger.info(
                "Generation config file not found, using a generation config created from the model config."
            )

    if _adapter_model_path is not None:
        model.load_adapter(
            _adapter_model_path,
            adapter_name=adapter_name,
            token=token,
            adapter_kwargs=adapter_kwargs,
        )

    if output_loading_info:
        if loading_info is None:
            loading_info = {
                "missing_keys": missing_keys,
                "unexpected_keys": unexpected_keys,
                "mismatched_keys": mismatched_keys,
                "error_msgs": error_msgs,
            }
        return model, loading_info

    return model

`mindnlp.transformers.modeling_utils.PreTrainedModel.get_input_embeddings()` ¶

Returns the model's input embeddings.

RETURNS	DESCRIPTION
`Module`	`nn.Module`: A torch module mapping vocabulary to hidden states.

Source code in mindnlp\transformers\modeling_utils.py

def get_input_embeddings(self) -> nn.Module:
    """
    Returns the model's input embeddings.

    Returns:
        `nn.Module`: A torch module mapping vocabulary to hidden states.
    """
    base_model = getattr(self, self.base_model_prefix, self)
    if base_model is not self:
        return base_model.get_input_embeddings()
    else:
        raise NotImplementedError

`mindnlp.transformers.modeling_utils.PreTrainedModel.get_memory_footprint(return_buffers=True)` ¶

Get the memory footprint of a model. This will return the memory footprint of the current model in bytes. Useful to benchmark the memory footprint of the current model and design some tests. Solution inspired from the PyTorch discussions: https://discuss.pytorch.org/t/gpu-memory-that-model-uses/56822/2

PARAMETER	DESCRIPTION
`return_buffers`	Whether to return the size of the buffer tensors in the computation of the memory footprint. Buffers are tensors that do not require gradients and not registered as parameters. E.g. mean and std in batch norm layers. Please see: https://discuss.pytorch.org/t/what-pytorch-means-by-buffers/120266/2 TYPE: `bool`, optional, defaults to `True` DEFAULT: `True`

Source code in mindnlp\transformers\modeling_utils.py

def get_memory_footprint(self, return_buffers=True):
    r"""
    Get the memory footprint of a model. This will return the memory footprint of the current model in bytes.
    Useful to benchmark the memory footprint of the current model and design some tests. Solution inspired from the
    PyTorch discussions: https://discuss.pytorch.org/t/gpu-memory-that-model-uses/56822/2

    Arguments:
        return_buffers (`bool`, *optional*, defaults to `True`):
            Whether to return the size of the buffer tensors in the computation of the memory footprint. Buffers
            are tensors that do not require gradients and not registered as parameters. E.g. mean and std in batch
            norm layers. Please see: https://discuss.pytorch.org/t/what-pytorch-means-by-buffers/120266/2
    """
    mem = sum(param.nelement() * param.itemsize for param in self.parameters())
    if return_buffers:
        mem_bufs = sum(buf.nelement() * buf.itemsize for buf in self.buffers())
        mem = mem + mem_bufs
    return mem

`mindnlp.transformers.modeling_utils.PreTrainedModel.get_output_embeddings()` ¶

Returns the model's output embeddings.

RETURNS	DESCRIPTION
`Module`	`nn.Module`: A torch module mapping hidden states to vocabulary.

Source code in mindnlp\transformers\modeling_utils.py

def get_output_embeddings(self) -> nn.Module:
    """
    Returns the model's output embeddings.

    Returns:
        `nn.Module`: A torch module mapping hidden states to vocabulary.
    """
    return None  # Overwrite for models with output embeddings

`mindnlp.transformers.modeling_utils.PreTrainedModel.gradient_checkpointing_disable()` ¶

Deactivates gradient checkpointing for the current model.

Note that in other frameworks this feature can be referred to as "activation checkpointing" or "checkpoint activations".

Source code in mindnlp\transformers\modeling_utils.py

def gradient_checkpointing_disable(self):
    """
    Deactivates gradient checkpointing for the current model.

    Note that in other frameworks this feature can be referred to as "activation checkpointing" or "checkpoint
    activations".
    """
    if self.supports_gradient_checkpointing:
        # For old GC format (transformers < 4.35.0) for models that live on the Hub
        # we will fall back to the overwritten `_set_gradient_checkpointing` methid
        _is_using_old_format = "value" in inspect.signature(self._set_gradient_checkpointing).parameters
        if not _is_using_old_format:
            self._set_gradient_checkpointing(enable=False)
        else:
            logger.warning(
                "You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it)."
                "Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model."
            )
            self.apply(partial(self._set_gradient_checkpointing, value=False))

    if getattr(self, "_hf_peft_config_loaded", False):
        self.disable_input_require_grads()

`mindnlp.transformers.modeling_utils.PreTrainedModel.gradient_checkpointing_enable(gradient_checkpointing_kwargs=None)` ¶

Activates gradient checkpointing for the current model.

Note that in other frameworks this feature can be referred to as "activation checkpointing" or "checkpoint activations".

We pass the __call__ method of the modules instead of forward because __call__ attaches all the hooks of the module. https://discuss.pytorch.org/t/any-different-between-model-input-and-model-forward-input/3690/2

PARAMETER	DESCRIPTION
`gradient_checkpointing_kwargs`	Additional keyword arguments passed along to the `torch.utils.checkpoint.checkpoint` function. TYPE: `dict, optional` DEFAULT: `None`

Source code in mindnlp\transformers\modeling_utils.py

def gradient_checkpointing_enable(self, gradient_checkpointing_kwargs=None):
    """
    Activates gradient checkpointing for the current model.

    Note that in other frameworks this feature can be referred to as "activation checkpointing" or "checkpoint
    activations".

    We pass the `__call__` method of the modules instead of `forward` because `__call__` attaches all the hooks of
    the module. https://discuss.pytorch.org/t/any-different-between-model-input-and-model-forward-input/3690/2

    Args:
        gradient_checkpointing_kwargs (dict, *optional*):
            Additional keyword arguments passed along to the `torch.utils.checkpoint.checkpoint` function.
    """
    if not self.supports_gradient_checkpointing:
        raise ValueError(f"{self.__class__.__name__} does not support gradient checkpointing.")

    if gradient_checkpointing_kwargs is None:
        gradient_checkpointing_kwargs = {"use_reentrant": True}

    if GENERATOR_SEED:
        gradient_checkpointing_func = mindspore.recompute
    else:
        def function(fn, *args, **kwargs):
            return fn(*args, **kwargs)

        gradient_checkpointing_func = function

    # For old GC format (transformers < 4.35.0) for models that live on the Hub
    # we will fall back to the overwritten `_set_gradient_checkpointing` method
    _is_using_old_format = "value" in inspect.signature(self._set_gradient_checkpointing).parameters

    if not _is_using_old_format:
        self._set_gradient_checkpointing(enable=True, gradient_checkpointing_func=gradient_checkpointing_func)
    else:
        self.apply(partial(self._set_gradient_checkpointing, value=True))
        logger.warning(
            "You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it)."
            "Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model."
        )

    if getattr(self, "_hf_peft_config_loaded", False):
        # When using PEFT + gradient checkpointing + Trainer we need to make sure the input has requires_grad=True
        # we do it also on PEFT: https://github.com/huggingface/peft/blob/85013987aa82aa1af3da1236b6902556ce3e483e/src/peft/peft_model.py#L334
        # When training with PEFT, only LoRA layers will have requires grad set to True, but the output of frozen layers need to propagate
        # the gradients to make sure the gradient flows.
        self.enable_input_require_grads()

`mindnlp.transformers.modeling_utils.PreTrainedModel.init_weights()` ¶

If needed prunes and maybe initializes weights. If using a custom PreTrainedModel, you need to implement any initialization logic in _init_weights.

Source code in mindnlp\transformers\modeling_utils.py

def init_weights(self):
    """
    If needed prunes and maybe initializes weights. If using a custom `PreTrainedModel`, you need to implement any
    initialization logic in `_init_weights`.
    """
    # Prune heads if needed
    if self.config.pruned_heads:
        self.prune_heads(self.config.pruned_heads)

    if _init_weights:
        # Initialize weights
        self.apply(self._initialize_weights)

        # Tie weights should be skipped when not initializing all weights
        # since from_pretrained(...) calls tie weights anyways
        self.tie_weights()

`mindnlp.transformers.modeling_utils.PreTrainedModel.post_init()` ¶

A method executed at the end of each Transformer model initialization, to execute code that needs the model's modules properly initialized (such as weight initialization).

Source code in mindnlp\transformers\modeling_utils.py

def post_init(self):
    """
    A method executed at the end of each Transformer model initialization, to execute code that needs the model's
    modules properly initialized (such as weight initialization).
    """
    self.init_weights()
    self._backward_compatibility_gradient_checkpointing()

`mindnlp.transformers.modeling_utils.PreTrainedModel.prune_heads(heads_to_prune)` ¶

Prunes heads of the base model.

PARAMETER	DESCRIPTION
`heads_to_prune`	Dictionary with keys being selected layer indices (`int`) and associated values being the list of heads to prune in said layer (list of `int`). For instance {1: [0, 2], 2: [2, 3]} will prune heads 0 and 2 on layer 1 and heads 2 and 3 on layer 2. TYPE: `Dict[int, List[int]]`

Source code in mindnlp\transformers\modeling_utils.py

def prune_heads(self, heads_to_prune: Dict[int, List[int]]):
    """
    Prunes heads of the base model.

    Arguments:
        heads_to_prune (`Dict[int, List[int]]`):
            Dictionary with keys being selected layer indices (`int`) and associated values being the list of heads
            to prune in said layer (list of `int`). For instance {1: [0, 2], 2: [2, 3]} will prune heads 0 and 2 on
            layer 1 and heads 2 and 3 on layer 2.
    """
    # save new sets of pruned heads as union of previously stored pruned heads and newly pruned heads
    for layer, heads in heads_to_prune.items():
        union_heads = set(self.config.pruned_heads.get(layer, [])) | set(heads)
        self.config.pruned_heads[layer] = list(union_heads)  # Unfortunately we have to store it as list for JSON

    self.base_model._prune_heads(heads_to_prune)

`mindnlp.transformers.modeling_utils.PreTrainedModel.register_for_auto_class(auto_class='AutoModel')` `classmethod` ¶

Register this class with a given auto class. This should only be used for custom models as the ones in the library are already mapped with an auto class.

This API is experimental and may have some slight breaking changes in the next releases.

PARAMETER	DESCRIPTION
`auto_class`	The auto class to register this new model with. TYPE: `str` or `type`, optional, defaults to `"AutoModel"` DEFAULT: `'AutoModel'`

Source code in mindnlp\transformers\modeling_utils.py

@classmethod
def register_for_auto_class(cls, auto_class="AutoModel"):
    """
    Register this class with a given auto class. This should only be used for custom models as the ones in the
    library are already mapped with an auto class.

    <Tip warning={true}>

    This API is experimental and may have some slight breaking changes in the next releases.

    </Tip>

    Args:
        auto_class (`str` or `type`, *optional*, defaults to `"AutoModel"`):
            The auto class to register this new model with.
    """
    if not isinstance(auto_class, str):
        auto_class = auto_class.__name__

    import mindnlp.transformers.models.auto as auto_module

    if not hasattr(auto_module, auto_class):
        raise ValueError(f"{auto_class} is not a valid auto class.")

    cls._auto_class = auto_class

`mindnlp.transformers.modeling_utils.PreTrainedModel.resize_token_embeddings(new_num_tokens=None, pad_to_multiple_of=None)` ¶

Resizes input token embeddings matrix of the model if new_num_tokens != config.vocab_size.

Takes care of tying weights embeddings afterwards if the model class has a tie_weights() method.

PARAMETER DESCRIPTION

new_num_tokens

The new number of tokens in the embedding matrix. Increasing the size will add newly initialized vectors at the end. Reducing the size will remove vectors from the end. If not provided or None, just returns a pointer to the input tokens nn.Embedding module of the model without doing anything.

TYPE: `int`, *optional* DEFAULT: None

pad_to_multiple_of

If set will pad the embedding matrix to a multiple of the provided value.If new_num_tokens is set to None will just pad the embedding to a multiple of pad_to_multiple_of.

This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability >= 7.5 (Volta), or on TPUs which benefit from having sequence lengths be a multiple of 128. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc

TYPE: `int`, *optional* DEFAULT: None

Return

nn.Embedding: Pointer to the input tokens Embeddings Module of the model.

Source code in mindnlp\transformers\modeling_utils.py

def resize_token_embeddings(
    self, new_num_tokens: Optional[int] = None, pad_to_multiple_of: Optional[int] = None
) -> nn.Embedding:
    """
    Resizes input token embeddings matrix of the model if `new_num_tokens != config.vocab_size`.

    Takes care of tying weights embeddings afterwards if the model class has a `tie_weights()` method.

    Arguments:
        new_num_tokens (`int`, *optional*):
            The new number of tokens in the embedding matrix. Increasing the size will add newly initialized
            vectors at the end. Reducing the size will remove vectors from the end. If not provided or `None`, just
            returns a pointer to the input tokens `nn.Embedding` module of the model without doing anything.
        pad_to_multiple_of (`int`, *optional*):
            If set will pad the embedding matrix to a multiple of the provided value.If `new_num_tokens` is set to
            `None` will just pad the embedding to a multiple of `pad_to_multiple_of`.

            This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability
            `>= 7.5` (Volta), or on TPUs which benefit from having sequence lengths be a multiple of 128. For more
            details about this, or help on choosing the correct value for resizing, refer to this guide:
            https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc

    Return:
        `nn.Embedding`: Pointer to the input tokens Embeddings Module of the model.
    """
    model_embeds = self._resize_token_embeddings(new_num_tokens, pad_to_multiple_of)
    if new_num_tokens is None and pad_to_multiple_of is None:
        return model_embeds

    # Since we are basically resuing the same old embeddings with new weight values, gathering is required
    vocab_size = model_embeds.weight.shape[0]

    # Update base model and current model config
    if hasattr(self.config, "text_config"):
        self.config.text_config.vocab_size = vocab_size
    else:
        self.config.vocab_size = vocab_size
    self.vocab_size = vocab_size

    # Tie weights again if needed
    self.tie_weights()

    return model_embeds

`mindnlp.transformers.modeling_utils.PreTrainedModel.save_pretrained(save_directory, is_main_process=True, state_dict=None, save_function=save_checkpoint, push_to_hub=False, max_shard_size='5GB', safe_serialization=True, variant=None, token=None, save_peft_format=True, **kwargs)` ¶

Save a model and its configuration file to a directory, so that it can be re-loaded using the [~PreTrainedModel.from_pretrained] class method.

PARAMETER	DESCRIPTION
`save_directory`	Directory to which to save. Will be created if it doesn't exist. TYPE: `str` or `os.PathLike`
`is_main_process`	Whether the process calling this is the main process or not. Useful when in distributed training like TPUs and need to call this function on all processes. In this case, set `is_main_process=True` only on the main process to avoid race conditions. TYPE: `bool`, optional, defaults to `True` DEFAULT: `True`
`state_dict`	The state dictionary of the model to save. Will default to `self.state_dict()`, but can be used to only save parts of the model or if special precautions need to be taken when recovering the state dictionary of a model (like when using model parallelism). TYPE: nested dictionary of `mindspore.Tensor` DEFAULT: `None`
`save_function`	The function to use to save the state dictionary. Useful on distributed training like TPUs when one need to replace `torch.save` by another method. TYPE: `Callable` DEFAULT: `save_checkpoint`
`push_to_hub`	Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the repository you want to push to with `repo_id` (will default to the name of `save_directory` in your namespace). TYPE: `bool`, optional, defaults to `False` DEFAULT: `False`
`max_shard_size`	The maximum size for a checkpoint before being sharded. Checkpoints shard will then be each of size lower than this size. If expressed as a string, needs to be digits followed by a unit (like `"5MB"`). We default it to 5GB in order for models to be able to run easily on free-tier google colab instances without CPU OOM issues. If a single weight of the model is bigger than `max_shard_size`, it will be in its own checkpoint shard which will be bigger than `max_shard_size`. TYPE: `int` or `str`, optional, defaults to `"5GB"` DEFAULT: `'5GB'`
`safe_serialization`	Whether to save the model using `safetensors` or the traditional PyTorch way (that uses `pickle`). TYPE: `bool`, optional, defaults to `True` DEFAULT: `True`
`variant`	If specified, weights are saved in the format pytorch_model..bin. TYPE: `str`, optional DEFAULT: `None`
`token`	The token to use as HTTP bearer authorization for remote files. If `True`, or not specified, will use the token generated when running `huggingface-cli login` (stored in `~/.huggingface`). TYPE: `str` or `bool`, optional DEFAULT: `None`
`save_peft_format`	For backward compatibility with PEFT library, in case adapter weights are attached to the model, all keys of the state dict of adapters needs to be pre-pended with `base_model.model`. Advanced users can disable this behaviours by setting `save_peft_format` to `False`. TYPE: `bool`, optional, defaults to `True` DEFAULT: `True`
`kwargs`	Additional key word arguments passed along to the [`~utils.PushToHubMixin.push_to_hub`] method. TYPE: `Dict[str, Any]`, optional DEFAULT: `{}`

Source code in mindnlp\transformers\modeling_utils.py

def save_pretrained(
    self,
    save_directory: Union[str, os.PathLike],
    is_main_process: bool = True,
    state_dict: Optional[dict] = None,
    save_function: Callable = save_checkpoint,
    push_to_hub: bool = False,
    max_shard_size: Union[int, str] = "5GB",
    safe_serialization: bool = True,
    variant: Optional[str] = None,
    token: Optional[Union[str, bool]] = None,
    save_peft_format: bool = True,
    **kwargs,
):
    """
    Save a model and its configuration file to a directory, so that it can be re-loaded using the
    [`~PreTrainedModel.from_pretrained`] class method.

    Arguments:
        save_directory (`str` or `os.PathLike`):
            Directory to which to save. Will be created if it doesn't exist.
        is_main_process (`bool`, *optional*, defaults to `True`):
            Whether the process calling this is the main process or not. Useful when in distributed training like
            TPUs and need to call this function on all processes. In this case, set `is_main_process=True` only on
            the main process to avoid race conditions.
        state_dict (nested dictionary of `mindspore.Tensor`):
            The state dictionary of the model to save. Will default to `self.state_dict()`, but can be used to only
            save parts of the model or if special precautions need to be taken when recovering the state dictionary
            of a model (like when using model parallelism).
        save_function (`Callable`):
            The function to use to save the state dictionary. Useful on distributed training like TPUs when one
            need to replace `torch.save` by another method.
        push_to_hub (`bool`, *optional*, defaults to `False`):
            Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the
            repository you want to push to with `repo_id` (will default to the name of `save_directory` in your
            namespace).
        max_shard_size (`int` or `str`, *optional*, defaults to `"5GB"`):
            The maximum size for a checkpoint before being sharded. Checkpoints shard will then be each of size
            lower than this size. If expressed as a string, needs to be digits followed by a unit (like `"5MB"`).
            We default it to 5GB in order for models to be able to run easily on free-tier google colab instances
            without CPU OOM issues.

            <Tip warning={true}>

            If a single weight of the model is bigger than `max_shard_size`, it will be in its own checkpoint shard
            which will be bigger than `max_shard_size`.

            </Tip>

        safe_serialization (`bool`, *optional*, defaults to `True`):
            Whether to save the model using `safetensors` or the traditional PyTorch way (that uses `pickle`).
        variant (`str`, *optional*):
            If specified, weights are saved in the format pytorch_model.<variant>.bin.
        token (`str` or `bool`, *optional*):
            The token to use as HTTP bearer authorization for remote files. If `True`, or not specified, will use
            the token generated when running `huggingface-cli login` (stored in `~/.huggingface`).
        save_peft_format (`bool`, *optional*, defaults to `True`):
            For backward compatibility with PEFT library, in case adapter weights are attached to the model, all
            keys of the state dict of adapters needs to be pre-pended with `base_model.model`. Advanced users can
            disable this behaviours by setting `save_peft_format` to `False`.
        kwargs (`Dict[str, Any]`, *optional*):
            Additional key word arguments passed along to the [`~utils.PushToHubMixin.push_to_hub`] method.
    """
    use_auth_token = kwargs.pop("use_auth_token", None)
    ignore_metadata_errors = kwargs.pop("ignore_metadata_errors", False)

    if use_auth_token is not None:
        warnings.warn(
            "The `use_auth_token` argument is deprecated. Please use `token` instead.",
            FutureWarning,
        )
        if token is not None:
            raise ValueError(
                "`token` and `use_auth_token` are both specified. Please set only the argument `token`."
            )
        token = use_auth_token

    if token is not None:
        kwargs["token"] = token

    _hf_peft_config_loaded = getattr(self, "_hf_peft_config_loaded", False)

    if "save_config" in kwargs:
        warnings.warn(
            "`save_config` is deprecated. Use `is_main_process` instead."
        )
        is_main_process = kwargs.pop("save_config")
    if safe_serialization and not is_safetensors_available():
        raise ImportError("`safe_serialization` requires the `safetensors library: `pip install safetensors`.")

    if os.path.isfile(save_directory):
        logger.error(f"Provided path ({save_directory}) should be a directory, not a file")
        return

    os.makedirs(save_directory, exist_ok=True)

    if push_to_hub:
        commit_message = kwargs.pop("commit_message", None)
        repo_id = kwargs.pop("repo_id", save_directory.split(os.path.sep)[-1])
        repo_id = self._create_repo(repo_id, **kwargs)
        files_timestamps = self._get_files_timestamps(save_directory)

    # Only save the model itself if we are using distributed training
    model_to_save = unwrap_model(self)

    # save the string version of dtype to the config, e.g. convert mindspore.float32 => "float32"
    # we currently don't use this setting automatically, but may start to use with v5
    dtype = get_parameter_dtype(model_to_save)
    model_to_save.config.ms_dtype = str(dtype).lower()

    # Attach architecture to the config
    model_to_save.config.architectures = [model_to_save.__class__.__name__]

    # Save the config
    if is_main_process:
        if not _hf_peft_config_loaded:
            model_to_save.config.save_pretrained(save_directory)
        if self.can_generate():
            # generation config built from the model config + the model config holds generation kwargs -> generate
            # may revert to legacy behavior if the two don't match
            if (
                model_to_save.generation_config._from_model_config
                and model_to_save.config._has_non_default_generation_parameters()
            ):
                new_generation_config = GenerationConfig.from_model_config(model_to_save.config)
                if new_generation_config != model_to_save.generation_config:
                    logger.warning(
                        "Your generation config was originally created from the model config, but the model "
                        "config has changed since then. Unless you pass the `generation_config` argument to this "
                        "model's `generate` calls, they will revert to the legacy behavior where the base "
                        "`generate` parameterization is loaded from the model config instead. "
                        "To avoid this behavior and this warning, we recommend you to overwrite the generation "
                        "config model attribute before calling the model's `save_pretrained`, preferably also "
                        "removing any generation kwargs from the model config. This warning will be raised to an "
                        "exception in v4.41."
                    )
            model_to_save.generation_config.save_pretrained(save_directory)

        if _hf_peft_config_loaded:
            logger.info(
                "Detected adapters on the model, saving the model in the PEFT format, only adapter weights will be saved."
            )
            state_dict = model_to_save.get_adapter_state_dict()

            if save_peft_format:
                logger.info(
                    "To match the expected format of the PEFT library, all keys of the state dict of adapters will be pre-pended with `base_model.model`."
                )
                peft_state_dict = {}
                for key, value in state_dict.items():
                    peft_state_dict[f"base_model.model.{key}"] = value
                state_dict = peft_state_dict

            active_adapter = self.active_adapters()

            if len(active_adapter) > 1:
                raise ValueError(
                    "Multiple active adapters detected, saving multiple active adapters is not supported yet. You can save adapters separately one by one "
                    "by iteratively calling `model.set_adapter(adapter_name)` then `model.save_pretrained(...)`"
                )
            active_adapter = active_adapter[0]

            current_peft_config = self.peft_config[active_adapter]
            current_peft_config.save_pretrained(save_directory)

    # for offloaded modules
    module_map = {}

    # Save the model
    if state_dict is None:
        # if any model parameters are offloaded, make module map
        if (
            hasattr(self, "hf_device_map")
            and len(set(self.hf_device_map.values())) > 1
            and ("cpu" in self.hf_device_map.values() or "disk" in self.hf_device_map.values())
        ):
            warnings.warn(
                "Attempting to save a model with offloaded modules. Ensure that unallocated cpu memory exceeds the `shard_size` (5GB default)"
            )
            for name, module in model_to_save.named_modules():
                if name == "":
                    continue
                module_state_dict = module.state_dict()

                for key in module_state_dict:
                    module_map[name + f".{key}"] = module
        state_dict = model_to_save.state_dict()

    # Handle the case where some state_dict keys shouldn't be saved
    if self._keys_to_ignore_on_save is not None:
        for ignore_key in self._keys_to_ignore_on_save:
            if ignore_key in state_dict.keys():
                del state_dict[ignore_key]

    # Shard the model if it is too big.
    if not _hf_peft_config_loaded:
        weights_name = SAFE_WEIGHTS_NAME if safe_serialization else WEIGHTS_NAME
        weights_name = _add_variant(weights_name, variant)
    else:
        weights_name = ADAPTER_SAFE_WEIGHTS_NAME if safe_serialization else ADAPTER_WEIGHTS_NAME

    filename_pattern = weights_name.replace(".bin", "{suffix}.bin").replace(".safetensors", "{suffix}.safetensors")
    state_dict_split = split_state_dict_into_shards(
        state_dict, filename_pattern=filename_pattern, max_shard_size=max_shard_size
    )
    # Save index if sharded
    index = None
    if state_dict_split.is_sharded:
        index = {
            "metadata": state_dict_split.metadata,
            "weight_map": state_dict_split.tensor_to_filename,
        }

    # Clean the folder from a previous save
    for filename in os.listdir(save_directory):
        full_filename = os.path.join(save_directory, filename)
        # If we have a shard file that is not going to be replaced, we delete it, but only from the main process
        # in distributed settings to avoid race conditions.
        weights_no_suffix = weights_name.replace(".bin", "").replace(".safetensors", "")

        # make sure that file to be deleted matches format of sharded file, e.g. pytorch_model-00001-of-00005
        filename_no_suffix = filename.replace(".bin", "").replace(".safetensors", "")
        reg = re.compile(r"(.*?)-\d{5}-of-\d{5}")

        if (
            filename.startswith(weights_no_suffix)
            and os.path.isfile(full_filename)
            and filename not in state_dict_split.filename_to_tensors.keys()
            and is_main_process
            and reg.fullmatch(filename_no_suffix) is not None
        ):
            os.remove(full_filename)
    # Save the model
    filename_to_tensors = state_dict_split.filename_to_tensors.items()
    if module_map:
        filename_to_tensors = logging.tqdm(filename_to_tensors, desc="Saving checkpoint shards")
    for shard_file, tensors in filename_to_tensors:
        shard = {tensor: state_dict[tensor] for tensor in tensors}
        # remake shard with onloaded parameters if necessary
        if module_map:
            # init state_dict for this shard
            shard_state_dict = {name: "" for name in shard}
            for module_name in shard:
                module = module_map[module_name]
                # update state dict with onloaded parameters
                # shard_state_dict = get_state_dict_from_offload(module, module_name, shard_state_dict)

            # assign shard to be the completed state dict
            shard = shard_state_dict
            del shard_state_dict
            gc.collect()

        if safe_serialization:
            # At some point we will need to deal better with save_function (used for TPU and other distributed
            # joyfulness), but for now this enough.
            safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "np"})
        else:
            save_function(shard, os.path.join(save_directory, shard_file))

    if index is None:
        path_to_weights = os.path.join(save_directory, weights_name)
        logger.info(f"Model weights saved in {path_to_weights}")
    else:
        save_index_file = SAFE_WEIGHTS_INDEX_NAME if safe_serialization else WEIGHTS_INDEX_NAME
        save_index_file = os.path.join(save_directory, _add_variant(save_index_file, variant))
        # Save the index as well
        with open(save_index_file, "w", encoding="utf-8") as f:
            content = json.dumps(index, indent=2, sort_keys=True) + "\n"
            f.write(content)
        logger.info(
            f"The model is bigger than the maximum size per checkpoint ({max_shard_size}) and is going to be "
            f"split in {len(state_dict_split.filename_to_tensors)} checkpoint shards. You can find where each parameters has been saved in the "
            f"index located at {save_index_file}."
        )

`mindnlp.transformers.modeling_utils.PreTrainedModel.set_input_embeddings(value)` ¶

Set model's input embeddings.

PARAMETER	DESCRIPTION
`value`	A module mapping vocabulary to hidden states. TYPE: `nn.Module`

Source code in mindnlp\transformers\modeling_utils.py

def set_input_embeddings(self, value: nn.Module):
    """
    Set model's input embeddings.

    Args:
        value (`nn.Module`): A module mapping vocabulary to hidden states.
    """
    base_model = getattr(self, self.base_model_prefix, self)
    if base_model is not self:
        base_model.set_input_embeddings(value)
    else:
        raise NotImplementedError

`mindnlp.transformers.modeling_utils.PreTrainedModel.tie_weights()` ¶

Tie the weights between the input embeddings and the output embeddings.

Source code in mindnlp\transformers\modeling_utils.py

def tie_weights(self):
    """
    Tie the weights between the input embeddings and the output embeddings.

    """
    if getattr(self.config, "tie_word_embeddings", True):
        output_embeddings = self.get_output_embeddings()
        if output_embeddings is not None:
            self._tie_or_clone_weights(output_embeddings, self.get_input_embeddings())

    if getattr(self.config, "is_encoder_decoder", False) and getattr(self.config, "tie_encoder_decoder", False):
        if hasattr(self, self.base_model_prefix):
            self = getattr(self, self.base_model_prefix) # pylint: disable=self-cls-assignment
        tied_weights = self._tie_encoder_decoder_weights(
            self.encoder, self.decoder, self.base_model_prefix, "encoder"
        )
        # Setting a dynamic variable instead of `_tied_weights_keys` because it's a class
        # attributed not an instance member, therefore modifying it will modify the entire class
        # Leading to issues on subsequent calls by different tests or subsequent calls.
        self._dynamic_tied_weights_keys = tied_weights

    for module in self.modules():
        if hasattr(module, "_tie_weights"):
            module._tie_weights()

`mindnlp.transformers.modeling_utils.PreTrainedModel.warn_if_padding_and_no_attention_mask(input_ids, attention_mask)` ¶

Shows a one-time warning if the input_ids appear to contain padding and no attention mask was given.

Source code in mindnlp\transformers\modeling_utils.py

def warn_if_padding_and_no_attention_mask(self, input_ids, attention_mask):
    """
    Shows a one-time warning if the input_ids appear to contain padding and no attention mask was given.
    """

    if (attention_mask is not None) or (self.config.pad_token_id is None) or ON_ORANGE_PI:
        return

    # Check only the first and last input IDs to reduce overhead.
    if self.config.pad_token_id in input_ids.asnumpy()[:, [-1, 0]]:
        warn_string = (
            "We strongly recommend passing in an `attention_mask` since your input_ids may be padded. See "
            "https://huggingface.co/docs/transformers/troubleshooting"
            "#incorrect-output-when-padding-tokens-arent-masked."
        )

        # If the pad token is equal to either BOS, EOS, or SEP, we do not know whether the user should use an
        # attention_mask or not. In this case, we should still show a warning because this is a rare case.
        if (
            (self.config.bos_token_id is not None and self.config.bos_token_id == self.config.pad_token_id)
            or (self.config.eos_token_id is not None and self.config.eos_token_id == self.config.pad_token_id)
            or (self.config.sep_token_id is not None and self.config.sep_token_id == self.config.pad_token_id)
        ):
            warn_string += (
                f"\nYou may ignore this warning if your `pad_token_id` ({self.config.pad_token_id}) is identical "
                f"to the `bos_token_id` ({self.config.bos_token_id}), `eos_token_id` ({self.config.eos_token_id}), "
                f"or the `sep_token_id` ({self.config.sep_token_id}), and your input is not padded."
            )
        logger.warning_once(warn_string)

modeling_utils

mindnlp.transformers.modeling_utils.PreTrainedModel ¶

mindnlp.transformers.modeling_utils.PreTrainedModel.base_model: nn.Module property ¶

mindnlp.transformers.modeling_utils.PreTrainedModel.dummy_inputs: Dict[str, mindspore.Tensor] property ¶

mindnlp.transformers.modeling_utils.PreTrainedModel.framework: str property ¶

mindnlp.transformers.modeling_utils.PreTrainedModel.is_gradient_checkpointing: bool property ¶

mindnlp.transformers.modeling_utils.PreTrainedModel.add_model_tags(tags) ¶

mindnlp.transformers.modeling_utils.PreTrainedModel.can_generate() classmethod ¶

mindnlp.transformers.modeling_utils.PreTrainedModel.dequantize() ¶

mindnlp.transformers.modeling_utils.PreTrainedModel.disable_input_require_grads() ¶

mindnlp.transformers.modeling_utils.PreTrainedModel.enable_input_require_grads() ¶

mindnlp.transformers.modeling_utils.PreTrainedModel.from_pretrained(pretrained_model_name_or_path, *model_args, config=None, cache_dir=None, ignore_mismatched_sizes=False, force_download=False, local_files_only=False, token=None, revision='main', use_safetensors=None, **kwargs) classmethod ¶

mindnlp.transformers.modeling_utils.PreTrainedModel.get_input_embeddings() ¶

mindnlp.transformers.modeling_utils.PreTrainedModel.get_memory_footprint(return_buffers=True) ¶

mindnlp.transformers.modeling_utils.PreTrainedModel.get_output_embeddings() ¶

mindnlp.transformers.modeling_utils.PreTrainedModel.gradient_checkpointing_disable() ¶

mindnlp.transformers.modeling_utils.PreTrainedModel.gradient_checkpointing_enable(gradient_checkpointing_kwargs=None) ¶

mindnlp.transformers.modeling_utils.PreTrainedModel.init_weights() ¶

mindnlp.transformers.modeling_utils.PreTrainedModel.post_init() ¶

mindnlp.transformers.modeling_utils.PreTrainedModel.prune_heads(heads_to_prune) ¶

mindnlp.transformers.modeling_utils.PreTrainedModel.register_for_auto_class(auto_class='AutoModel') classmethod ¶

mindnlp.transformers.modeling_utils.PreTrainedModel.resize_token_embeddings(new_num_tokens=None, pad_to_multiple_of=None) ¶

mindnlp.transformers.modeling_utils.PreTrainedModel.save_pretrained(save_directory, is_main_process=True, state_dict=None, save_function=save_checkpoint, push_to_hub=False, max_shard_size='5GB', safe_serialization=True, variant=None, token=None, save_peft_format=True, **kwargs) ¶

mindnlp.transformers.modeling_utils.PreTrainedModel.set_input_embeddings(value) ¶

mindnlp.transformers.modeling_utils.PreTrainedModel.tie_weights() ¶

mindnlp.transformers.modeling_utils.PreTrainedModel.warn_if_padding_and_no_attention_mask(input_ids, attention_mask) ¶

`mindnlp.transformers.modeling_utils.PreTrainedModel` ¶

`mindnlp.transformers.modeling_utils.PreTrainedModel.base_model: nn.Module` `property` ¶

`mindnlp.transformers.modeling_utils.PreTrainedModel.dummy_inputs: Dict[str, mindspore.Tensor]` `property` ¶

`mindnlp.transformers.modeling_utils.PreTrainedModel.framework: str` `property` ¶

`mindnlp.transformers.modeling_utils.PreTrainedModel.is_gradient_checkpointing: bool` `property` ¶

`mindnlp.transformers.modeling_utils.PreTrainedModel.add_model_tags(tags)` ¶

`mindnlp.transformers.modeling_utils.PreTrainedModel.can_generate()` `classmethod` ¶

`mindnlp.transformers.modeling_utils.PreTrainedModel.dequantize()` ¶

`mindnlp.transformers.modeling_utils.PreTrainedModel.disable_input_require_grads()` ¶

`mindnlp.transformers.modeling_utils.PreTrainedModel.enable_input_require_grads()` ¶

`mindnlp.transformers.modeling_utils.PreTrainedModel.from_pretrained(pretrained_model_name_or_path, *model_args, config=None, cache_dir=None, ignore_mismatched_sizes=False, force_download=False, local_files_only=False, token=None, revision='main', use_safetensors=None, **kwargs)` `classmethod` ¶

`mindnlp.transformers.modeling_utils.PreTrainedModel.get_input_embeddings()` ¶

`mindnlp.transformers.modeling_utils.PreTrainedModel.get_memory_footprint(return_buffers=True)` ¶

`mindnlp.transformers.modeling_utils.PreTrainedModel.get_output_embeddings()` ¶

`mindnlp.transformers.modeling_utils.PreTrainedModel.gradient_checkpointing_disable()` ¶

`mindnlp.transformers.modeling_utils.PreTrainedModel.gradient_checkpointing_enable(gradient_checkpointing_kwargs=None)` ¶

`mindnlp.transformers.modeling_utils.PreTrainedModel.init_weights()` ¶

`mindnlp.transformers.modeling_utils.PreTrainedModel.post_init()` ¶

`mindnlp.transformers.modeling_utils.PreTrainedModel.prune_heads(heads_to_prune)` ¶

`mindnlp.transformers.modeling_utils.PreTrainedModel.register_for_auto_class(auto_class='AutoModel')` `classmethod` ¶

`mindnlp.transformers.modeling_utils.PreTrainedModel.resize_token_embeddings(new_num_tokens=None, pad_to_multiple_of=None)` ¶

`mindnlp.transformers.modeling_utils.PreTrainedModel.save_pretrained(save_directory, is_main_process=True, state_dict=None, save_function=save_checkpoint, push_to_hub=False, max_shard_size='5GB', safe_serialization=True, variant=None, token=None, save_peft_format=True, **kwargs)` ¶

`mindnlp.transformers.modeling_utils.PreTrainedModel.set_input_embeddings(value)` ¶

`mindnlp.transformers.modeling_utils.PreTrainedModel.tie_weights()` ¶

`mindnlp.transformers.modeling_utils.PreTrainedModel.warn_if_padding_and_no_attention_mask(input_ids, attention_mask)` ¶