swiftformer

`mindnlp.transformers.models.swiftformer.configuration_swiftformer` ¶

SwiftFormer model configuration

`mindnlp.transformers.models.swiftformer.configuration_swiftformer.SwiftFormerConfig` ¶

Bases: PretrainedConfig

This is the configuration class to store the configuration of a `SwiftFormerModel`. It is used to instantiate a SwiftFormer model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the SwiftFormer MBZUAI/swiftformer-xs architecture.

Configuration objects inherit from `PretrainedConfig` and can be used to control the model outputs. Read the documentation from `PretrainedConfig` for more information.

PARAMETER	DESCRIPTION
`image_size`	The size (resolution) of each image. TYPE: `int`, optional. DEFAULT: `224`
`num_channels`	The number of input channels. TYPE: `int`, optional. DEFAULT: `3`
`depths`	Depth of each stage. TYPE: `List[int]`, optional. DEFAULT: `[3, 3, 6, 4]`
`embed_dims`	The embedding dimension at each stage. TYPE: `List[int]`, optional. DEFAULT: `[48, 56, 112, 220]`
`mlp_ratio`	Ratio of the hidden dimensionality of an MLP to the dimensionality of its input. TYPE: `int`, optional. DEFAULT: `4`
`downsamples`	Whether or not to downsample inputs between two stages. TYPE: `List[bool]`, optional. DEFAULT: `[True, True, True, True]`
`hidden_act`	The non-linear activation function (string). `"gelu"`, `"relu"`, `"selu"` and `"gelu_new"` are supported. TYPE: `str`, optional. DEFAULT: `"gelu"`
`down_patch_size`	The size of patches in downsampling layers. TYPE: `int`, optional. DEFAULT: `3`
`down_stride`	The stride of convolution kernels in downsampling layers. TYPE: `int`, optional. DEFAULT: `2`
`down_pad`	Padding in downsampling layers. TYPE: `int`, optional. DEFAULT: `1`
`drop_path_rate`	Rate at which to increase dropout probability in DropPath. TYPE: `float`, optional. DEFAULT: `0.0`
`drop_mlp_rate`	Dropout rate for the MLP component of SwiftFormer. TYPE: `float`, optional. DEFAULT: `0.0`
`drop_conv_encoder_rate`	Dropout rate for the ConvEncoder component of SwiftFormer. TYPE: `float`, optional. DEFAULT: `0.0`
`use_layer_scale`	Whether to scale outputs from token mixers. TYPE: `bool`, optional. DEFAULT: `True`
`layer_scale_init_value`	Factor by which outputs from token mixers are scaled. TYPE: `float`, optional. DEFAULT: `1e-05`
`batch_norm_eps`	The epsilon used by the batch normalization layers. TYPE: `float`, optional. DEFAULT: `1e-05`

Example

>>> from mindnlp.transformers import SwiftFormerConfig, SwiftFormerModel

>>> # Initializing a SwiftFormer swiftformer-xs style configuration
>>> configuration = SwiftFormerConfig()

>>> # Initializing a model (with random weights) from the swiftformer-xs style configuration
>>> model = SwiftFormerModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config

Source code in mindnlp\transformers\models\swiftformer\configuration_swiftformer.py

class SwiftFormerConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`SwiftFormerModel`]. It is used to instantiate a
    SwiftFormer model according to the specified arguments, defining the model architecture. Instantiating a
    configuration with the defaults will yield a similar configuration to that of the SwiftFormer
    [MBZUAI/swiftformer-xs](https://huggingface.co/MBZUAI/swiftformer-xs) architecture.

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.

    Args:
        image_size (`int`, *optional*, defaults to 224):
            The size (resolution) of each image
        num_channels (`int`, *optional*, defaults to 3):
            The number of input channels
        depths (`List[int]`, *optional*, defaults to `[3, 3, 6, 4]`):
            Depth of each stage
        embed_dims (`List[int]`, *optional*, defaults to `[48, 56, 112, 220]`):
            The embedding dimension at each stage
        mlp_ratio (`int`, *optional*, defaults to 4):
            Ratio of size of the hidden dimensionality of an MLP to the dimensionality of its input.
        downsamples (`List[bool]`, *optional*, defaults to `[True, True, True, True]`):
            Whether or not to downsample inputs between two stages.
        hidden_act (`str`, *optional*, defaults to `"gelu"`):
            The non-linear activation function (string). `"gelu"`, `"relu"`, `"selu"` and `"gelu_new"` are supported.
        down_patch_size (`int`, *optional*, defaults to 3):
            The size of patches in downsampling layers.
        down_stride (`int`, *optional*, defaults to 2):
            The stride of convolution kernels in downsampling layers.
        down_pad (`int`, *optional*, defaults to 1):
            Padding in downsampling layers.
        drop_path_rate (`float`, *optional*, defaults to 0.0):
            Rate at which to increase dropout probability in DropPath.
        drop_mlp_rate (`float`, *optional*, defaults to 0.0):
            Dropout rate for the MLP component of SwiftFormer.
        drop_conv_encoder_rate (`float`, *optional*, defaults to 0.0):
            Dropout rate for the ConvEncoder component of SwiftFormer.
        use_layer_scale (`bool`, *optional*, defaults to `True`):
            Whether to scale outputs from token mixers.
        layer_scale_init_value (`float`, *optional*, defaults to 1e-05):
            Factor by which outputs from token mixers are scaled.
        batch_norm_eps (`float`, *optional*, defaults to 1e-05):
            The epsilon used by the batch normalization layers.

    Example:
        ```python
        >>> from mindnlp.transformers import SwiftFormerConfig, SwiftFormerModel

        >>> # Initializing a SwiftFormer swiftformer-xs style configuration
        >>> configuration = SwiftFormerConfig()

        >>> # Initializing a model (with random weights) from the swiftformer-xs style configuration
        >>> model = SwiftFormerModel(configuration)

        >>> # Accessing the model configuration
        >>> configuration = model.config
        ```
    """

    model_type = "swiftformer"

    def __init__(
        self,
        image_size=224,
        num_channels=3,
        depths=[3, 3, 6, 4],
        embed_dims=[48, 56, 112, 220],
        mlp_ratio=4,
        downsamples=[True, True, True, True],
        hidden_act="gelu",
        down_patch_size=3,
        down_stride=2,
        down_pad=1,
        drop_path_rate=0.0,
        drop_mlp_rate=0.0,
        drop_conv_encoder_rate=0.0,
        use_layer_scale=True,
        layer_scale_init_value=1e-5,
        batch_norm_eps=1e-5,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.image_size = image_size
        self.num_channels = num_channels
        self.depths = depths
        self.embed_dims = embed_dims
        self.mlp_ratio = mlp_ratio
        self.downsamples = downsamples
        self.hidden_act = hidden_act
        self.down_patch_size = down_patch_size
        self.down_stride = down_stride
        self.down_pad = down_pad
        self.drop_path_rate = drop_path_rate
        self.drop_mlp_rate = drop_mlp_rate
        self.drop_conv_encoder_rate = drop_conv_encoder_rate
        self.use_layer_scale = use_layer_scale
        self.layer_scale_init_value = layer_scale_init_value
        self.batch_norm_eps = batch_norm_eps



    torch_onnx_minimum_version = version.parse("1.11")

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
            ]
        )

    @property
    def atol_for_validation(self) -> float:
        return 1e-4

`mindnlp.transformers.models.swiftformer.modeling_swiftformer` ¶

MindSpore SwiftFormer model.

`mindnlp.transformers.models.swiftformer.modeling_swiftformer.SwiftFormerConvEncoder` ¶

Bases: Module

SwiftFormerConvEncoder with 3*3 and 1*1 convolutions.

Input: tensor of shape [batch_size, channels, height, width]

Output: tensor of shape [batch_size, channels, height, width]

Source code in mindnlp\transformers\models\swiftformer\modeling_swiftformer.py

class SwiftFormerConvEncoder(nn.Module):
    """
    `SwiftFormerConvEncoder` with 3*3 and 1*1 convolutions.

    Input: tensor of shape `[batch_size, channels, height, width]`

    Output: tensor of shape `[batch_size, channels, height, width]`
    """

    def __init__(self, config: SwiftFormerConfig, dim: int):
        super().__init__()
        hidden_dim = int(config.mlp_ratio * dim)

        self.depth_wise_conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.norm = nn.BatchNorm2d(dim, eps=config.batch_norm_eps)
        self.point_wise_conv1 = nn.Conv2d(dim, hidden_dim, kernel_size=1)
        self.act = nn.GELU()
        self.point_wise_conv2 = nn.Conv2d(hidden_dim, dim, kernel_size=1)
        self.drop_path = nn.Dropout(p=config.drop_conv_encoder_rate)
        self.layer_scale = nn.Parameter(ops.ones(dim).unsqueeze(-1).unsqueeze(-1), requires_grad=True)

    def forward(self, x):
        input = x
        x = self.depth_wise_conv(x)
        x = self.norm(x)
        x = self.point_wise_conv1(x)
        x = self.act(x)
        x = self.point_wise_conv2(x)
        x = input + self.drop_path(self.layer_scale * x)
        return x
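
The split into one 3*3 depth-wise convolution and two 1*1 point-wise convolutions keeps the block small. The following is a minimal sketch, in plain Python, of how the parameter counts work out for the three convolutions in the listing above. The helper names `conv2d_params` and `conv_encoder_conv_params` are hypothetical and not part of mindnlp; the counts assume every convolution has a bias term, and batch normalization and the layer scale are left out.

```python
def conv2d_params(in_ch, out_ch, kernel_size, groups=1, bias=True):
    """Number of learnable parameters in a 2D convolution (hypothetical helper)."""
    weights = (in_ch // groups) * out_ch * kernel_size * kernel_size
    return weights + (out_ch if bias else 0)


def conv_encoder_conv_params(dim, mlp_ratio=4):
    """Parameter counts for the three convolutions of the conv encoder listing."""
    hidden_dim = int(mlp_ratio * dim)  # same expansion as in the listing
    return {
        # groups=dim: each channel is filtered on its own (depth-wise)
        "depth_wise_conv": conv2d_params(dim, dim, 3, groups=dim),
        # 1*1 convolutions mix channels at every pixel (point-wise)
        "point_wise_conv1": conv2d_params(dim, hidden_dim, 1),
        "point_wise_conv2": conv2d_params(hidden_dim, dim, 1),
    }


print(conv_encoder_conv_params(dim=48))
# {'depth_wise_conv': 480, 'point_wise_conv1': 9408, 'point_wise_conv2': 9264}
print(conv2d_params(48, 48, 3))  # a dense 3*3 convolution, for comparison: 20784
```

With `dim=48` (the first default embedding dimension), the depth-wise convolution needs far fewer weights than a dense 3*3 convolution over the same channels.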

`mindnlp.transformers.models.swiftformer.modeling_swiftformer.SwiftFormerDropPath` ¶

Bases: Module

Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).

Source code in mindnlp\transformers\models\swiftformer\modeling_swiftformer.py

class SwiftFormerDropPath(nn.Module):
    """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks)."""

    def __init__(self, config: SwiftFormerConfig) -> None:
        super().__init__()
        self.drop_prob = config.drop_path_rate

    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
        return drop_path(hidden_states, self.drop_prob, self.training)

    def extra_repr(self) -> str:
        return "p={}".format(self.drop_prob)

`mindnlp.transformers.models.swiftformer.modeling_swiftformer.SwiftFormerEfficientAdditiveAttention` ¶

Bases: Module

Efficient Additive Attention module for SwiftFormer.

Input: tensor of shape [batch_size, channels, height, width]

Output: tensor of shape [batch_size, channels, height, width]

Source code in mindnlp\transformers\models\swiftformer\modeling_swiftformer.py

class SwiftFormerEfficientAdditiveAttention(nn.Module):
    """
    Efficient Additive Attention module for SwiftFormer.

    Input: tensor of shape `[batch_size, channels, height, width]`

    Output: tensor of shape `[batch_size, channels, height, width]`
    """

    def __init__(self, config: SwiftFormerConfig, dim: int = 512):
        super().__init__()

        self.to_query = nn.Linear(dim, dim)
        self.to_key = nn.Linear(dim, dim)

        self.w_g = nn.Parameter(ops.randn(dim, 1))
        self.scale_factor = dim**-0.5
        self.proj = nn.Linear(dim, dim)
        self.final = nn.Linear(dim, dim)

    def forward(self, x):
        query = self.to_query(x)
        key = self.to_key(x)

        query = nn.functional.normalize(query, dim=-1)
        key = nn.functional.normalize(key, dim=-1)

        query_weight = query @ self.w_g
        scaled_query_weight = query_weight * self.scale_factor
        scaled_query_weight = ops.softmax(scaled_query_weight, dim=-1)

        global_queries = ops.sum(scaled_query_weight * query, dim=1)
        global_queries = global_queries.unsqueeze(1).tile((1, key.shape[1], 1))

        out = self.proj(global_queries * key) + query
        out = self.final(out)

        return out
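
To make the data flow of `forward` easier to follow, here is a minimal plain-Python sketch for a single sample. It is not the mindnlp implementation: the four linear layers (`to_query`, `to_key`, `proj`, `final`) are omitted, so `queries` and `keys` stand for already-projected token lists, and the function name `additive_attention_core` is hypothetical. Note that, with the shapes in the listing, the score tensor is `[batch_size, num_tokens, 1]` and the softmax is taken over its last axis of length 1; the sketch mirrors that literally.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def softmax(values):
    """Numerically stable softmax over a list."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention_core(queries, keys, w_g):
    """queries, keys: [num_tokens][dim] lists; w_g: [dim] list of learned weights."""
    dim = len(w_g)
    scale = dim ** -0.5
    queries = [l2_normalize(q) for q in queries]
    keys = [l2_normalize(k) for k in keys]
    # One scalar score per token: (query @ w_g) * scale, kept as a length-1 row.
    scores = [[sum(q_i * w_i for q_i, w_i in zip(q, w_g)) * scale] for q in queries]
    # Softmax over the last axis, which has length 1 here, as in the listing.
    weights = [softmax(row) for row in scores]
    # Pool the weighted queries over the token axis into one global query.
    global_query = [
        sum(weights[t][0] * queries[t][d] for t in range(len(queries)))
        for d in range(dim)
    ]
    # Broadcast the global query to every token, multiply with the keys,
    # and add the query back as a residual.
    return [
        [g * k_d + q_d for g, k_d, q_d in zip(global_query, k, q)]
        for k, q in zip(keys, queries)
    ]


out = additive_attention_core(
    queries=[[3.0, 4.0], [0.0, 2.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    w_g=[1.0, 1.0],
)
```

The cost is linear in the number of tokens, because each token interacts only with the single pooled query rather than with every other token.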

`mindnlp.transformers.models.swiftformer.modeling_swiftformer.SwiftFormerEmbeddings` ¶

Bases: Module

Embeddings layer consisting of a single 2D convolutional and batch normalization layer.

Input: tensor of shape [batch_size, channels, height, width]

Output: tensor of shape [batch_size, channels, height/stride, width/stride]

Source code in mindnlp\transformers\models\swiftformer\modeling_swiftformer.py

class SwiftFormerEmbeddings(nn.Module):
    """
    Embeddings layer consisting of a single 2D convolutional and batch normalization layer.

    Input: tensor of shape `[batch_size, channels, height, width]`

    Output: tensor of shape `[batch_size, channels, height/stride, width/stride]`
    """

    def __init__(self, config: SwiftFormerConfig, index: int):
        super().__init__()

        patch_size = config.down_patch_size
        stride = config.down_stride
        padding = config.down_pad
        embed_dims = config.embed_dims

        in_chans = embed_dims[index]
        embed_dim = embed_dims[index + 1]

        patch_size = patch_size if isinstance(patch_size, collections.abc.Iterable) else (patch_size, patch_size)
        stride = stride if isinstance(stride, collections.abc.Iterable) else (stride, stride)
        padding = padding if isinstance(padding, collections.abc.Iterable) else (padding, padding)

        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=stride, padding=padding)
        self.norm = nn.BatchNorm2d(embed_dim, eps=config.batch_norm_eps)

    def forward(self, x):
        x = self.proj(x)
        x = self.norm(x)
        return x
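
The sketch below traces how channels and spatial size change through successive embeddings layers, using the standard convolution output-size formula. It assumes one embeddings layer between each pair of consecutive stages (which the `index` / `index + 1` lookup suggests, but which this page does not show) and a 56*56 input feature map; `conv_out_size` and `embedding_shapes` are hypothetical helper names.

```python
def conv_out_size(size, kernel, stride, padding):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def embedding_shapes(embed_dims, in_size, patch_size=3, stride=2, padding=1):
    """Return (in_channels, out_channels, out_size) for each stage transition."""
    shapes = []
    size = in_size
    for index in range(len(embed_dims) - 1):
        size = conv_out_size(size, patch_size, stride, padding)
        # in_chans = embed_dims[index], embed_dim = embed_dims[index + 1]
        shapes.append((embed_dims[index], embed_dims[index + 1], size))
    return shapes


print(embedding_shapes([48, 56, 112, 220], in_size=56))
# [(48, 56, 28), (56, 112, 14), (112, 220, 7)]
```

With the default `down_patch_size=3`, `down_stride=2` and `down_pad=1`, each transition halves the spatial size while moving to the next embedding dimension.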

`mindnlp.transformers.models.swiftformer.modeling_swiftformer.SwiftFormerEncoderBlock` ¶

Bases: Module

SwiftFormer Encoder Block for SwiftFormer. It consists of (1) Local representation module, (2) SwiftFormerEfficientAdditiveAttention, and (3) MLP block.

Input: tensor of shape [batch_size, channels, height, width]

Output: tensor of shape [batch_size, channels, height, width]

Source code in mindnlp\transformers\models\swiftformer\modeling_swiftformer.py

class SwiftFormerEncoderBlock(nn.Module):
    """
    SwiftFormer Encoder Block for SwiftFormer. It consists of (1) Local representation module, (2)
    SwiftFormerEfficientAdditiveAttention, and (3) MLP block.

    Input: tensor of shape `[batch_size, channels, height, width]`

    Output: tensor of shape `[batch_size, channels, height, width]`
    """

    def __init__(self, config: SwiftFormerConfig, dim: int, drop_path: float = 0.0) -> None:
        super().__init__()

        layer_scale_init_value = config.layer_scale_init_value
        use_layer_scale = config.use_layer_scale

        self.local_representation = SwiftFormerLocalRepresentation(config, dim=dim)
        self.attn = SwiftFormerEfficientAdditiveAttention(config, dim=dim)
        self.linear = SwiftFormerMlp(config, in_features=dim)
        self.drop_path = SwiftFormerDropPath(config) if drop_path > 0.0 else nn.Identity()
        self.use_layer_scale = use_layer_scale
        if use_layer_scale:
            self.layer_scale_1 = nn.Parameter(
                layer_scale_init_value * ops.ones(dim).unsqueeze(-1).unsqueeze(-1), requires_grad=True
            )
            self.layer_scale_2 = nn.Parameter(
                layer_scale_init_value * ops.ones(dim).unsqueeze(-1).unsqueeze(-1), requires_grad=True
            )

    def forward(self, x):
        x = self.local_representation(x)
        batch_size, channels, height, width = x.shape
        res = self.attn(x.permute(0, 2, 3, 1).reshape(batch_size, height * width, channels))
        res = res.reshape(batch_size, height, width, channels).permute(0, 3, 1, 2)
        if self.use_layer_scale:
            x = x + self.drop_path(self.layer_scale_1 * res)
            x = x + self.drop_path(self.layer_scale_2 * self.linear(x))
        else:
            x = x + self.drop_path(res)
            x = x + self.drop_path(self.linear(x))
        return x
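
The attention module works on a token sequence, so `forward` converts the feature map from channels-first layout to `[height * width, channels]` tokens and back. The following plain-Python sketch shows that index mapping for one sample using nested lists; `to_tokens` and `from_tokens` are hypothetical names used only for illustration.

```python
def to_tokens(x):
    """x: [C][H][W] nested lists -> tokens [H*W][C] (row-major over H, W)."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    return [
        [x[c][h][w] for c in range(channels)]
        for h in range(height)
        for w in range(width)
    ]


def from_tokens(tokens, height, width):
    """tokens: [H*W][C] -> [C][H][W]; token index is h * width + w."""
    channels = len(tokens[0])
    return [
        [[tokens[h * width + w][c] for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]


x = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # 2 channels, 2*2 pixels
tokens = to_tokens(x)
print(tokens)  # [[1, 5], [2, 6], [3, 7], [4, 8]]
assert from_tokens(tokens, 2, 2) == x  # the round trip restores the layout
```

Each token collects the values of all channels at one pixel, which is why the attention output can be reshaped back without losing spatial positions.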

`mindnlp.transformers.models.swiftformer.modeling_swiftformer.SwiftFormerForImageClassification` ¶

Bases: SwiftFormerPreTrainedModel

Source code in mindnlp\transformers\models\swiftformer\modeling_swiftformer.py

class SwiftFormerForImageClassification(SwiftFormerPreTrainedModel):
    def __init__(self, config: SwiftFormerConfig) -> None:
        super().__init__(config)

        embed_dims = config.embed_dims

        self.num_labels = config.num_labels
        self.swiftformer = SwiftFormerModel(config)

        # Classifier head
        self.norm = nn.BatchNorm2d(embed_dims[-1], eps=config.batch_norm_eps)
        self.head = nn.Linear(embed_dims[-1], self.num_labels) if self.num_labels > 0 else nn.Identity()
        self.dist_head = nn.Linear(embed_dims[-1], self.num_labels) if self.num_labels > 0 else nn.Identity()

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        pixel_values: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[tuple, ImageClassifierOutputWithNoAttention]:
        r"""
        labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the image classification/regression loss. Indices should be in `[0, ...,
            config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss). If
            `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        # run base model
        outputs = self.swiftformer(
            pixel_values,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        sequence_output = outputs.last_hidden_state if return_dict else outputs[0]

        # run classification head
        sequence_output = self.norm(sequence_output)
        sequence_output = ops.flatten(sequence_output, 2).mean(-1)
        cls_out = self.head(sequence_output)
        distillation_out = self.dist_head(sequence_output)
        logits = (cls_out + distillation_out) / 2

        # calculate loss
        loss = None
        if labels is not None:
            if self.config.problem_type is None:
                if self.num_labels == 1:
                    self.config.problem_type = "regression"
                elif self.num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
                    self.config.problem_type = "single_label_classification"
                else:
                    self.config.problem_type = "multi_label_classification"

            if self.config.problem_type == "regression":
                loss_fct = MSELoss()
                if self.num_labels == 1:
                    loss = loss_fct(logits.squeeze(), labels.squeeze())
                else:
                    loss = loss_fct(logits, labels)
            elif self.config.problem_type == "single_label_classification":
                loss_fct = CrossEntropyLoss()
                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
            elif self.config.problem_type == "multi_label_classification":
                loss_fct = BCEWithLogitsLoss()
                loss = loss_fct(logits, labels)

        if not return_dict:
            output = (logits,) + outputs[1:]
            return ((loss,) + output) if loss is not None else output

        return ImageClassifierOutputWithNoAttention(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
        )

`mindnlp.transformers.models.swiftformer.modeling_swiftformer.SwiftFormerForImageClassification.forward(pixel_values=None, labels=None, output_hidden_states=None, return_dict=None)` ¶

labels (mindspore.Tensor of shape (batch_size,), optional): Labels for computing the image classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels == 1 a regression loss is computed (Mean-Square loss). If config.num_labels > 1 a classification loss is computed (Cross-Entropy).

Source code in mindnlp\transformers\models\swiftformer\modeling_swiftformer.py

def forward(
    self,
    pixel_values: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[tuple, ImageClassifierOutputWithNoAttention]:
    r"""
    labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
        Labels for computing the image classification/regression loss. Indices should be in `[0, ...,
        config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss). If
        `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    # run base model
    outputs = self.swiftformer(
        pixel_values,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    sequence_output = outputs.last_hidden_state if return_dict else outputs[0]

    # run classification head
    sequence_output = self.norm(sequence_output)
    sequence_output = ops.flatten(sequence_output, 2).mean(-1)
    cls_out = self.head(sequence_output)
    distillation_out = self.dist_head(sequence_output)
    logits = (cls_out + distillation_out) / 2

    # calculate loss
    loss = None
    if labels is not None:
        if self.config.problem_type is None:
            if self.num_labels == 1:
                self.config.problem_type = "regression"
            elif self.num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
                self.config.problem_type = "single_label_classification"
            else:
                self.config.problem_type = "multi_label_classification"

        if self.config.problem_type == "regression":
            loss_fct = MSELoss()
            if self.num_labels == 1:
                loss = loss_fct(logits.squeeze(), labels.squeeze())
            else:
                loss = loss_fct(logits, labels)
        elif self.config.problem_type == "single_label_classification":
            loss_fct = CrossEntropyLoss()
            loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
        elif self.config.problem_type == "multi_label_classification":
            loss_fct = BCEWithLogitsLoss()
            loss = loss_fct(logits, labels)

    if not return_dict:
        output = (logits,) + outputs[1:]
        return ((loss,) + output) if loss is not None else output

    return ImageClassifierOutputWithNoAttention(
        loss=loss,
        logits=logits,
        hidden_states=outputs.hidden_states,
    )
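
When `config.problem_type` is not set, `forward` infers it from the number of labels and the label dtype, then picks the loss. A minimal plain-Python sketch of that decision follows. The function name `infer_problem_type` and the boolean `labels_are_integer` (standing in for the `mindspore.int64` / `mindspore.int32` dtype check) are assumptions made for illustration.

```python
# Loss class chosen by the listing for each problem type.
LOSS_BY_PROBLEM_TYPE = {
    "regression": "MSELoss",
    "single_label_classification": "CrossEntropyLoss",
    "multi_label_classification": "BCEWithLogitsLoss",
}


def infer_problem_type(num_labels, labels_are_integer):
    """Mirror the fallback used when config.problem_type is None."""
    if num_labels == 1:
        return "regression"
    if num_labels > 1 and labels_are_integer:
        return "single_label_classification"
    return "multi_label_classification"


for num_labels, is_int in [(1, False), (1000, True), (5, False)]:
    problem_type = infer_problem_type(num_labels, is_int)
    print(num_labels, is_int, problem_type, LOSS_BY_PROBLEM_TYPE[problem_type])
```

Note that the listing writes the inferred value back to `self.config.problem_type`, so the inference runs only on the first call that passes labels.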

`mindnlp.transformers.models.swiftformer.modeling_swiftformer.SwiftFormerLocalRepresentation` ¶

Bases: Module

Local Representation module for SwiftFormer that is implemented by 3*3 depth-wise and point-wise convolutions.

Input: tensor of shape [batch_size, channels, height, width]

Output: tensor of shape [batch_size, channels, height, width]

Source code in mindnlp\transformers\models\swiftformer\modeling_swiftformer.py

class SwiftFormerLocalRepresentation(nn.Module):
    """
    Local Representation module for SwiftFormer that is implemented by 3*3 depth-wise and point-wise convolutions.

    Input: tensor of shape `[batch_size, channels, height, width]`

    Output: tensor of shape `[batch_size, channels, height, width]`
    """

    def __init__(self, config: SwiftFormerConfig, dim: int):
        super().__init__()

        self.depth_wise_conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.norm = nn.BatchNorm2d(dim, eps=config.batch_norm_eps)
        self.point_wise_conv1 = nn.Conv2d(dim, dim, kernel_size=1)
        self.act = nn.GELU()
        self.point_wise_conv2 = nn.Conv2d(dim, dim, kernel_size=1)
        self.drop_path = nn.Identity()
        self.layer_scale = nn.Parameter(ops.ones(dim).unsqueeze(-1).unsqueeze(-1), requires_grad=True)

    def forward(self, x):
        input = x
        x = self.depth_wise_conv(x)
        x = self.norm(x)
        x = self.point_wise_conv1(x)
        x = self.act(x)
        x = self.point_wise_conv2(x)
        x = input + self.drop_path(self.layer_scale * x)
        return x

`mindnlp.transformers.models.swiftformer.modeling_swiftformer.SwiftFormerMlp` ¶

Bases: Module

MLP layer with 1*1 convolutions.

Input: tensor of shape [batch_size, channels, height, width]

Output: tensor of shape [batch_size, channels, height, width]

Source code in mindnlp\transformers\models\swiftformer\modeling_swiftformer.py

class SwiftFormerMlp(nn.Module):
    """
    MLP layer with 1*1 convolutions.

    Input: tensor of shape `[batch_size, channels, height, width]`

    Output: tensor of shape `[batch_size, channels, height, width]`
    """

    def __init__(self, config: SwiftFormerConfig, in_features: int):
        super().__init__()
        hidden_features = int(in_features * config.mlp_ratio)
        self.norm1 = nn.BatchNorm2d(in_features, eps=config.batch_norm_eps)
        self.fc1 = nn.Conv2d(in_features, hidden_features, 1)
        act_layer = ACT2CLS[config.hidden_act]
        self.act = act_layer()
        self.fc2 = nn.Conv2d(hidden_features, in_features, 1)
        self.drop = nn.Dropout(p=config.drop_mlp_rate)

    def forward(self, x):
        x = self.norm1(x)
        x = self.fc1(x)
        x = self.act(x)
        x = self.drop(x)
        x = self.fc2(x)
        x = self.drop(x)
        return x
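
A 1*1 convolution applies the same linear map to the channel vector at every pixel, which is why it can serve as an MLP layer on a `[channels, height, width]` feature map. The sketch below illustrates this in plain Python with nested lists for one sample; `pointwise_conv` is a hypothetical helper, and batch normalization, activation and dropout from the listing are left out.

```python
def pointwise_conv(x, weight, bias):
    """Apply a 1*1 convolution.

    x: [in_ch][H][W], weight: [out_ch][in_ch], bias: [out_ch] -> [out_ch][H][W]
    """
    in_ch, height, width = len(x), len(x[0]), len(x[0][0])
    return [
        [
            [
                # Per-pixel dot product between one weight row and the channel vector
                bias[o] + sum(weight[o][c] * x[c][h][w] for c in range(in_ch))
                for w in range(width)
            ]
            for h in range(height)
        ]
        for o in range(len(weight))
    ]


x = [[[1.0, 2.0]], [[3.0, 4.0]]]  # 2 channels, 1*2 pixels
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # expand 2 -> 3 channels
bias = [0.0, 0.0, 0.5]
print(pointwise_conv(x, weight, bias))
# [[[1.0, 2.0]], [[3.0, 4.0]], [[4.5, 6.5]]]
```

In the listing, `fc1` expands the channels by `config.mlp_ratio` and `fc2` projects them back, so the spatial size is unchanged.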

`mindnlp.transformers.models.swiftformer.modeling_swiftformer.SwiftFormerPatchEmbedding` ¶

Bases: Module

Patch Embedding Layer constructed of two 2D convolutional layers.

Input: tensor of shape [batch_size, in_channels, height, width]

Output: tensor of shape [batch_size, out_channels, height/4, width/4]

Source code in mindnlp\transformers\models\swiftformer\modeling_swiftformer.py

class SwiftFormerPatchEmbedding(nn.Module):
    """
    Patch Embedding Layer constructed of two 2D convolutional layers.

    Input: tensor of shape `[batch_size, in_channels, height, width]`

    Output: tensor of shape `[batch_size, out_channels, height/4, width/4]`
    """

    def __init__(self, config: SwiftFormerConfig):
        super().__init__()

        in_chs = config.num_channels
        out_chs = config.embed_dims[0]
        self.patch_embedding = nn.Sequential(
            nn.Conv2d(in_chs, out_chs // 2, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_chs // 2, eps=config.batch_norm_eps),
            nn.ReLU(),
            nn.Conv2d(out_chs // 2, out_chs, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_chs, eps=config.batch_norm_eps),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.patch_embedding(x)
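
The `height/4` in the output shape comes from two stride-2 convolutions applied in sequence. A small plain-Python sketch of the shape arithmetic follows, using the standard convolution output-size formula with the kernel size, stride and padding from the listing. The helper names are hypothetical.

```python
def conv_out_size(size, kernel_size=3, stride=2, padding=1):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel_size) // stride + 1


def patch_embedding_shapes(height, width, embed_dim):
    """(channels, height, width) after each of the two convolutions."""
    h1, w1 = conv_out_size(height), conv_out_size(width)
    h2, w2 = conv_out_size(h1), conv_out_size(w1)
    # The first convolution outputs half of the final embedding dimension.
    return [(embed_dim // 2, h1, w1), (embed_dim, h2, w2)]


print(patch_embedding_shapes(224, 224, embed_dim=48))
# [(24, 112, 112), (48, 56, 56)]
print(patch_embedding_shapes(225, 225, embed_dim=48))
# [(24, 113, 113), (48, 57, 57)]
```

Under this formula the division by 4 is exact only when the input size is a multiple of 4; other sizes are rounded up by the padding, as the second call shows.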

`mindnlp.transformers.models.swiftformer.modeling_swiftformer.SwiftFormerPreTrainedModel` ¶

Bases: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.

Source code in mindnlp\transformers\models\swiftformer\modeling_swiftformer.py

class SwiftFormerPreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """

    config_class = SwiftFormerConfig
    base_model_prefix = "swiftformer"
    main_input_name = "pixel_values"
    supports_gradient_checkpointing = True
    _no_split_modules = ["SwiftFormerEncoderBlock"]

    def _init_weights(self, module: Union[nn.Linear, nn.Conv2d, nn.BatchNorm2d]) -> None:
        """Initialize the weights"""
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            nn.init.trunc_normal_(module.weight, std=0.02)
            if module.bias is not None:
                nn.init.constant_(module.bias, 0)
        elif isinstance(module, (nn.BatchNorm2d)):
            nn.init.constant_(module.bias, 0)
            nn.init.constant_(module.weight, 1.0)

`mindnlp.transformers.models.swiftformer.modeling_swiftformer.SwiftFormerStage` ¶

Bases: Module

A SwiftFormer stage consisting of a series of SwiftFormerConvEncoder blocks and a final SwiftFormerEncoderBlock.

Input: tensor of shape [batch_size, channels, height, width]

Output: tensor of shape [batch_size, channels, height, width]

Source code in mindnlp\transformers\models\swiftformer\modeling_swiftformer.py

class SwiftFormerStage(nn.Module):
    """
    A SwiftFormer stage consisting of a series of `SwiftFormerConvEncoder` blocks and a final
    `SwiftFormerEncoderBlock`.

    Input: tensor of shape `[batch_size, channels, height, width]`

    Output: tensor of shape `[batch_size, channels, height, width]`
    """

    def __init__(self, config: SwiftFormerConfig, index: int) -> None:
        super().__init__()

        layer_depths = config.depths
        dim = config.embed_dims[index]
        depth = layer_depths[index]

        blocks = []
        for block_idx in range(depth):
            block_dpr = config.drop_path_rate * (block_idx + sum(layer_depths[:index])) / (sum(layer_depths) - 1)

            if depth - block_idx <= 1:
                blocks.append(SwiftFormerEncoderBlock(config, dim=dim, drop_path=block_dpr))
            else:
                blocks.append(SwiftFormerConvEncoder(config, dim=dim))

        self.blocks = nn.ModuleList(blocks)

    def forward(self, input):
        for block in self.blocks:
            input = block(input)
        return input
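
The constructor decides two things per block: which block type to build and which drop-path rate to compute for it. The rate grows linearly with the block's global position across all stages. Here is a minimal plain-Python sketch of that schedule; `stage_layout` is a hypothetical name, and the rate `0.15` is chosen only to give round numbers (the default `drop_path_rate` is `0.0`).

```python
def stage_layout(depths, drop_path_rate, index):
    """Return (block_kind, drop_path_rate) for every block of stage `index`."""
    total_blocks = sum(depths)
    blocks_before = sum(depths[:index])  # blocks in all earlier stages
    depth = depths[index]
    layout = []
    for block_idx in range(depth):
        # Linear ramp from 0 (first block overall) to drop_path_rate (last block overall)
        block_dpr = drop_path_rate * (block_idx + blocks_before) / (total_blocks - 1)
        # Only the last block of a stage is an encoder block with attention
        kind = "encoder_block" if depth - block_idx <= 1 else "conv_encoder"
        layout.append((kind, block_dpr))
    return layout


for stage in range(4):
    print(stage, [kind for kind, _ in stage_layout([3, 3, 6, 4], 0.15, stage)])
```

In the listing, the computed rate is passed only to `SwiftFormerEncoderBlock`; the `SwiftFormerConvEncoder` blocks are built without it.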

`mindnlp.transformers.models.swiftformer.modeling_swiftformer.drop_path(input, drop_prob=0.0, training=False)` ¶

Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).

Comment by Ross Wightman: This is the same as the DropConnect impl I created for EfficientNet, etc networks, however, the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper... See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for changing the layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use 'survival rate' as the argument.

Source code in mindnlp\transformers\models\swiftformer\modeling_swiftformer.py

def drop_path(input: mindspore.Tensor, drop_prob: float = 0.0, training: bool = False) -> mindspore.Tensor:
    """
    Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).

    Comment by Ross Wightman: This is the same as the DropConnect impl I created for EfficientNet, etc networks,
    however, the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...
    See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for changing the
    layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use 'survival rate' as the
    argument.
    """
    if drop_prob == 0.0 or not training:
        return input
    keep_prob = 1 - drop_prob
    shape = (input.shape[0],) + (1,) * (input.ndim - 1)  # work with diff dim tensors, not just 2D ConvNets
    random_tensor = keep_prob + ops.rand(shape, dtype=input.dtype)
    random_tensor = random_tensor.floor()  # binarize
    output = input.div(keep_prob) * random_tensor
    return output
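
The masking trick in `drop_path` can be reproduced without tensors. The following plain-Python sketch applies the same steps to a batch given as a list of flat samples: draw one uniform number per sample, binarize `keep_prob + u` with `floor`, and rescale kept samples by `1 / keep_prob` so the expected value is unchanged. The name `drop_path_sketch` and the `rng` parameter are assumptions added for illustration and reproducibility.

```python
import math
import random


def drop_path_sketch(batch, drop_prob=0.0, training=False, rng=random):
    """Per-sample stochastic depth on a list of samples (each a list of floats)."""
    if drop_prob == 0.0 or not training:
        return batch
    keep_prob = 1.0 - drop_prob
    output = []
    for sample in batch:
        # floor(keep_prob + U[0, 1)) is 1 with probability keep_prob, otherwise 0
        mask = math.floor(keep_prob + rng.random())
        # Kept samples are scaled up so the expectation matches the input
        output.append([value / keep_prob * mask for value in sample])
    return output


rng = random.Random(0)
batch = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(drop_path_sketch(batch, drop_prob=0.5, training=True, rng=rng))
```

Each sample is either zeroed entirely or kept and rescaled, which is what "per sample" means in the description above; at evaluation time (`training=False`) the input passes through unchanged.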
