[Computer Vision] Mask-RCNN Inference Network, Part 6: Mask Generation

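1. Mask Generation Overview

At the end of the previous section we obtained the classification and regression results for the image under detection. Here we take the regression part alone (the bounding boxes of the detected objects) and combine it with the pyramid features mrcnn_feature_maps to generate masks. input_image_meta supplies the height and width of the input image, which the pyramid ROI step, PyramidROIAlign, needs.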

            # Detections
            # output is [batch, num_detections, (y1, x1, y2, x2, class_id, score)] in
            # normalized coordinates
            detections = DetectionLayer(config, name="mrcnn_detection")(
                [rpn_rois, mrcnn_class, mrcnn_bbox, input_image_meta])

            # Create masks for detections
            detection_boxes = KL.Lambda(lambda x: x[..., :4])(detections)
            mrcnn_mask = build_fpn_mask_graph(detection_boxes, mrcnn_feature_maps,
                                              input_image_meta,
                                              config.MASK_POOL_SIZE,
                                              config.NUM_CLASSES,
                                              train_bn=config.TRAIN_BN)
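
To make the `x[..., :4]` slice concrete, here is a minimal pure-Python sketch of what it does to the detections of one image. The helper name `split_detections` and the sample values are made up for illustration; the real code operates on a Keras tensor, not on lists.

```python
def split_detections(detections):
    """Split rows of (y1, x1, y2, x2, class_id, score) into three lists.

    `detections` is a list of 6-element rows for a single image, mirroring
    the last two axes of the DetectionLayer output. Hypothetical helper.
    """
    boxes = [row[:4] for row in detections]          # same columns as x[..., :4]
    class_ids = [int(row[4]) for row in detections]
    scores = [row[5] for row in detections]
    return boxes, class_ids, scores


# Two made-up detections in normalized coordinates.
sample = [
    [0.10, 0.20, 0.50, 0.60, 3.0, 0.98],
    [0.30, 0.40, 0.90, 0.80, 17.0, 0.75],
]
boxes, class_ids, scores = split_detections(sample)
print(boxes)      # only these four columns are passed on to the mask head
print(class_ids)
```

Only the box columns reach build_fpn_mask_graph; the class_id and score columns stay in detections for later use.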

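2. The Mask Generation Function

We already covered the PyramidROIAlign class in "[Computer Vision] Mask-RCNN Inference Network, Part 4: Coupling FPN and ROIAlign". The mask head that builds on it is defined as follows: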

def build_fpn_mask_graph(rois, feature_maps, image_meta,
                         pool_size, num_classes, train_bn=True):
    """Builds the computation graph of the mask head of Feature Pyramid Network.

    rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized
          coordinates.
    feature_maps: List of feature maps from different layers of the pyramid,
                  [P2, P3, P4, P5]. Each has a different resolution.
    image_meta: [batch, (meta data)] Image details. See compose_image_meta()
    pool_size: The width of the square feature map generated from ROI Pooling.
    num_classes: number of classes, which determines the depth of the results
    train_bn: Boolean. Train or freeze Batch Norm layers

    Returns: Masks [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, NUM_CLASSES]
    """
    # ROI Pooling
    # Shape: [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, channels]
    x = PyramidROIAlign([pool_size, pool_size],
                        name="roi_align_mask")([rois, image_meta] + feature_maps)

    # Conv layers
    x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"),
                           name="mrcnn_mask_conv1")(x)
    x = KL.TimeDistributed(BatchNorm(),
                           name='mrcnn_mask_bn1')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"),
                           name="mrcnn_mask_conv2")(x)
    x = KL.TimeDistributed(BatchNorm(),
                           name='mrcnn_mask_bn2')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"),
                           name="mrcnn_mask_conv3")(x)
    x = KL.TimeDistributed(BatchNorm(),
                           name='mrcnn_mask_bn3')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"),
                           name="mrcnn_mask_conv4")(x)
    x = KL.TimeDistributed(BatchNorm(),
                           name='mrcnn_mask_bn4')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.TimeDistributed(KL.Conv2DTranspose(256, (2, 2), strides=2, activation="relu"),
                           name="mrcnn_mask_deconv")(x)
    x = KL.TimeDistributed(KL.Conv2D(num_classes, (1, 1), strides=1, activation="sigmoid"),
                           name="mrcnn_mask")(x)
    return x

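PyramidROIAlign resamples every ROI to a fixed MASK_POOL_SIZE x MASK_POOL_SIZE grid. The four 3x3 convolutions with "same" padding keep that spatial size, and the stride-2 transposed convolution then doubles it. The final 1x1 convolution emits NUM_CLASSES channels, each of which is a single-channel mask for one class. Its sigmoid activation, applied one layer after the ReLU of the transposed convolution, keeps every pixel between 0 and 1. The mask output therefore has size [2*MASK_POOL_SIZE, 2*MASK_POOL_SIZE], which is 28*28 for demo.ipynb. The size given in the source docstring lacks the factor of 2; given the stride-2 transposed convolution this looks like an oversight, all the more so because another config parameter, MASK_SHAPE, is 28*28, although the inference network never uses it.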
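
The size arithmetic can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the transposed-convolution formula assumes no padding and a kernel at least as large as the stride, which matches the (2, 2) kernel with strides=2 used in the code.

```python
def conv_same_size(size, stride=1):
    """Output length of a convolution with 'same' padding: ceil(size / stride)."""
    return -(-size // stride)


def deconv_valid_size(size, kernel, stride):
    """Output length of a transposed convolution without padding.

    Assumes kernel >= stride, as in the 2x2, stride-2 layer of the mask head.
    """
    return (size - 1) * stride + kernel


def mask_head_output_size(pool_size):
    """Trace the spatial size through the mask head (hypothetical helper)."""
    size = pool_size
    for _ in range(4):                                   # mrcnn_mask_conv1..4, 3x3 'same'
        size = conv_same_size(size)
    size = deconv_valid_size(size, kernel=2, stride=2)   # mrcnn_mask_deconv
    # The final 1x1 convolution with stride 1 leaves the size unchanged.
    return size


print(mask_head_output_size(14))  # 28
```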

With that, the last output of the inference network, the raw object-level mask, has been computed.
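
Because the mask head emits one channel per class, a later step still has to pick, for every detection, the channel that matches its predicted class_id. The sketch below illustrates that selection on nested Python lists. The function name, the tiny 2x2 mask, and the 0.5 threshold are assumptions for illustration, not code from this article.

```python
def select_class_mask(mask, class_id, threshold=0.5):
    """Pick one class channel from a [height][width][num_classes] mask.

    Returns a [height][width] grid of booleans. Hypothetical helper; the
    threshold value is an assumption.
    """
    return [[pixel[class_id] >= threshold for pixel in row] for row in mask]


# A made-up 2x2 mask with 3 class channels per pixel (sigmoid outputs).
raw_mask = [
    [[0.1, 0.9, 0.2], [0.1, 0.4, 0.8]],
    [[0.2, 0.7, 0.3], [0.3, 0.2, 0.6]],
]
binary = select_class_mask(raw_mask, class_id=1)
print(binary)  # [[True, False], [True, False]]
```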

3. What the build Function Returns

Next, we pack the input and output tensors that the build function needs for the model and create the Keras model:

            model = KM.Model([input_image, input_image_meta, input_anchors],
                             [detections, mrcnn_class, mrcnn_bbox,
                                 mrcnn_mask, rpn_rois, rpn_class, rpn_bbox],
                             name='mask_rcnn')

        # Add multi-GPU support.
        if config.GPU_COUNT > 1:
            from mrcnn.parallel_model import ParallelModel
            model = ParallelModel(model, config.GPU_COUNT)

        return model
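
A Keras model with several outputs returns them as a list in the order given to the constructor, so callers have to unpack by position. One way to keep that order readable is a named tuple. This is a sketch only: `InferenceOutputs` and `name_outputs` are hypothetical names, and strings stand in for the real arrays.

```python
from collections import namedtuple

# Field order copied from the output list passed to KM.Model in inference mode.
InferenceOutputs = namedtuple(
    "InferenceOutputs",
    ["detections", "mrcnn_class", "mrcnn_bbox",
     "mrcnn_mask", "rpn_rois", "rpn_class", "rpn_bbox"],
)


def name_outputs(raw_outputs):
    """Wrap the 7-element list returned by a predict call (hypothetical helper)."""
    return InferenceOutputs(*raw_outputs)


# Strings stand in for the real arrays here.
outputs = name_outputs(["det", "cls", "box", "mask", "rois", "rpn_c", "rpn_b"])
print(outputs.mrcnn_mask)   # mask
print(outputs._fields[0])   # detections
```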

To summarize the model's output tensors:

# num_anchors,    number of anchors generated per image
# num_rois,       number of proposals kept per image after filtering the anchors,
# #               set by POST_NMS_ROIS_TRAINING or POST_NMS_ROIS_INFERENCE
# num_detections, number of final detection boxes per image,
# #               set by DETECTION_MAX_INSTANCES

# detections,     [batch, num_detections, (y1, x1, y2, x2, class_id, score)]
# mrcnn_class,    [batch, num_rois, NUM_CLASSES] classifier probabilities
# mrcnn_bbox,     [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))]
# mrcnn_mask,     [batch, num_detections, 2*MASK_POOL_SIZE, 2*MASK_POOL_SIZE, NUM_CLASSES]
# rpn_rois,       [batch, num_rois, (y1, x1, y2, x2)]
# rpn_class,      [batch, num_anchors, 2]
# rpn_bbox        [batch, num_anchors, 4]

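Because our GPU_COUNT is 1, i.e. there is no multi-GPU setup, the multi-GPU wrapper is not needed (the multi-GPU model will probably get its own section later), and the model is returned directly.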
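
To get a feel for the sizes involved, the sketch below computes num_anchors and the resulting output shapes for a 1024x1024 input. The strides (4, 8, 16, 32, 64), DETECTION_MAX_INSTANCES = 100, and NUM_CLASSES = 81 are assumed values chosen for illustration. POST_NMS_ROIS_INFERENCE = 1000, MASK_POOL_SIZE = 14, three anchor ratios, and an anchor stride of 1 come from the comments in the build code. rpn_rois carries only the four box coordinates, matching the "Proposals are [batch, N, (y1, x1, y2, x2)]" comment in that code.

```python
def count_anchors(image_size, strides, anchors_per_location):
    """Total anchors over all pyramid levels for a square image."""
    total = 0
    for stride in strides:
        side = image_size // stride          # feature map side length at this level
        total += side * side * anchors_per_location
    return total


def inference_output_shapes(batch, num_anchors, num_rois, num_detections,
                            num_classes, mask_pool_size):
    """Shapes of the seven inference outputs (hypothetical helper)."""
    mask_side = 2 * mask_pool_size           # doubled by the stride-2 deconv
    return {
        "detections": (batch, num_detections, 6),
        "mrcnn_class": (batch, num_rois, num_classes),
        "mrcnn_bbox": (batch, num_rois, num_classes, 4),
        "mrcnn_mask": (batch, num_detections, mask_side, mask_side, num_classes),
        "rpn_rois": (batch, num_rois, 4),
        "rpn_class": (batch, num_anchors, 2),
        "rpn_bbox": (batch, num_anchors, 4),
    }


num_anchors = count_anchors(1024, strides=(4, 8, 16, 32, 64), anchors_per_location=3)
shapes = inference_output_shapes(batch=1, num_anchors=num_anchors, num_rois=1000,
                                 num_detections=100, num_classes=81,
                                 mask_pool_size=14)
print(num_anchors)            # 261888
print(shapes["mrcnn_mask"])   # (1, 100, 28, 28, 81)
```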

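Appendix: Overview of the MaskRCNN Class Network Construction Method

The full build function is reproduced below. Following the mode='inference' branches gives exactly the inference network covered in this series.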

############################################################
#  MaskRCNN Class
############################################################

class MaskRCNN():
    """Encapsulates the Mask RCNN model functionality.

    The actual Keras model is in the keras_model property.
    """

    def __init__(self, mode, config, model_dir):
        """
        mode: Either "training" or "inference"
        config: A Sub-class of the Config class
        model_dir: Directory to save training logs and trained weights
        """
        assert mode in ['training', 'inference']
        self.mode = mode
        self.config = config
        self.model_dir = model_dir
        self.set_log_dir()
        self.keras_model = self.build(mode=mode, config=config)

    def build(self, mode, config):
        """Build Mask R-CNN architecture.
            input_shape: The shape of the input image.
            mode: Either "training" or "inference". The inputs and
                outputs of the model differ accordingly.
        """
        assert mode in ['training', 'inference']

        # Image size must be dividable by 2 multiple times
        h, w = config.IMAGE_SHAPE[:2]  # [1024 1024 3]
        if h / 2**6 != int(h / 2**6) or w / 2**6 != int(w / 2**6):  # ensures downsampling yields no fractional sizes
            raise Exception("Image size must be dividable by 2 at least 6 times "
                            "to avoid fractions when downscaling and upscaling."
                            "For example, use 256, 320, 384, 448, 512, ... etc. ")

        # Inputs
        input_image = KL.Input(
            shape=[None, None, config.IMAGE_SHAPE[2]], name="input_image")
        input_image_meta = KL.Input(shape=[config.IMAGE_META_SIZE],
                                    name="input_image_meta")
        if mode == "training":
            # RPN GT
            input_rpn_match = KL.Input(
                shape=[None, 1], name="input_rpn_match", dtype=tf.int32)
            input_rpn_bbox = KL.Input(
                shape=[None, 4], name="input_rpn_bbox", dtype=tf.float32)

            # Detection GT (class IDs, bounding boxes, and masks)
            # 1. GT Class IDs (zero padded)
            input_gt_class_ids = KL.Input(
                shape=[None], name="input_gt_class_ids", dtype=tf.int32)
            # 2. GT Boxes in pixels (zero padded)
            # [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in image coordinates
            input_gt_boxes = KL.Input(
                shape=[None, 4], name="input_gt_boxes", dtype=tf.float32)
            # Normalize coordinates
            gt_boxes = KL.Lambda(lambda x: norm_boxes_graph(
                x, K.shape(input_image)[1:3]))(input_gt_boxes)
            # 3. GT Masks (zero padded)
            # [batch, height, width, MAX_GT_INSTANCES]
            if config.USE_MINI_MASK:
                input_gt_masks = KL.Input(
                    shape=[config.MINI_MASK_SHAPE[0],
                           config.MINI_MASK_SHAPE[1], None],
                    name="input_gt_masks", dtype=bool)
            else:
                input_gt_masks = KL.Input(
                    shape=[config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1], None],
                    name="input_gt_masks", dtype=bool)
        elif mode == "inference":
            # Anchors in normalized coordinates
            input_anchors = KL.Input(shape=[None, 4], name="input_anchors")

        # Build the shared convolutional layers.
        # Bottom-up Layers
        # Returns a list of the last layers of each stage, 5 in total.
        # Don't create the head (stage 5), so we pick the 4th item in the list.
        if callable(config.BACKBONE):
            _, C2, C3, C4, C5 = config.BACKBONE(input_image, stage5=True,
                                                train_bn=config.TRAIN_BN)
        else:
            _, C2, C3, C4, C5 = resnet_graph(input_image, config.BACKBONE,
                                             stage5=True, train_bn=config.TRAIN_BN)

        # Top-down Layers
        # TODO: add assert to verify feature map sizes match what's in config
        P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c5p5')(C5)  # 256
        P4 = KL.Add(name="fpn_p4add")([
            KL.UpSampling2D(size=(2, 2), name="fpn_p5upsampled")(P5),
            KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c4p4')(C4)])
        P3 = KL.Add(name="fpn_p3add")([
            KL.UpSampling2D(size=(2, 2), name="fpn_p4upsampled")(P4),
            KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c3p3')(C3)])
        P2 = KL.Add(name="fpn_p2add")([
            KL.UpSampling2D(size=(2, 2), name="fpn_p3upsampled")(P3),
            KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c2p2')(C2)])

        # Attach 3x3 conv to all P layers to get the final feature maps.
        P2 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p2")(P2)
        P3 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p3")(P3)
        P4 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p4")(P4)
        P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p5")(P5)

        # P6 is used for the 5th anchor scale in RPN. Generated by
        # subsampling from P5 with stride of 2.
        P6 = KL.MaxPooling2D(pool_size=(1, 1), strides=2, name="fpn_p6")(P5)

        # Note that P6 is used in RPN, but not in the classifier heads.
        rpn_feature_maps = [P2, P3, P4, P5, P6]
        mrcnn_feature_maps = [P2, P3, P4, P5]

        # Anchors
        if mode == "training":
            anchors = self.get_anchors(config.IMAGE_SHAPE)
            # Duplicate across the batch dimension because Keras requires it
            # TODO: can this be optimized to avoid duplicating the anchors?
            anchors = np.broadcast_to(anchors, (config.BATCH_SIZE,) + anchors.shape)
            # A hack to get around Keras's bad support for constants
            anchors = KL.Lambda(lambda x: tf.Variable(anchors), name="anchors")(input_image)
        else:
            anchors = input_anchors

        # RPN model: build_rpn_model returns a Keras Model object; note that
        # Keras Model objects are callable
        rpn = build_rpn_model(config.RPN_ANCHOR_STRIDE,  # 1, 3, 256
                              len(config.RPN_ANCHOR_RATIOS), config.TOP_DOWN_PYRAMID_SIZE)

        # Loop through pyramid layers
        layer_outputs = []  # list of lists
        for p in rpn_feature_maps:
            layer_outputs.append(rpn([p]))  # collect the RPN output for each pyramid level

        # Concatenate layer outputs
        # Convert from list of lists of level outputs to list of lists
        # of outputs across levels.
        # e.g. [[a1, b1, c1], [a2, b2, c2]] => [[a1, a2], [b1, b2], [c1, c2]]
        output_names = ["rpn_class_logits", "rpn_class", "rpn_bbox"]
        outputs = list(zip(*layer_outputs))  # [[logits2, ..., logits6], [class2, ..., class6], [bbox2, ..., bbox6]]
        outputs = [KL.Concatenate(axis=1, name=n)(list(o))
                   for o, n in zip(outputs, output_names)]

        # [batch, num_anchors, 2/4]
        # where num_anchors is the total number of anchors across all pyramid levels
        rpn_class_logits, rpn_class, rpn_bbox = outputs

        # Generate proposals
        # Proposals are [batch, N, (y1, x1, y2, x2)] in normalized coordinates
        # and zero padded.
        # POST_NMS_ROIS_INFERENCE = 1000
        # POST_NMS_ROIS_TRAINING = 2000
        proposal_count = config.POST_NMS_ROIS_TRAINING if mode == "training"\
            else config.POST_NMS_ROIS_INFERENCE
        # [IMAGES_PER_GPU, num_rois, (y1, x1, y2, x2)]
        # IMAGES_PER_GPU takes the place of batch; every later mention of
        # batch means IMAGES_PER_GPU
        rpn_rois = ProposalLayer(
            proposal_count=proposal_count,
            nms_threshold=config.RPN_NMS_THRESHOLD,  # 0.7
            name="ROI",
            config=config)([rpn_class, rpn_bbox, anchors])

        if mode == "training":
            # Class ID mask to mark class IDs supported by the dataset the image
            # came from.
            active_class_ids = KL.Lambda(
                lambda x: parse_image_meta_graph(x)["active_class_ids"]
                )(input_image_meta)

            if not config.USE_RPN_ROIS:
                # Ignore predicted ROIs and use ROIs provided as an input.
                input_rois = KL.Input(shape=[config.POST_NMS_ROIS_TRAINING, 4],
                                      name="input_roi", dtype=np.int32)
                # Normalize coordinates
                target_rois = KL.Lambda(lambda x: norm_boxes_graph(
                    x, K.shape(input_image)[1:3]))(input_rois)
            else:
                target_rois = rpn_rois

            # Generate detection targets
            # Subsamples proposals and generates target outputs for training
            # Note that proposal class IDs, gt_boxes, and gt_masks are zero
            # padded. Equally, returned rois and targets are zero padded.
            rois, target_class_ids, target_bbox, target_mask =\
                DetectionTargetLayer(config, name="proposal_targets")([
                    target_rois, input_gt_class_ids, gt_boxes, input_gt_masks])

            # Network Heads
            # TODO: verify that this handles zero padded ROIs
            mrcnn_class_logits, mrcnn_class, mrcnn_bbox =\
                fpn_classifier_graph(rois, mrcnn_feature_maps, input_image_meta,
                                     config.POOL_SIZE, config.NUM_CLASSES,
                                     train_bn=config.TRAIN_BN,
                                     fc_layers_size=config.FPN_CLASSIF_FC_LAYERS_SIZE)

            mrcnn_mask = build_fpn_mask_graph(rois, mrcnn_feature_maps,
                                              input_image_meta,
                                              config.MASK_POOL_SIZE,
                                              config.NUM_CLASSES,
                                              train_bn=config.TRAIN_BN)

            # TODO: clean up (use tf.identity if necessary)
            output_rois = KL.Lambda(lambda x: x * 1, name="output_rois")(rois)

            # Losses
            rpn_class_loss = KL.Lambda(lambda x: rpn_class_loss_graph(*x), name="rpn_class_loss")(
                [input_rpn_match, rpn_class_logits])
            rpn_bbox_loss = KL.Lambda(lambda x: rpn_bbox_loss_graph(config, *x), name="rpn_bbox_loss")(
                [input_rpn_bbox, input_rpn_match, rpn_bbox])
            class_loss = KL.Lambda(lambda x: mrcnn_class_loss_graph(*x), name="mrcnn_class_loss")(
                [target_class_ids, mrcnn_class_logits, active_class_ids])
            bbox_loss = KL.Lambda(lambda x: mrcnn_bbox_loss_graph(*x), name="mrcnn_bbox_loss")(
                [target_bbox, target_class_ids, mrcnn_bbox])
            mask_loss = KL.Lambda(lambda x: mrcnn_mask_loss_graph(*x), name="mrcnn_mask_loss")(
                [target_mask, target_class_ids, mrcnn_mask])

            # Model
            inputs = [input_image, input_image_meta,
                      input_rpn_match, input_rpn_bbox, input_gt_class_ids, input_gt_boxes, input_gt_masks]
            if not config.USE_RPN_ROIS:
                inputs.append(input_rois)
            outputs = [rpn_class_logits, rpn_class, rpn_bbox,
                       mrcnn_class_logits, mrcnn_class, mrcnn_bbox, mrcnn_mask,
                       rpn_rois, output_rois,
                       rpn_class_loss, rpn_bbox_loss, class_loss, bbox_loss, mask_loss]
            model = KM.Model(inputs, outputs, name='mask_rcnn')

        else:
            # Network Heads
            # Proposal classifier and BBox regressor heads
            # output shapes:
            #     mrcnn_class_logits: [batch, num_rois, NUM_CLASSES] classifier logits (before softmax)
            #     mrcnn_class: [batch, num_rois, NUM_CLASSES] classifier probabilities
            #     mrcnn_bbox(deltas): [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))]
            mrcnn_class_logits, mrcnn_class, mrcnn_bbox =\
                fpn_classifier_graph(rpn_rois, mrcnn_feature_maps, input_image_meta,
                                     config.POOL_SIZE,  # 7
                                     config.NUM_CLASSES,
                                     train_bn=config.TRAIN_BN,
                                     fc_layers_size=config.FPN_CLASSIF_FC_LAYERS_SIZE)

            # Detections
            # output is [batch, num_detections, (y1, x1, y2, x2, class_id, score)] in
            # normalized coordinates
            detections = DetectionLayer(config, name="mrcnn_detection")(
                [rpn_rois, mrcnn_class, mrcnn_bbox, input_image_meta])

            # Create masks for detections
            detection_boxes = KL.Lambda(lambda x: x[..., :4])(detections)
            mrcnn_mask = build_fpn_mask_graph(detection_boxes, mrcnn_feature_maps,
                                              input_image_meta,
                                              config.MASK_POOL_SIZE,  # 14
                                              config.NUM_CLASSES,
                                              train_bn=config.TRAIN_BN)

            # num_anchors,    number of anchors generated per image
            # num_rois,       number of proposals kept per image after filtering the anchors,
            # #               set by POST_NMS_ROIS_TRAINING or POST_NMS_ROIS_INFERENCE
            # num_detections, number of final detection boxes per image,
            # #               set by DETECTION_MAX_INSTANCES
            # detections,     [batch, num_detections, (y1, x1, y2, x2, class_id, score)]
            # mrcnn_class,    [batch, num_rois, NUM_CLASSES] classifier probabilities
            # mrcnn_bbox,     [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))]
            # mrcnn_mask,     [batch, num_detections, 2*MASK_POOL_SIZE, 2*MASK_POOL_SIZE, NUM_CLASSES]
            # rpn_rois,       [batch, num_rois, (y1, x1, y2, x2)]
            # rpn_class,      [batch, num_anchors, 2]
            # rpn_bbox        [batch, num_anchors, 4]
            model = KM.Model([input_image, input_image_meta, input_anchors],
                             [detections, mrcnn_class, mrcnn_bbox,
                                 mrcnn_mask, rpn_rois, rpn_class, rpn_bbox],
                             name='mask_rcnn')

        # Add multi-GPU support.
        if config.GPU_COUNT > 1:
            from mrcnn.parallel_model import ParallelModel
            model = ParallelModel(model, config.GPU_COUNT)

        return model
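
The `list(zip(*layer_outputs))` step in build is an ordinary transpose of a list of lists. A tiny stand-alone sketch, with placeholder strings in place of tensors, shows the regrouping:

```python
# One [logits, class, bbox] triple per pyramid level, as returned by rpn([p]).
layer_outputs = [
    ["logits_P2", "class_P2", "bbox_P2"],
    ["logits_P3", "class_P3", "bbox_P3"],
    ["logits_P4", "class_P4", "bbox_P4"],
]

# zip(*...) regroups by output type instead of by level.
by_type = [list(group) for group in zip(*layer_outputs)]

print(by_type[0])  # ['logits_P2', 'logits_P3', 'logits_P4']
print(by_type[2])  # ['bbox_P2', 'bbox_P3', 'bbox_P4']
```

In build, each regrouped list is then concatenated along axis 1, so that every RPN output covers the anchors of all pyramid levels.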
