Cross-modal image-watermark joint generation and detection device and method thereof
12125119 · 2024-10-22
Assignee
Inventors
- Anan Liu (Tianjin, CN)
- Guokai Zhang (Tianjin, CN)
- Lanjun Wang (Tianjin, CN)
- Ning Xu (Tianjin, CN)
- Yuting Su (Tianjin, CN)
- Yongdong Zhang (Tianjin, CN)
CPC classification
G06T1/0028
PHYSICS
G06T2207/20016
PHYSICS
Y02T10/40
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
G06T1/0064
PHYSICS
International classification
Abstract
The present disclosure discloses a cross-modal image-watermark joint generation and detection device and a method thereof. The device includes a multimodal encoder, an image-watermark feature co-embedding module, an image-watermark feature fusion module, an up-sampling generator, a non-cooperative game decoupling module configured to decouple an unwatermarked image and a reconstructed watermark from a composite image through two decoders by developing allocation strategies according to non-cooperative game theory and Shannon information theory; a strategy allocation module configured to set a composite image discriminator that keeps the composite image consistent with the input text through multi-specification down-sampling convolution kernels, and to set objective functions that constrain watermark reconstruction and unwatermarked image decoding; and a post-processing attack module configured to simulate various attacks to ensure the robustness of the watermark.
Claims
1. A cross-modal image-watermark joint generation and detection device, comprising: an image-watermark feature co-embedding module, configured to map an original image feature and a watermark feature to a unified feature space by a learnable parameter matrix; an image-watermark feature fusion module, configured to fuse the watermark feature and the original image feature at a channel level to acquire an image-watermark fusion feature and cascade the original image feature for a plurality of times; an up-sampling generator, configured to map the image-watermark fusion feature into pixels to acquire a composite image with a preset resolution; a non-cooperative game decoupling module, configured to allocate information of the composite image through two decoders by developing allocation strategies according to a non-cooperative game theory and a Shannon information theory to decouple an unwatermarked image and a reconstructed watermark; a strategy allocation module, configured to set an image joint discriminator, extract features of the composite image by multi-specification down-sampling convolution kernels to constrain image-text semantic consistency and fidelity, and set an objective function to constrain reconstruction of the watermark and the unwatermarked image; and a post-processing attack module, configured to simulate post-processing attacks and output a final image-watermark joint generated image; wherein the original image feature is obtained by a multimodal encoder, the multimodal encoder configured to extract features from an input text, noise sampling and a digital watermark by pre-trained natural language encoding models, multilayer perceptrons, and visual encoding models, and acquire feature representations thereof to obtain the original image feature through affine transformation using text features and noise features.
2. The cross-modal image-watermark joint generation and detection device according to claim 1, wherein the device further comprises an image and watermark joint generation evaluation module, configured to evaluate image quality, watermark invisibility, watermark reconstruction quality and watermark robustness.
3. The cross-modal image-watermark joint generation and detection device according to claim 1, wherein the image-watermark feature co-embedding module is as follows:
f_{t,w} = F_c(T_tM_t, T_wM_w), where f_{t,w} is an image and watermark splicing feature, F_c(·) represents a channel-level splicing operation, M_t represents the original image feature, M_w represents the watermark feature, and T_t and T_w are learnable parameter matrices of corresponding dimensions.
4. The cross-modal image-watermark joint generation and detection device according to claim 3, wherein the image-watermark feature fusion module is as follows:
5. The cross-modal image-watermark joint generation and detection device according to claim 3, wherein the non-cooperative game decoupling module comprises an image decoding unit and a watermark reconstruction unit, the image decoding unit being expressed as:
6. A cross-modal image-watermark joint generation and detection method, comprising: mapping an original image feature and a watermark feature to a unified feature space by a learnable parameter matrix; fusing the watermark feature and the original image feature at a channel level to acquire an image-watermark fusion feature and cascading the original image feature for a plurality of times; mapping the image-watermark fusion feature into pixels to acquire a composite image with a preset resolution; allocating information of the composite image through two decoders by developing allocation strategies according to a non-cooperative game theory and a Shannon information theory to decouple an unwatermarked image and a reconstructed watermark; setting an image joint discriminator, extracting features of the composite image by multi-specification down-sampling convolution kernels to constrain image-text semantic consistency and fidelity, and setting an objective function to constrain reconstruction of the watermark and the unwatermarked image; and simulating post-processing attacks and outputting a final image-watermark joint generated image; wherein the original image feature is obtained by a multimodal encoder, the multimodal encoder configured to extract features from an input text, noise sampling and a digital watermark by pre-trained natural language encoding models, multilayer perceptrons, and visual encoding models, and acquire feature representations thereof to obtain the original image feature through affine transformation using text features and noise features.
7. A non-transitory computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and the computer program comprises program instructions; and when the program instructions are executed by a processor, the processor implements the method according to claim 6.
Description
DETAILED DESCRIPTION OF THE PRESENT DISCLOSURE
(8) To make the objectives, technical solutions and advantages of the present disclosure clearer, the implementations of the present disclosure will be further described below in detail.
Embodiment 1
(9) A cross-modal image-watermark joint generation and detection device includes the following modules:
(10) I. A multimodal encoder, configured to extract features from an input text, noise sampling and a digital watermark by pre-trained natural language encoding models, multilayer perceptrons, and visual encoding models, and acquire feature representations thereof to obtain an original image feature through affine transformation using text features and noise features.
(11) Specifically, the multimodal encoder further includes:
(12) 1) A BiLSTM (bidirectional long short-term memory) context-aware encoding unit, configured to sequentially encode word embedding features by a pre-trained bidirectional long short-term memory network, such that the text features carry context-aware information and a sentence-level embedding representation is obtained.
(13) The word embedding features are sequentially fed into the BiLSTM for bidirectional encoding, and the hidden states are aggregated into the sentence-level text feature s.
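For illustration, the BiLSTM unit may be sketched in PyTorch as follows; the vocabulary size, embedding width, hidden width and the final-hidden-state pooling are illustrative assumptions, since the embodiment only specifies bidirectional encoding of word embeddings into a sentence-level representation s.

```python
# A minimal sketch of the BiLSTM context-aware text encoding unit.
import torch
import torch.nn as nn

class BiLSTMTextEncoder(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=300, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)

    def forward(self, token_ids):                 # (B, L)
        emb = self.embed(token_ids)               # (B, L, E)
        _, (h_n, _) = self.bilstm(emb)            # h_n: (2, B, H)
        # Concatenate the final forward and backward hidden states
        # as the sentence-level embedding s.
        s = torch.cat([h_n[0], h_n[1]], dim=-1)   # (B, 2H)
        return s
```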
(14) 2) A multilayer perceptron noise coding unit, configured to map noise obtained by random sampling in a standard Gaussian distribution into a feature vector by multilayer perceptron networks to increase the variety of generated images.
(15) The noise obtained by random sampling from the standard Gaussian distribution N(0, 1) is fed into an MLP network and mapped into the noise feature vector z.
(16) 3) A watermark generation unit, configured to map creation-related metadata into a single-channel binary watermark embedded into the image in a hidden manner, thereby achieving traceability.
(17) Specifically, the text, creation time, creator ID and other factors are written into the single-channel binary watermark in character form, wherein the creator ID is set to 8 digits, with each digit sampled from the uniform distribution U(0,9).
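A minimal sketch of this watermark generation step is given below, assuming PIL's default bitmap font and a 64×64 canvas; the embodiment specifies only that text, creation time and an 8-digit creator ID are written into a single-channel binary watermark in character form.

```python
# Sketch of the watermark generation unit: render metadata characters
# into a single-channel ("1"-mode) binary image.
import random
from PIL import Image, ImageDraw

def make_watermark(text, creation_time, size=(64, 64)):
    # 8-digit creator ID, each digit sampled from U(0, 9).
    creator_id = "".join(str(random.randint(0, 9)) for _ in range(8))
    canvas = Image.new("1", size, 0)              # single-channel binary
    draw = ImageDraw.Draw(canvas)
    payload = f"{text}\n{creation_time}\nID:{creator_id}"
    draw.text((2, 2), payload, fill=1)            # white characters
    return canvas, creator_id

wm, cid = make_watermark("a small bird", "2024-10-22")
```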
(18) 4) A multilayer convolutional network watermark feature extraction unit, configured to extract binary watermark features by convolutional neural networks to obtain spatial-level feature representation.
(19) The single-channel binary watermark is fed into a multilayer convolutional neural network (CNN) to obtain the watermark feature M_w ∈ R^{H×W×1}. The network has 4 layers, with the numbers of output channels set to 3, 6, 12 and 1 respectively; LeakyReLU activation functions are set between the layers, the receptive field is set to 3×3, and the convolution stride is set to 1.
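The stated configuration translates directly into PyTorch; the padding of 1 is an assumption made so that the output keeps the H×W resolution of M_w ∈ R^{H×W×1}.

```python
# Four-layer watermark feature extractor per the stated configuration:
# output channels 3, 6, 12, 1; 3x3 kernels; stride 1; LeakyReLU between
# layers. padding=1 (assumed) preserves spatial resolution.
import torch.nn as nn

watermark_cnn = nn.Sequential(
    nn.Conv2d(1, 3, kernel_size=3, stride=1, padding=1),
    nn.LeakyReLU(),
    nn.Conv2d(3, 6, kernel_size=3, stride=1, padding=1),
    nn.LeakyReLU(),
    nn.Conv2d(6, 12, kernel_size=3, stride=1, padding=1),
    nn.LeakyReLU(),
    nn.Conv2d(12, 1, kernel_size=3, stride=1, padding=1),
)
# M_w = watermark_cnn(binary_watermark)   # (B, 1, H, W)
```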
(20) 5) An image feature initialization unit, configured to generate image initial features by calculation through affine transformation based on noise sampling and text input.
(21) In order to enrich the visual effect of the generated image, affine transformation is introduced to fuse noise and text features, and a specific implementation process thereof may be expressed as:
Affine(z, s) = F_scale(s)·F(z) + F_shift(s)    (1)
(22) Where, Affine(·) represents the affine transformation function, and F_scale(·), F_shift(·) and F(·) represent the scaling, translation and noise mapping functions respectively. The output matrix of the affine transformation is expressed as M_t ∈ R^{H×W×C} and is regarded as the original image feature, wherein H, W and C represent the height, width and number of channels of the feature matrix respectively, and R represents the feature space.
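A minimal sketch of Formula (1) follows, also covering the MLP noise mapping F(z) of the preceding unit; the layer widths and the reshape into an H×W×C feature map M_t are illustrative assumptions.

```python
# Sketch of the image feature initialization:
# Affine(z, s) = F_scale(s) * F(z) + F_shift(s).
import torch
import torch.nn as nn

class AffineInit(nn.Module):
    def __init__(self, noise_dim=100, text_dim=256, h=8, w=8, c=64):
        super().__init__()
        self.h, self.w, self.c = h, w, c
        feat = h * w * c
        self.noise_map = nn.Linear(noise_dim, feat)   # F(z)
        self.scale = nn.Linear(text_dim, feat)        # F_scale(s)
        self.shift = nn.Linear(text_dim, feat)        # F_shift(s)

    def forward(self, z, s):
        m = self.scale(s) * self.noise_map(z) + self.shift(s)
        return m.view(-1, self.c, self.h, self.w)     # M_t, (B, C, H, W)
```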
(23) II. An image-watermark feature co-embedding module, configured to map an original image feature and a watermark feature to a unified feature space by a learnable parameter matrix to achieve the compatibility of the original image feature and the watermark feature.
(24) Specifically, in order to improve the representation ability of the image and optimize the invisibility of the watermark, it is necessary to find a feature co-embedding space for integrating the original image feature M_t and the watermark feature M_w. Learnable parameter matrices T_t and T_w of corresponding dimensions are initialized randomly, and the watermark feature and the image feature are made compatible during the training process. The specific implementation process is expressed as:
f_{t,w} = F_c(T_tM_t, T_wM_w)    (2)
(25) Where, f_{t,w} represents the image and watermark splicing feature, and F_c(·) represents the channel-level splicing operation, which aims to express the watermark and the image in the same feature space.
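A minimal sketch of Formula (2), assuming the learnable matrices T_t and T_w are realized as 1×1 convolutions, which apply the same learnable channel mapping at every spatial position:

```python
# Sketch of the image-watermark feature co-embedding module.
import torch
import torch.nn as nn

class CoEmbedding(nn.Module):
    def __init__(self, c_img=64, c_wm=1, c_out=32):
        super().__init__()
        # 1x1 convolutions act as learnable parameter matrices T_t, T_w.
        self.T_t = nn.Conv2d(c_img, c_out, kernel_size=1)
        self.T_w = nn.Conv2d(c_wm, c_out, kernel_size=1)

    def forward(self, M_t, M_w):
        # Channel-level splicing F_c after projection to a shared space.
        return torch.cat([self.T_t(M_t), self.T_w(M_w)], dim=1)  # f_{t,w}
```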
(26) III. An image-watermark feature fusion module, configured to fuse the watermark feature and the original image feature at a channel level, hiding the watermark signal while preserving a high-quality visual effect by cascading the original image feature multiple times.
(27) Specifically, the splicing feature f_{t,w} is compressed by a Unet network to obtain low-level key information, and correlation mining is conducted on features of different scales by skip connections to learn multi-scale watermark and image information. Furthermore, in order to reduce the watermark interference to the image feature, the original image feature M_t is fused multiple times, and the watermark signal ratio is reduced. The specific implementation process may be expressed as:
(28) M_i^c = Y_i(M_{i-1}^c, M_t; ξ_i),  M_0^c = E_Unet(f_{t,w})    (3)
(29) Where, Y_i(·) is the ith-layer feature fusion module, which mainly consists of a fully connected network (FCN) with parameter ξ_i and a nearest neighbor interpolation algorithm. In the embodiment of the present disclosure, the number of layers is set to 3. M_i^c is the composite visual feature map output by the ith layer, and E_Unet(·) is a Unet-based encoder configured to couple the visual feature to the watermark feature. In this way, the watermark interference to the image quality is kept minor while the watermark information is protected from being lost.
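One fusion layer Y_i(·) might be sketched as follows; realizing the FCN as a per-pixel 1×1 convolution and doubling the resolution per layer are illustrative assumptions.

```python
# Sketch of one cascaded fusion layer Y_i: re-inject the original image
# feature M_t, fuse at the channel level, then upsample by
# nearest-neighbor interpolation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionLayer(nn.Module):
    def __init__(self, c_in, c_img, c_out):
        super().__init__()
        # Per-pixel fully connected mapping (1x1 conv) with parameter xi_i.
        self.fcn = nn.Conv2d(c_in + c_img, c_out, kernel_size=1)

    def forward(self, m_prev, M_t):
        # Match M_t to the current spatial size before fusing.
        M_t = F.interpolate(M_t, size=m_prev.shape[-2:], mode="nearest")
        m = self.fcn(torch.cat([m_prev, M_t], dim=1))
        return F.interpolate(m, scale_factor=2, mode="nearest")
```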
(30) IV. An up-sampling generator, configured to map the image-watermark fusion feature into pixels that contain both semantic information and watermark signals, finally obtaining a composite image with a resolution of 256×256.
(31) In order to generate the composite image with a hidden watermark from the fusion feature M_i^c, the feature is processed by the up-sampling generator, and the specific implementation process may be expressed as:
x_c = F_w(M_i^c; θ_c)    (4)
(32) Where, F_w(·) represents the up-sampling generation function with parameter θ_c, and x_c is the composite image with a resolution of 256×256, which shows an excellent visual effect and fully hides the watermark information.
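A minimal sketch of the up-sampling generator F_w, assuming nearest-neighbor upsampling followed by convolution and a Tanh output; the block count and channel widths are illustrative.

```python
# Sketch of the up-sampling generator: repeated upsample + conv until a
# 256 x 256 RGB composite image x_c is produced.
import torch.nn as nn

def upsample_block(c_in, c_out):
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="nearest"),
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.LeakyReLU(),
    )

generator = nn.Sequential(
    upsample_block(64, 32),    # e.g. 64x64 -> 128x128
    upsample_block(32, 16),    # 128x128 -> 256x256
    nn.Conv2d(16, 3, kernel_size=3, padding=1),
    nn.Tanh(),                 # x_c in [-1, 1]
)
```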
(33) V. A non-cooperative game decoupling module, configured to allocate information of the composite image through two decoders by developing allocation strategies according to a non-cooperative game theory and a Shannon information theory to decouple an unwatermarked image and a reconstructed watermark.
(34) Specifically, a non-cooperative game features a lack of communication and negotiation among game participants, who need to develop their own dominant strategies. Concretely speaking, the participants place a strong emphasis on independent decision making to maximize their own interests, the decision making is independent of the strategies adopted by other participants in the strategic environment, and the ultimate objective is to achieve balance among the game players.
(35) For the game G = (s_1, s_2, …, s_n; p_1, p_2, …, p_n), supposing that (s_1, s_2, …, s_n) is an arbitrary strategy combination, when facing the strategies (s_1, …, s_{i-1}, s_{i+1}, …, s_n) of the other participants, the strategy s_i^* is the optimum selection of participant p_i. The Nash equilibrium of the non-cooperative game is formulated as:
(36) p_i(s_1^*, …, s_{i-1}^*, s_i^*, s_{i+1}^*, …, s_n^*) ≥ p_i(s_1^*, …, s_{i-1}^*, s_i, s_{i+1}^*, …, s_n^*), ∀ s_i ∈ S_i    (5)
(37) Where, s^* represents the set of Nash equilibrium strategies, that is, no participant may increase its gains by changing its own strategy alone. In the embodiment of the present disclosure, the information gain M_gain of the composite image is allocated between two contributors, the unwatermarked image x_r and the reconstructed watermark w_r, with allocation coefficients c_{x_r} and c_{w_r} respectively.
(38) Where, the allocation function reflects the allocation strategies of the two contributors, its value range is [0, 1], and it represents a positive correlation. Theoretically, the watermark and the image participate in a non-cooperative game and strive to reach the Nash equilibrium state. Supposing that c_{x_r} and c_{w_r} satisfy the optimal allocation:
(39)
(40) Where, φ^*(·) represents the optimal allocation strategy. Formula (8) is simplified as:
(41)
(42) Where, C^* is a constant. The allocation strategies are dependent on c_{w_r} and c_{x_r}:
(43)
(44) Where, s_{x_r} and s_{w_r} denote the allocation strategies of the unwatermarked image and the reconstructed watermark respectively; the two strategies are implemented by the two decoders described below.
(45) Specifically, the non-cooperative game decoupling module further includes:
(46) 1) an image decoding unit, configured to decouple the unwatermarked image from the composite image.
(47) The embodiment of the present disclosure designs a cooperative decoupling method for the watermark and the image. Ideally, the unwatermarked image is expected to retain visual information equivalent to that of the composite image, so as to reduce the disparity therebetween. Furthermore, watermark signals should not be stored in the unwatermarked image. According to the Shannon information theory, this process aims to reduce the difference between the unwatermarked image and the composite image, and meanwhile widen the gap between the hidden watermark information and the unwatermarked image. The specific implementation process of this strategy may be expressed as:
(48) s_{x_r} = max_{θ_x} [MI(x_c, x_r) − MI(w_r, x_r)]    (11)
(49) Where, MI(·) represents the mutual information function, calculated by Kullback-Leibler divergence and used for optimizing the parameter θ_x, and s_{x_r} is the allocation strategy of the unwatermarked image.
(50) In order to achieve Formula (11), the composite image is first processed by the decoder, and the specific implementation process may be expressed as:
x_r = R_r(x_c; θ_x)    (12)
(51) Where, R_r(·) is a Unet-based image decoder, configured to establish a pixel-level dependency between the composite image and the analytic image.
(52) 2) A watermark reconstruction unit, configured to reconstruct an approximately lossless high-quality watermark from the composite image.
(53) Specifically, the unwatermarked image x_r is ideally almost independent of the information of the reconstructed watermark w_r, while the composite image x_c and the reconstructed watermark w_r share hidden information. Thus, in the feature space, the self-information of the composite image is denoted I(x_c) and its mutual information with the unwatermarked image is denoted MI(x_c, x_r), while I(w_r) aims to search the feature space of the complementary set of MI(x_c, x_r) within I(x_c):
(54) s_{w_r} = max_{θ_w} [MI(x_c, w_r) − MI(x_r, w_r)]    (13)
(55) Where, s_{w_r} is the allocation strategy of the reconstructed watermark, used for optimizing the parameter θ_w, and the reconstructed watermark is obtained by:
w_r = R_w(x_c; θ_w)    (14)
(56) Where, R_w(·) is a Unet-based watermark decoder. Thus, under the independent strategies and the two decoders, the image and the watermark may be cooperatively decoupled while approaching the Nash equilibrium state.
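The two decoders R_r and R_w of Formulas (12) and (14) can be sketched as follows; for brevity the Unet skip connections are omitted, so this is a simplified stand-in rather than the full Unet decoder described above.

```python
# Sketch of the non-cooperative decoupling: two independent decoders
# read the same composite image x_c; R_r recovers the unwatermarked
# image x_r and R_w reconstructs the single-channel watermark w_r.
import torch.nn as nn

def double_conv(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(c_out, c_out, kernel_size=3, padding=1), nn.ReLU(),
    )

R_r = nn.Sequential(double_conv(3, 16), double_conv(16, 16),
                    nn.Conv2d(16, 3, kernel_size=1))   # x_r = R_r(x_c)
R_w = nn.Sequential(double_conv(3, 16), double_conv(16, 16),
                    nn.Conv2d(16, 1, kernel_size=1),
                    nn.Sigmoid())                      # w_r = R_w(x_c)
```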
(57) VI. A strategy allocation module, configured to set objective functions for the composite image, the unwatermarked image and the reconstructed watermark; namely, it sets an image discriminator, extracts features of the composite image by multi-specification down-sampling convolution kernels to constrain image-text semantic consistency and fidelity, and meanwhile sets objective functions to constrain the reconstruction of the watermark and the unwatermarked image.
(58) Specifically, the strategy allocation module further includes:
(59) 1) a composite image discrimination strategy, configured to set the discriminator to constrain the composite image.
(60) In the embodiment of the present disclosure, not only is the composite image x_c generated, but the unwatermarked image x_r is also decoupled by the dedicated Unet decoder. Therefore, a discriminator is required to ensure the authenticity of the image. For the composite image, the initialized image feature M_t is used to guide the semantic expression of x_c. The objective function of the composite image discriminator is defined as:
(61) L_D = E_{x∼p_r}[log D(x, s)] + ½E_{x∼p_r}[log(1 − D(x, ŝ))] + ½E_{x_c∼p_{x_c}}[log(1 − D(x_c, s))]    (15)
(62) Where, ŝ corresponds to a mismatched text description, x represents a real image, and p_r and p_{x_c} represent the distributions of real images and composite images respectively. The generator-side objective of the composite image is:
L_{x_c} = −E_{x_c∼p_{x_c}}[log D(x_c, s)] + λ_1·ℓ(x_c, M_t)    (16)
(63) Where, ℓ(·) measures the similarity between x_c and the initial image feature M_t by using the MSE-L2 loss, and λ_1 is a proportional coefficient.
(64) 2) An unwatermarked image decoding strategy, configured to split the unwatermarked image from the composite image.
(65) As a supplement to the target L_{x_c}, the unwatermarked image x_r should maintain a visual appearance similar to that of the composite image while discarding the watermark signal, and the corresponding objective function is:
L_{x_r} = λ_2·L_{smooth-L1}(x_r, x_c)    (17)
(66) Where, λ_2 is a proportional coefficient. The objective function eliminates the watermark while maintaining a visual appearance similar to the composite image through the smooth L1 loss.
(67) 3) A reconstructed watermark strategy, configured to reconstruct the watermark signal from the composite image.
(68) In the embodiment of the present disclosure, a strong constraint ℓ(·) is introduced to ensure the completeness of w_r, and the objective function for restoring the watermark is as follows:
L_{w_r} = λ_3·ℓ(w_r, w)    (18)
(69) Where, λ_3 is a proportional coefficient, and the objective function keeps the reconstructed watermark consistent with the hidden watermark w.
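The three objectives of this module can be sketched jointly as below; the adversarial term, the binary cross-entropy watermark constraint, the feature `x_c_feat` compared against M_t, and the coefficient values are illustrative assumptions consistent with Formulas (16), (17) and (18).

```python
# Sketch of the strategy-allocation objectives.
import torch
import torch.nn.functional as F

def strategy_losses(d_fake_logits, x_c_feat, M_t, x_r, x_c, w_r, w,
                    lam1=1.0, lam2=10.0, lam3=10.0):
    # L_{x_c}: fool the discriminator and stay close to the
    # initialized image feature M_t (MSE-L2 term).
    l_xc = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))
    l_xc = l_xc + lam1 * F.mse_loss(x_c_feat, M_t)
    # L_{x_r}: smooth L1 keeps x_r visually close to x_c while the
    # decoder discards the watermark signal.
    l_xr = lam2 * F.smooth_l1_loss(x_r, x_c)
    # L_{w_r}: strong pixel-wise constraint between the reconstructed
    # and the hidden binary watermark.
    l_wr = lam3 * F.binary_cross_entropy(w_r, w)
    return l_xc + l_xr + l_wr
```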
(70) VII. A post-processing attack module, configured to simulate post-processing attacks of random cropping, space rotation, Gaussian noise, impulse noise, Gaussian blur and brightness adjustment, such that the watermark becomes robust against common attacks.
(71) In real scenarios, post-processing attacks such as Gaussian noise, space rotation and random cropping may occur, so the watermark should be strongly robust to protect the information from being lost, thereby achieving the ultimate purpose of traceability and protection. In the embodiment of the present disclosure, the post-processing attacks are simulated during training, such that the encoder-decoder pair is more adaptive to the attack patterns. Specifically, the module is disposed after the up-sampling generator, and post-processing attacks of different intensities are added to the composite image x_c, which aims to train robust decoder parameters. During this training, the generator parameters are fixed to ensure that the generation process of the composite image is not affected. The attacked image x_c is fed into the watermark decoder, and finally a high-quality reconstructed watermark is obtained. The solution shows significant robustness against the post-processing attacks, and the stored information is maintained in a reasonable and identifiable range.
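A minimal sketch of the attack simulation using standard torchvision transforms; the attack set matches the one named above, while the intensities and the per-element approximation of impulse noise are illustrative assumptions.

```python
# Sketch of the post-processing attack module: draw one random attack
# per step and apply it to the composite image before the decoder.
import random
import torch
import torchvision.transforms as T

def random_attack(x_c):                       # x_c: (B, 3, H, W) in [0, 1]
    attacks = [
        T.RandomRotation(degrees=30),                       # space rotation
        T.Compose([T.RandomCrop(224), T.Resize(256)]),      # random cropping
        T.GaussianBlur(kernel_size=5),                      # Gaussian blur
        T.ColorJitter(brightness=0.3),                      # brightness
        lambda x: x + 0.05 * torch.randn_like(x),           # Gaussian noise
        lambda x: torch.where(torch.rand_like(x) < 0.02,
                              torch.rand_like(x), x),       # impulse noise
    ]
    return random.choice(attacks)(x_c).clamp(0, 1)
```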
(72) VIII. An image and watermark joint generation evaluation system, which provides a set of specific evaluation indicators to evaluate the image quality, watermark invisibility, watermark reconstruction quality, watermark robustness and the like.
(73) The embodiment of the present disclosure provides a set of evaluation modules suitable for image and watermark joint generation. The modules are configured to evaluate the image quality (namely IS (inception score) and FID (Fréchet inception distance)), watermark invisibility (namely PSNR (peak signal-to-noise ratio), SSIM (structural similarity index measure) and LPIPS (learned perceptual image patch similarity)), watermark reconstruction quality (namely NC (normalized correlation) and CA) and watermark robustness (namely NC and CA). As a supplement to the existing spatial watermark evaluation indicator NC, the embodiment of the present disclosure designs an indicator for measuring the character accuracy (CA) of the reconstructed watermark, calculated by optical character recognition (OCR) and edit distance. Through calculation of the NC and CA indicators after simulated post-processing attacks (such as rotation, cropping, Gaussian noise and impulse noise), it is proved that the character data in the reconstructed watermark can still be maintained and restored.
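The CA indicator can be sketched as follows; the OCR engine producing `ocr_text` is left abstract (any engine may be substituted), and normalizing the edit distance by the reference length is an assumption consistent with lower CA values indicating better character recovery.

```python
# Sketch of the character accuracy (CA) indicator: OCR output compared
# with the embedded string by Levenshtein edit distance.
def edit_distance(a: str, b: str) -> int:
    # Levenshtein distance with a rolling one-row DP table.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i            # prev holds dp[i-1][j-1]
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def character_accuracy(ocr_text: str, embedded_text: str) -> float:
    # Lower is better: 0 means every embedded character was recovered.
    return edit_distance(ocr_text, embedded_text) / max(len(embedded_text), 1)
```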
Embodiment 2
(74) The embodiment of the present disclosure provides a cross-modal image-watermark joint generation and detection method, the steps of which correspond to the modules of the device described in Embodiment 1 and are not repeated here.
(75) Specifically, a set of comprehensive evaluation systems is applied to quantify the image and watermark joint generation effects, that is, measurement is conducted in terms of image quality (namely IS and FID), watermark invisibility (namely PSNR, SSIM and LPIPS), watermark reconstruction quality (namely NC and CA), watermark robustness (namely NC and CA) and the like. The embodiment of the present disclosure achieves excellent performance indicators.
(76) To sum up, in the embodiment of the present disclosure the digital watermark is embedded into the text-to-image process, and the influence on the visual effect of the generated image is reduced as far as possible while keeping the watermark invisible; supervisory and traceability means are provided for visual generative artificial intelligence; and the security and reliability of the generated image are guaranteed. In the embodiment of the present disclosure, the information distribution strategy of the composite image can be developed by the non-cooperative game and Shannon information theory to achieve decoupling of the watermark and the image under a quality trade-off. Even when post-processing attacks are applied to the composite image, a watermark with a high recognition degree can still be reconstructed, which proves that the watermark technology provided by the embodiment of the present disclosure is robust. The embodiment of the present disclosure is applicable to methods based on generative adversarial networks, ensures that the generated image carries the hidden watermark, and has strong generalization. The embodiment of the present disclosure further provides a set of evaluation systems combining text-to-image generation and watermarking, which can evaluate the image quality, watermark invisibility, watermark reconstruction degree and watermark robustness. The above-mentioned technology provides reliable technical support for the supervision and traceability of the generated image.
Embodiment 3
(77) The solutions in Embodiments 1 and 2 are validated for feasibility below in combination with specific calculation examples and experimental data, and the detailed description is given as follows:
(78) Table 1 to Table 3 show the quantitative results of the image and watermark joint generation device and method for visually generated content security. In the embodiment of the present disclosure, text-to-image models are selected from the two paradigms of single-stage generation and multistage generation, namely RAT-GAN (recurrent affine transformation generative adversarial network) and AttnGAN (attentional generative adversarial network), to validate the generalization of the provided image and watermark joint generation.
(79) Table 1 compares the fidelity of three image types: (1) the original image, synthesized by the baseline model; (2) the composite image, an image with a hidden watermark produced by the generator of the present disclosure; and (3) the analytic image, the unwatermarked image version decoupled from the composite image. In the embodiment of the present disclosure, the IS and FID indicators are used for evaluating the fidelity of the image. Ideally, although minor pixel-level interference exists, the composite image, the unwatermarked image and the original image should show almost the same visual appearance, with only slight fluctuation in the quantitative indicators.
(80) TABLE 1
                                CUB-Birds            Oxford-102 Flowers    MS-COCO
Model     Image type            IS            FID    IS            FID     FID
RAT-GAN   Original image        5.36 ± 0.20   13.91  4.09 ± 0.06   16.04   14.60
          Composite image       4.94 ± 0.06   16.95  3.72 ± 0.07   18.35   15.62
          Unwatermarked image   4.98 ± 0.06   17.32  3.81 ± 0.07   19.16   15.28
AttnGAN   Original image        4.36 ± 0.03   23.98  —             —       35.49
          Composite image       4.02 ± 0.05   26.49  —             —       37.51
          Unwatermarked image   4.09 ± 0.05   26.01  —             —       38.29
(81) The invisibility of the hidden watermark should be embodied such that, when the Nash equilibrium state is reached, the watermark is imperceptible to the human visual system and the composite image does not leak information obviously. The similarity between the composite image and the analytic image is measured by PSNR. As shown in Table 2, on the CUB-Birds dataset, the PSNR values of RAT-GAN and AttnGAN are 33.29 dB and 33.86 dB respectively, and equivalent PSNR values are obtained on the Oxford-102 Flowers and MS-COCO datasets. When the PSNR exceeds 30 dB, an embedded signal may be regarded as having high invisibility, and the embodiment of the present disclosure clearly exceeds this threshold. Therefore, the PSNR verifies the high invisibility of the watermarks in the embodiment of the present disclosure. Human perceptual preferences are simulated from three perspectives, and further evaluation is conducted by SSIM; the SSIM of the two models maintains a matching rate of more than 99%, which proves the compatibility of the hidden watermark with the restored image and the invisibility of the watermark. Finally, in order to focus on the intrinsic structure of the image feature, the LPIPS model, which evaluates with deep features, is used to learn the perceptual distance between the composite image and the unwatermarked image. RAT-GAN and AttnGAN reach 0.0219 and 0.0235 on MS-COCO, which are less than the 0.0320 reached when the real image is processed by an existing watermark embedding method. This indicates that the secret watermark hidden in the composite image is almost imperceptible. Therefore, almost traceless watermark information hiding is achieved in the embodiment of the present disclosure.
(82) TABLE 2
          CUB-Birds                    Oxford-102 Flowers           MS-COCO
Model     PSNR(dB)  SSIM(%)  LPIPS     PSNR(dB)  SSIM(%)  LPIPS     PSNR(dB)  SSIM(%)  LPIPS
RAT-GAN   33.29     98.46    0.0257    33.51     98.60    0.0231    33.97     98.77    0.0219
AttnGAN   33.86     98.15    0.0223    —         —        —         33.54     98.26    0.0235
(83) Table 3 lists the degree of watermark reconstruction, evaluated from the spatial and character perspectives in the absence of attacks. The spatial similarity is measured by NC in a pixel-by-pixel manner; the similarity between the reconstructed watermark and the hidden watermark exceeds 99%, which indicates that there are few distorted pixels. Therefore, the embodiment of the present disclosure achieves an extremely high level of reconstruction from the spatial perspective. The embodiment of the present disclosure further provides the CA (character accuracy) indicator, by which the character accuracy is measured semantically in combination with OCR and an edit distance. It can be observed that the average CA is less than 0.17, which shows that almost all characters are recognizable in the absence of attacks.
(84) TABLE 3
          CUB-Birds         Oxford-102 Flowers   MS-COCO
Model     NC(%)    CA       NC(%)    CA          NC(%)    CA
RAT-GAN   99.75    0.21     99.69    0.19        99.81    0.19
AttnGAN   99.72    0.23     —        —           99.48    0.21
(87) To sum up, by a set of comprehensive evaluation systems, the embodiment of the present disclosure demonstrates high quality of the generated image, high invisibility of the embedded watermark, high reconstruction accuracy of the watermark, and strong robustness of the watermark against post-processing attacks, so the technical requirements of text-to-image and watermark joint generation can be fully met. The embodiment of the present disclosure aims to enable supervision of the visual generative model, support traceability of the generated image, and guarantee the security and reliability of the generated visual content.
Embodiment 4
(88) An image-watermark joint generation and detection device includes a processor and a memory, wherein program instructions are stored in the memory, and the processor calls the program instructions stored in the memory to enable the device to implement the following steps of the method: achieving feature compatibility of the image and the watermark by an image-watermark feature co-embedding matrix; fusing features of the watermark and the image at a channel level by image-watermark feature fusion; synthesizing a high-resolution composite image with an invisible watermark by an up-sampling generator; decoupling an unwatermarked image and a reconstructed watermark based on the non-cooperative game theory; constraining the composite image, the unwatermarked image and the reconstructed watermark by strategy allocation; attacking the composite image by post-processing attacks; and evaluating the embedding and analysis effects of the image and the watermark.
(89) It should be noted here that the description of the device in the above embodiment corresponds to that of the method in the embodiment, which is not repeated in the embodiment of the present disclosure.
(90) An executive body of the processor and the memory may be a computer, a single-chip microcomputer, a microcontroller or another device with computing functions. The executive body is not limited by the embodiment of the present disclosure and is selected according to the requirements of the actual application.
(91) Data signals are transmitted between the memory and the processor through a bus, which is not repeated in the embodiment of the present disclosure.
(92) Based on the same inventive concept, an embodiment of the present disclosure further provides a computer-readable storage medium including a stored program, and when running, the program controls the equipment where the storage medium is located to implement the steps of the method in the above embodiment.
(93) The computer-readable storage medium includes but is not limited to a flash memory, a hard disk, a solid state disk and the like.
(94) It should be noted here that the description of the readable storage medium in the above embodiment corresponds to that of the method in the embodiment, which is not repeated in the embodiment of the present disclosure.
(95) In the above embodiment, the implementation may be achieved in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, the implementation may be achieved in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions of the embodiment of the present disclosure are generated in whole or in part.
(96) The computer may be a general-purpose computer, a special-purpose computer, a computer network or other programmable devices. The computer instructions may be stored in the computer-readable storage medium or transmitted through the computer-readable storage medium. The computer-readable storage medium may be any available medium capable of being accessed by the computer, or data storage equipment such as a server or a data center integrated with one or more available media. The available medium may be a magnetic medium, a semiconductor medium, or the like.
(97) The embodiments of the present disclosure do not limit models of other devices except for those specifically stated, as long as the devices can complete the above functions.
(98) Those skilled in the art can understand that the drawings are only schematic diagrams of a preferred embodiment. The serial numbers of the above embodiments of the present disclosure are merely for description and do not represent the advantages or disadvantages of the embodiments.
(99) The above descriptions are merely preferred embodiments of the present disclosure, which are not intended to limit the present disclosure. Any modification, equivalent replacement and improvement made within the spirit and principle of the present disclosure should fall within the protection scope of the present disclosure.