G06V30/244

Optically analyzing text strings such as domain names

Systems and methods determine whether domain names are potentially maliciously registered variants of a set of monitored domain names. A computer system can receive domain names from a feed of newly registered domain names. For each received domain name, the computer system can generate a series of images of the domain name in different fonts and/or with various distortions applied thereto. The computer system can then transform the domain name images back to text via optical character recognition. Due to the differences in fonts and/or distortions applied to the generated images of the received domain name, the optical character recognition process can produce different text strings than the originally received domain name. The converted textual domain names are then analyzed to determine whether any one is sufficiently similar to a monitored domain name, indicating that the received domain name could be a malicious variant thereof.
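
The final comparison step described above can be sketched as follows. This is a minimal illustration, not the patented method: the abstract does not name a similarity measure, so Levenshtein edit distance and the `max_distance` threshold are assumptions, and `edit_distance`, `flag_variants`, and the sample OCR readings are hypothetical. Image rendering and optical character recognition are left out; the readings stand in for their output.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def flag_variants(ocr_readings, monitored, max_distance=1):
    """Return (reading, monitored_name) pairs that are within max_distance edits."""
    hits = []
    for reading in ocr_readings:
        for name in monitored:
            if edit_distance(reading.lower(), name.lower()) <= max_distance:
                hits.append((reading, name))
    return hits


# Hypothetical OCR readings of images rendered from one newly registered name.
readings = ["examp1e.com", "exarnp1e.com"]
print(flag_variants(readings, ["example.com"]))
```

A newly registered name would be flagged if any one of its OCR readings lands within the threshold of a monitored name.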

Identifying matching fonts utilizing deep learning
11763583 · 2023-09-19

The present disclosure relates to systems, methods, and non-transitory computer readable media for generating and providing matching fonts by utilizing a glyph-based machine learning model. For example, the disclosed systems can generate a glyph image by arranging glyphs from a digital document according to an ordering rule. The disclosed systems can further identify target fonts as fonts that include the glyphs within the glyph image. The disclosed systems can further generate target glyph images by arranging glyphs of the target fonts according to the ordering rule. Based on the glyph image and the target glyph images, the disclosed systems can utilize a glyph-based machine learning model to generate and compare glyph image feature vectors. By comparing a glyph image feature vector with a target glyph image feature vector, the font matching system can identify one or more matching glyphs.
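
A rough sketch of the matching flow described above, under stated assumptions: the ordering rule (unique glyphs sorted by code point), the use of cosine similarity, and all function names are hypothetical, and the feature vectors are plain lists standing in for the output of the glyph-based machine learning model, which is not reproduced here.

```python
import math


def ordered_glyphs(text):
    """Hypothetical ordering rule: unique non-whitespace glyphs sorted by code point."""
    return sorted({ch for ch in text if not ch.isspace()})


def candidate_fonts(glyphs, font_coverage):
    """Keep only the fonts whose character set contains every glyph in the query."""
    needed = set(glyphs)
    return [name for name, chars in font_coverage.items() if needed <= chars]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def best_match(query_vector, target_vectors):
    """Name of the target whose feature vector is most similar to the query."""
    return max(target_vectors,
               key=lambda name: cosine_similarity(query_vector, target_vectors[name]))


glyphs = ordered_glyphs("font match")
targets = candidate_fonts(glyphs, {"FontA": set("acfhmnot"), "FontB": set("abc")})
print(glyphs, targets)
```

Applying the same ordering rule to the document glyphs and to each target font keeps the two glyph images comparable position by position before their feature vectors are compared.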

Data processing systems, devices, and methods for content analysis
11232251 · 2022-01-25

Systems, devices, and methods are operative for identifying a reference within a figure and an identifier in a text associated with the figure. The reference refers to an element depicted in the figure and corresponds to the identifier, which identifies the element in the text. The identifier is placed on the figure at a distance from the reference so that the identifier is visually associated with the reference. The placing of the identifier on the figure is irrespective of the distance between the identifier and the reference.

Preserving Document Design Using Font Synthesis
20230326104 · 2023-10-12

Automatic font synthesis for modifying a local font to have an appearance that is visually similar to a source font is described. A font modification system receives an electronic document including the source font together with an indication of a font descriptor for the source font. The font descriptor includes information describing various font attributes for the source font, which define a visual appearance of the source font. Using the source font descriptor, the font modification system identifies a local font that is visually similar in appearance to the source font by comparing local font descriptors to the source font descriptor. A visually similar font is then synthesized by modifying glyph outlines of the local font to achieve the visual appearance defined by the source font descriptor. The synthesized font is then used to replace the source font and output in the electronic document at the computing device.
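
The descriptor comparison step might look like the sketch below. It assumes a font descriptor can be reduced to a dictionary of numeric attributes already normalized to the range 0 to 1; the attribute names (`weight`, `width`, `x_height`), the Euclidean distance, and both function names are illustrative assumptions rather than details from the abstract. Modifying glyph outlines to synthesize the final font is not shown.

```python
import math


def descriptor_distance(a, b):
    """Euclidean distance over the attributes that both descriptors define."""
    shared = a.keys() & b.keys()
    return math.sqrt(sum((a[key] - b[key]) ** 2 for key in shared))


def closest_local_font(source, local_fonts):
    """Name of the local font whose descriptor is nearest to the source descriptor."""
    return min(local_fonts,
               key=lambda name: descriptor_distance(source, local_fonts[name]))


# Hypothetical, pre-normalized descriptors.
source = {"weight": 0.70, "width": 0.50, "x_height": 0.48}
local = {
    "LocalSans": {"weight": 0.40, "width": 0.50, "x_height": 0.52},
    "LocalSerif": {"weight": 0.68, "width": 0.52, "x_height": 0.47},
}
print(closest_local_font(source, local))
```

The remaining gap between the chosen local font's descriptor and the source descriptor indicates how far each attribute would need to be adjusted during synthesis.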

Font identification from imagery

A system includes a computing device that includes a memory configured to store instructions. The system also includes a processor to execute the instructions to perform operations that include receiving an image that includes textual content in at least one font. Operations also include identifying the at least one font represented in the received image using a machine learning system. The machine learning system is trained using images representing a plurality of training fonts. A portion of the training images includes text located in the foreground and positioned over captured background imagery.

Method of generating font database, and method of training neural network model

A method of generating a font database and a method of training a neural network model are provided, which relate to the field of artificial intelligence, in particular to computer vision and deep learning technology. The method of generating the font database includes: determining, by using a trained similarity comparison model, a basic font database most similar to handwriting font data of a target user in a plurality of basic font databases as a candidate font database; and adjusting, by using a trained basic font database model for generating the candidate font database, the handwriting font data of the target user, so as to obtain a target font database for the target user.

Font attribute detection
20230343124 · 2023-10-26

Described are techniques for font attribute detection. The techniques include receiving a document having different font attributes amongst a plurality of words respectively comprised of at least one character. The techniques further include generating a dense image document from the document by setting the plurality of words to a predefined size, removing blank spaces from the document, and altering an order of characters relative to the document. The techniques further include determining characteristics of the characters in the dense image document and aggregating the characteristics for at least one word. The techniques further include annotating the at least one word with a font attribute based on the aggregated characteristics.
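
The densify-then-aggregate idea can be illustrated with the sketch below. It is an assumption-laden simplification: the document is treated as a list of word strings rather than an image, resizing words to a predefined size is omitted, the per-character labels are supplied by hand in place of a detector, and majority voting is an assumed aggregation rule. The names `densify` and `annotate_words` are hypothetical.

```python
import random
from collections import Counter


def densify(words, seed=0):
    """Drop blank space and shuffle characters, keeping each one's word index."""
    cells = [(index, ch)
             for index, word in enumerate(words)
             for ch in word if not ch.isspace()]
    random.Random(seed).shuffle(cells)  # alter character order reproducibly
    return cells


def annotate_words(cell_labels):
    """Majority vote over per-character labels, grouped by word index."""
    votes = {}
    for word_index, label in cell_labels:
        votes.setdefault(word_index, Counter())[label] += 1
    return {index: counts.most_common(1)[0][0] for index, counts in votes.items()}


cells = densify(["Bold", "plain text"])
# Hypothetical per-character predictions keyed by word index.
labels = [(0, "bold"), (0, "bold"), (0, "regular"), (1, "regular")]
print(len(cells), annotate_words(labels))
```

Carrying the word index with every character is what lets the per-character results be regrouped by word after the order has been altered.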

Image processing method for an identity document

An image processing method for a candidate identity document that comprises a data page comprises acquiring a digital image of the data page of the identity document. The method further comprises assigning a class or a super-class to the candidate identity document via automatic classification of the digital image by a machine-learning algorithm trained beforehand on a set of reference images in a training phase; processing the digital image to obtain a set of at least one intermediate image the weight of which is lower than or equal to the weight of the digital image; applying discrimination to the intermediate image using a discriminator neural network; and generating an output signal as output from the discriminator neural network, the value of which is representative of the probability that the candidate identity document is an authentic document or a fake.

Handwritten content removing method and device and storage medium

A handwritten content removing method and device and a storage medium. The handwritten content removing method comprises: acquiring an input image of a text page to be processed, the input image comprising a handwritten region that contains handwritten content (S10); identifying the input image so as to determine the handwritten content in the handwritten region (S11); and removing the handwritten content from the input image so as to obtain an output image (S12).

Data processing systems, devices, and methods for content analysis
11830266 · 2023-11-28

Systems, devices, and methods are operative for identifying a reference within a figure and an identifier in a text associated with the figure. The reference refers to an element depicted in the figure and corresponds to the identifier, which identifies the element in the text. The identifier is placed on the figure at a distance from the reference so that the identifier is visually associated with the reference. The placing of the identifier on the figure is irrespective of the distance between the identifier and the reference.