G06V30/18181

Methods and systems for graph-inference-based text extraction from unstructured documents
12597283 · 2026-04-07

According to one aspect, the subject matter described herein includes a method for extracting text from unstructured documents. The method includes receiving a page of an unstructured document; extracting, from the page, a glyph identifier and a glyph position for each glyph on the page; and generating an adjacency graph based on the glyph positions, in which each node corresponds to a glyph and comprises glyph information that includes at least the glyph identifier and the glyph position of that glyph. The method further includes processing the adjacency graph with a machine learning model to classify edges and nodes in the adjacency graph, and grouping the glyphs according to their edge and node classifications to produce text output.
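
The flow described above (glyphs, then adjacency graph, then classification, then grouping) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the `Glyph` fields, the distance threshold used to create edges, and the thresholds in `classify_edge`. In particular, `classify_edge` is a hand-written geometric rule standing in for the machine learning model, and node classification is omitted for brevity.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot


@dataclass(frozen=True)
class Glyph:
    glyph_id: str  # glyph identifier (here simply the character itself)
    x: float       # glyph position, horizontal coordinate
    y: float       # glyph position, vertical coordinate


def build_adjacency_graph(glyphs, max_dist=15.0):
    """Return edges (i, j) between glyphs closer than max_dist.

    The distance threshold is a hypothetical rule; the source only says
    the graph is generated from the glyph positions.
    """
    edges = []
    for i, j in combinations(range(len(glyphs)), 2):
        a, b = glyphs[i], glyphs[j]
        if hypot(a.x - b.x, a.y - b.y) <= max_dist:
            edges.append((i, j))
    return edges


def classify_edge(a, b, max_gap=8.0, line_tol=2.0):
    """Stand-in for the ML model: True if both glyphs belong to one word."""
    return abs(a.y - b.y) <= line_tol and abs(a.x - b.x) <= max_gap


def group_glyphs(glyphs, edges):
    """Merge glyphs joined by positively classified edges (union-find)."""
    parent = list(range(len(glyphs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in edges:
        if classify_edge(glyphs[i], glyphs[j]):
            parent[find(i)] = find(j)

    groups = {}
    for idx, glyph in enumerate(glyphs):
        groups.setdefault(find(idx), []).append(glyph)

    # Order glyphs left to right within a word, then words by (y, x).
    # This assumes y grows downward, which is not true of every format.
    words = []
    for members in groups.values():
        members.sort(key=lambda g: g.x)
        text = "".join(g.glyph_id for g in members)
        words.append((min(g.y for g in members), members[0].x, text))
    words.sort()
    return [text for _, _, text in words]


page = [Glyph("H", 0, 0), Glyph("i", 6, 0), Glyph("o", 20, 0), Glyph("k", 26, 0)]
print(group_glyphs(page, build_adjacency_graph(page)))  # ['Hi', 'ok']
```

In this toy page the edge between "i" and "o" exists in the graph but is classified as not joining a word, so the two words stay separate. A learned classifier would replace `classify_edge` and could also label nodes, for example to mark glyphs that should be excluded from the text output.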