G06F40/149

Encoding of data formatted in human readable text according to schema into binary

Data is organized in a hierarchical data tree having nodes and is formatted as human-readable text according to a schema. The data is canonically ordered in correspondence with a canonical ordering of a schema dictionary generated from the schema. The canonically ordered data is encoded into binary, which includes, for each node, removing the node's label and adding, in binary, the node's sequence number corresponding to the canonical ordering.
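The encoding idea in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the toy `SCHEMA`, the depth-first numbering with alphabetically sorted children, and the byte layout (a 2-byte sequence number, a 1-byte kind flag, and a 2-byte length).

```python
import struct

# Hypothetical schema: each label maps to the labels of its allowed children.
SCHEMA = {
    "config": ["name", "interfaces"],
    "interfaces": ["mtu", "address"],
    "name": [],
    "mtu": [],
    "address": [],
}


def build_dictionary(schema, root):
    """Number every label by a depth-first walk that visits children in sorted order."""
    order = {}

    def visit(label):
        order[label] = len(order)
        for child in sorted(schema[label]):
            visit(child)

    visit(root)
    return order


def encode(node, order):
    """Encode a (label, payload) node; payload is a string or a list of child nodes."""
    label, payload = node
    out = struct.pack(">H", order[label])  # the sequence number replaces the label
    if isinstance(payload, list):
        # Canonical ordering: children are sorted by their dictionary number.
        children = sorted(payload, key=lambda child: order[child[0]])
        out += struct.pack(">BH", 1, len(children))
        for child in children:
            out += encode(child, order)
    else:
        data = str(payload).encode("utf-8")
        out += struct.pack(">BH", 0, len(data)) + data
    return out


order = build_dictionary(SCHEMA, "config")
doc = ("config", [("name", "r1"),
                  ("interfaces", [("mtu", "1500"), ("address", "10.0.0.1")])])
encoded = encode(doc, order)
```

Because children are sorted before encoding, two documents that differ only in sibling order produce the same bytes, and no label text appears in the output.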

System and method for processing messages using native data serialization/deserialization in a service-oriented pipeline architecture

A computer-implemented system and method for processing messages using native data serialization/deserialization, without any transformation, in a service-oriented pipeline architecture is disclosed. The method in an example embodiment includes serializing or deserializing the request/response message directly into the format (a specific on-the-wire data format or a Java object) that the recipient (a service implementation, a service consumer, or the framework) expects, without first converting it into an intermediate format. This provides an efficient mechanism for the same service implementation to be accessed by exchanging messages in different data formats.
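A minimal sketch of the idea of writing a message straight into the format the recipient expects might look as follows. The `SERIALIZERS` registry, the function names, and the `message` root element are hypothetical and chosen for illustration; the abstract does not describe a concrete API.

```python
import json
import xml.etree.ElementTree as ET


def to_json(message):
    """Write the message dict directly as JSON bytes."""
    return json.dumps(message, sort_keys=True).encode("utf-8")


def to_xml(message):
    """Write the message dict directly as XML bytes, one child element per field."""
    root = ET.Element("message")
    for key, value in sorted(message.items()):
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root)


# Hypothetical registry: one native serializer per wire format.
SERIALIZERS = {"json": to_json, "xml": to_xml}


def serialize(message, wire_format):
    """Dispatch to the serializer for the recipient's format; no intermediate form."""
    return SERIALIZERS[wire_format](message)


response = {"id": 7, "status": "ok"}
json_bytes = serialize(response, "json")
xml_bytes = serialize(response, "xml")
```

The same `response` object serves both consumers; only the serializer selected for the recipient differs.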

DOCUMENT ENCODING
20170357626 · 2017-12-14

A method may include determining that a character event of an extensible markup language (XML) document is untyped. The character event may be associated with a content string including whitespace. The method may further include determining that the character event is associated with an event code having a length of one. In response to determining that the character event is untyped and is associated with the event code having the length of one, the content string may be encoded as an encoded content string, including encoding the whitespace of the content string such that the whitespace of the content string is preserved.

Providing a binary data file to a client application using a document model
11257040 · 2022-02-22

A document collaboration system allows applications to collaborate on a binary data file even if the binary data file is not in a collaborative document format. In response to a request from an application to access a binary data file, the document collaboration system gives the application access to a document model corresponding to the binary data file and the application. If the document model does not already exist, it may be created by generating an empty document model, transmitting the binary data file to the application, and allowing the application to fill in the empty document model based on the binary data file. The document model may be provided to and modified by its related application through an application programming interface (API), and changes to the document model may be applied to the binary data file.

AUTOMATED DOCUMENT IDENTIFICATION AND LANGUAGE DICTATION RECOGNITION SYSTEMS AND METHODS FOR USING THE SAME
20170294190 · 2017-10-12

In at least one exemplary embodiment of automated document identification and language dictation recognition systems, the system comprises a database capable of receiving a plurality of verbal records, each verbal record comprising at least one identifier and at least one verbal feature, and a processor operably coupled to the database, where the processor has and executes a software program. The processor is operable to identify a subset of the plurality of verbal records from the database, extract at least one verbal feature from the identified records, analyze the at least one verbal feature of the subset of the plurality of verbal records, process the subset of the plurality of records using the analyzed feature according to at least one reasoning approach, generate a processed verbal record using the processed subset of the plurality of records, and deliver the processed verbal record to a recipient. The processor is further operable to extract features for a pool of training documents, turn each transcription job into a feature vector that can be used by a traditional classifier, create classifiers with different parameters in order to explore the best possible strategy, evaluate the performance of all classifiers, create a boosting classifier, calculate performance statistics, and operate the automatic document identifier for all documents.

COMMON PHRASE IDENTIFICATION AND LANGUAGE DICTATION RECOGNITION SYSTEMS AND METHODS FOR USING THE SAME
20170286393 · 2017-10-05

In at least one exemplary embodiment of common phrase identification and language dictation recognition systems and methods for using the same, the system comprises a database capable of receiving a plurality of verbal records, each verbal record comprising at least one identifier and at least one verbal feature, and a processor operably coupled to the database, where the processor has and executes a software program. The processor is operable to identify a subset of the plurality of verbal records from the database, extract at least one verbal feature from the identified records, analyze the at least one verbal feature of the subset of the plurality of verbal records, process the subset of the plurality of records using the analyzed feature according to at least one reasoning approach, generate a processed verbal record using the processed subset of the plurality of records, and deliver the processed verbal record to a recipient. The processor is further operable to identify common phrases in parts of the verbal record, identify a body of work for building a set of common phrases, analyze documents in a training set to find common phrases, and replace phrases with the common phrases.
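The step of analyzing a training set to find common phrases could be sketched as a simple n-gram count. This is an illustrative stand-in, not the method claimed in the application: the function name `common_phrases`, the word-level trigram choice, and the document-frequency threshold are all assumptions.

```python
from collections import Counter


def common_phrases(documents, n=3, min_documents=2):
    """Return the word n-grams that occur in at least `min_documents` documents."""
    counts = Counter()
    for text in documents:
        words = text.lower().split()
        # A set ensures each n-gram is counted at most once per document.
        grams = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        counts.update(grams)
    return {gram for gram, count in counts.items() if count >= min_documents}


# Hypothetical training set of transcribed dictations.
training_set = [
    "the patient was seen today",
    "the patient was discharged",
    "no acute distress",
]
phrases = common_phrases(training_set)
```

In this toy training set, only "the patient was" appears in two documents, so it is the single phrase returned.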

System and method for building and repairing a script for retrieval of information from a web site
09779007 · 2017-10-03

A system and method allows users to provide scripts, or portions of scripts, for retrieving information from one or more web sites of one or more businesses by demonstrating operation of the one or more web sites and identifying the locations of one or more fields on each web page of the one or more web sites. The system and method stores the scripts and uses them to retrieve information from such web site or web sites for any number of users. Different portions of different scripts may be used as a single script to retrieve information from a single web site. Scripts or portions of scripts may be repaired using information from previously working scripts or portions, the web site when the script or portion worked, and the web site when the script or portion did not work.

GRAMMAR GENERATION
20170249288 · 2017-08-31

An extensible markup language schema definition (XSD) may be received. The XSD may include multiple elements, each having a complex type definition and an empty content model. A singleton empty content grammar may be generated. The singleton empty content grammar may be shared among the multiple elements. Multiple grammars may be generated based on the XSD. The multiple grammars may be associated with encoding and decoding extensible markup language (XML) documents based on the XSD to and from efficient XML interchange (EXI) streams. Each of the multiple grammars may correspond to an element of the multiple elements. Each of the multiple grammars may include the singleton empty content grammar. A device configured to encode or decode the XML documents to or from the EXI streams commits fewer resources than the device would commit if each of the multiple grammars included a separate content grammar rather than the singleton content grammar.
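The resource saving described in this abstract comes from sharing one object rather than creating one per element. A minimal sketch of that sharing follows. The `Grammar` class, the `build_element_grammars` function, and the element names are hypothetical. `EE` and `CH` are used here only as labels for end-element and character events, and the production lists are not a faithful EXI grammar.

```python
class Grammar:
    """A named tuple of productions; a stand-in for an EXI grammar object."""

    def __init__(self, name, productions):
        self.name = name
        self.productions = tuple(productions)


# One shared instance for every element whose content model is empty.
EMPTY_CONTENT = Grammar("EmptyContent", ["EE"])


def build_element_grammars(elements):
    """elements maps an element name to True when its content model is empty."""
    grammars = {}
    for name, is_empty in elements.items():
        if is_empty:
            content = EMPTY_CONTENT  # reuse the singleton, allocate nothing new
        else:
            content = Grammar(name + "Content", ["CH", "EE"])
        grammars[name] = {"name": name, "content": content}
    return grammars


grammars = build_element_grammars({"br": True, "hr": True, "p": False})
distinct = {id(entry["content"]) for entry in grammars.values()}
```

Three element grammars end up referencing only two content grammar objects, because both empty elements point at the same `EMPTY_CONTENT` instance.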

Streaming contextual unidirectional models

The streaming of unidirectional machine learning models is facilitated by the use of embedding vectors. Processing blocks in the models apply embedding vectors as input. The embedding vectors utilize context of future data (e.g., data that is temporally offset into the future within a data stream) to improve the accuracy of the outputs generated by the processing blocks. The embedding vectors cause a temporal shift between the outputs of the processing blocks and the inputs to which the outputs correspond. This temporal shift enables the processing blocks to apply the embedding vector inputs from processing blocks that are associated with future data.
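The temporal shift can be pictured with a toy generator that delays each output until a fixed number of future inputs have arrived. This is only a sketch of delayed emission with future context, not the disclosed model: the `lookahead` parameter is an assumption, and the mean of the future frames is a hypothetical stand-in for a learned embedding vector.

```python
from collections import deque


def stream_with_lookahead(frames, lookahead=2):
    """Yield (index, frame, context) once `lookahead` later frames have arrived.

    `context` is the mean of those later frames, a placeholder for an
    embedding vector that summarizes future data.
    """
    buffer = deque()
    for index, frame in enumerate(frames):
        buffer.append((index, frame))
        if len(buffer) > lookahead:
            current_index, current = buffer.popleft()
            future = [value for _, value in buffer]
            yield current_index, current, sum(future) / len(future)


outputs = list(stream_with_lookahead([1.0, 2.0, 3.0, 4.0], lookahead=2))
```

The output for input 0 is produced only after input 2 has been read, which is the shift between inputs and outputs that the abstract describes. The last `lookahead` frames are not emitted in this sketch because no flush step is included.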