G06F8/751

Merging changes from upstream code to a branch

A computer-implemented method is provided for program repository management. The method includes identifying commits in an upstream commit log of an upstream branch and commits in a development commit log of a development branch. The method further includes extracting the commits in the development commit log of the development branch. The method also includes identifying in the upstream commit log, by a hardware processor, code that is identical or similar to the commits extracted from the development commit log. The method additionally includes presenting the identified code as a candidate commit for a change in the upstream program code.
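The matching step above can be sketched roughly as follows. This is a minimal illustration, not the claimed method: it assumes each commit is reduced to the set of lines its diff changes (the `Commit` type is hypothetical), treats whitespace-only differences as identical, and uses Jaccard similarity with an arbitrary 0.6 threshold as the "similar" test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    sha: str
    changed_lines: tuple  # lines added or removed by the commit's diff (hypothetical representation)


def normalize(line: str) -> str:
    # Collapse runs of whitespace so formatting-only edits still count as identical
    return " ".join(line.split())


def jaccard(a: set, b: set) -> float:
    # Share of distinct changed lines the two commits have in common
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def commit_candidates(upstream, development, threshold=0.6):
    """Map each development commit to upstream commits whose changes are identical or similar."""
    upstream_sets = [(c.sha, {normalize(l) for l in c.changed_lines}) for c in upstream]
    candidates = {}
    for dev in development:
        dev_set = {normalize(l) for l in dev.changed_lines}
        matches = [(sha, jaccard(dev_set, up_set)) for sha, up_set in upstream_sets]
        # Keep only sufficiently similar upstream commits, best match first
        candidates[dev.sha] = sorted(
            (m for m in matches if m[1] >= threshold), key=lambda m: m[1], reverse=True
        )
    return candidates


upstream_log = [
    Commit("u1", ("x = 1", "y = 2")),
    Commit("u2", ("print('hello')",)),
]
development_log = [Commit("d1", ("x  =  1", "y = 2"))]
print(commit_candidates(upstream_log, development_log))  # {'d1': [('u1', 1.0)]}
```

Line-set comparison ignores ordering and context; git's own patch-id machinery is a stricter way to detect equivalent changes, while a similarity score like this one also surfaces near matches.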

Similar code analysis and template induction
11137986 · 2021-10-05

Disclosed herein are system, method, and computer program product embodiments for mitigating similar and/or duplicate data. An embodiment operates by a computing device receiving a first and second code segment and parsing the first and second code segments into a first and second abstract syntax tree (AST), respectively. Thereafter, the computing device generates a first and a second normalized AST corresponding to the first and second ASTs, respectively, based on a normalization rule such that the first and second normalized ASTs correspond to a first and second normalized code segment. The computing device then derives analytical information of the first and second normalized ASTs based on the first and second normalized code segments.
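A rough sketch of the parse, normalize, and compare flow, using Python's standard `ast` module. The normalization rule here is an assumption chosen for illustration: every identifier is renamed to a placeholder (`v0`, `v1`, ...) in order of first appearance and every literal becomes `'CONST'`. The "analytical information" is likewise a hypothetical stand-in (a template-equality flag and node counts).

```python
import ast


class Normalizer(ast.NodeTransformer):
    """Hypothetical normalization rule: identifiers -> v0, v1, ... and literals -> 'CONST'."""

    def __init__(self):
        self.names = {}

    def canon(self, name):
        # Placeholders are assigned in order of first appearance
        if name not in self.names:
            self.names[name] = f"v{len(self.names)}"
        return self.names[name]

    def visit_FunctionDef(self, node):
        node.name = self.canon(node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self.canon(node.arg)
        self.generic_visit(node)
        return node

    def visit_Name(self, node):
        node.id = self.canon(node.id)
        return node

    def visit_Constant(self, node):
        node.value = "CONST"
        return node


def normalized_ast(source):
    return Normalizer().visit(ast.parse(source))


def analyze(first, second):
    """Return both normalized code segments plus simple analytical information."""
    a, b = normalized_ast(first), normalized_ast(second)
    return {
        "first_normalized": ast.unparse(a),  # ast.unparse needs Python 3.9+
        "second_normalized": ast.unparse(b),
        "same_template": ast.dump(a) == ast.dump(b),
        "node_count": (sum(1 for _ in ast.walk(a)), sum(1 for _ in ast.walk(b))),
    }


info = analyze("def area(w, h):\n    return w * h * 1",
               "def size(x, y):\n    return x * y * 2")
print(info["same_template"])  # True
```

Because builtins and callee names are renamed too, this rule is deliberately aggressive; a practical rule would likely keep called API names so that templates stay meaningful.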

ASSESSING OPERATIONAL STABILITY OF COMPUTER SYSTEM BY INTEGRITY CHECKING OF COMPUTER PROGRAM
20210256136 · 2021-08-19

A computer-implemented method for checking the integrity of a target computer program to be executed in a computer system.

DETERMINING AN ORIGIN OF A TARGET SOURCE CODE FOR A COMPUTER PROGRAM OR A PART THEREOF
20210256130 · 2021-08-19

A computer-implemented method for determining an origin of a target source code for a computer program or a part thereof. The method involves searching a plurality of software archives from different sources in a global computer network to find occurrences of the target source code among code files in said software archives. For every found occurrence of the target source code, the method further involves collecting key information about the matching source code files. From the collected key information, a frequency map is built that contains, for each keyword found in the key information, a keyword count value indicating the number of times the keyword occurs in the key information. The method may further involve applying a scoring scheme to the matching source code files based on the built frequency map (310), determining a highest score (Vmax) among the matching source code files after the scoring scheme has been applied, and determining the origin of the target source code as the matching source code file having the highest score.
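The frequency-map and scoring steps might look like the sketch below. The archive search is out of scope; `key_info` is a hypothetical, already collected mapping from each matching file to its keywords, and the scoring scheme (sum of the map's counts for a file's distinct keywords) is an assumption, since the abstract does not specify one.

```python
from collections import Counter


def build_frequency_map(key_info):
    """Count how often each keyword occurs across the key information of all matching files."""
    frequency = Counter()
    for keywords in key_info.values():
        frequency.update(keywords)
    return frequency


def determine_origin(key_info):
    frequency = build_frequency_map(key_info)
    # Hypothetical scoring scheme: a file scores the summed counts of its distinct keywords,
    # so files described by widely shared keywords rank higher
    scores = {path: sum(frequency[k] for k in set(kws)) for path, kws in key_info.items()}
    origin = max(scores, key=scores.get)
    return origin, scores[origin]  # (origin file, Vmax)


key_info = {
    "archive-a/zlib/inflate.c": ["zlib", "inflate", "upstream"],
    "archive-b/vendor/zlib/inflate.c": ["zlib", "inflate"],
    "archive-c/mirror/inflate.c": ["inflate", "mirror"],
}
print(determine_origin(key_info))  # ('archive-a/zlib/inflate.c', 6)
```

With these counts (`zlib` 2, `inflate` 3, `upstream` 1, `mirror` 1) the three files score 6, 5, and 4, so the first one is reported as the origin.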

Source code similarity detection using digital fingerprints
11099842 · 2021-08-24

Similarity in source code is identified by searching digital fingerprints that represent at least the control flow of blocks of programming statements. At least some of the source code is converted into a plurality of respective tokens. Each of the tokens is associated with a plurality of blocks. Tokens are modified by normalizing at least one value in at least one of the blocks and/or by defining at least one abstraction. Thereafter, a representation of control flow is created, and a digital fingerprint representing at least the control flow of a token is generated. Source code within at least one block of a given token can then be identified as a duplicate of source code stored in a repository by comparing at least one of the generated digital fingerprints with at least one previously generated digital fingerprint.
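A simplified illustration of control-flow fingerprinting, built on Python's `ast` and `hashlib` modules rather than the token-and-block pipeline described above. Assumptions: a "block" is a whole function, normalization simply discards names, literals, and expressions, the listed statement types count as control flow, and the repository is a plain dictionary from fingerprint to location.

```python
import ast
import hashlib

# Statement types treated as control flow in this sketch (an assumption, not the patent's list)
CONTROL_FLOW = (ast.If, ast.For, ast.While, ast.Try, ast.With,
                ast.Return, ast.Break, ast.Continue, ast.Raise)


def skeleton(node):
    """Nested control-flow outline; names, literals and expressions are dropped (normalized away)."""
    parts = []
    for child in ast.iter_child_nodes(node):
        inner = skeleton(child)
        if isinstance(child, CONTROL_FLOW):
            parts.append(f"{type(child).__name__}({','.join(inner)})")
        else:
            parts.extend(inner)
    return parts


def fingerprints(source):
    """One SHA-256 fingerprint per function, computed from its control-flow outline."""
    return {
        fn.name: hashlib.sha256("|".join(skeleton(fn)).encode()).hexdigest()
        for fn in ast.walk(ast.parse(source))
        if isinstance(fn, ast.FunctionDef)
    }


def find_duplicates(source, repository):
    """repository maps previously generated fingerprints to where that code was first seen."""
    return [(name, repository[fp]) for name, fp in fingerprints(source).items() if fp in repository]


stored = ("def clamp(x, lo, hi):\n    if x < lo:\n        return lo\n"
          "    if x > hi:\n        return hi\n    return x\n")
repository = {fp: "lib.py:" + name for name, fp in fingerprints(stored).items()}
new = ("def bound(v, a, b):\n    if v < a:\n        return a\n"
       "    if v > b:\n        return b\n    return v\n\n\ndef noop():\n    pass\n")
print(find_duplicates(new, repository))  # [('bound', 'lib.py:clamp')]
```

A control-flow-only fingerprint is coarse: unrelated functions with the same branching shape collide, which is why the abstract combines it with normalized tokens.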

MERGING CHANGES FROM UPSTREAM CODE TO A BRANCH
20210255852 · 2021-08-19

A computer-implemented method is provided for program repository management. The method includes identifying commits in an upstream commit log of an upstream branch and commits in a development commit log of a development branch. The method further includes extracting the commits in the development commit log of the development branch. The method also includes identifying in the upstream commit log, by a hardware processor, code that is identical or similar to the commits extracted from the development commit log. The method additionally includes presenting the identified code as a candidate commit for a change in the upstream program code.

FREQUENT SOURCE CODE PATTERN MINING

A data mining technique is used to find large, frequently occurring source code patterns from methods/APIs that can be used in code development. Simplified trees that represent the syntactic structure and the type and method usage of a source code fragment, such as a method, are mined to find closed and maximal frequent subtrees, which represent the largest frequently occurring source code patterns or idioms associated with a particular type and method usage. These idioms are then used in an idiom web service and/or a code completion system to assist users in developing source code programs.
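The core idea of counting tree patterns across many methods can be illustrated with a much simplified stand-in. This sketch only mines depth-one subtrees (a node label plus its ordered child labels) and keeps those that appear in at least `min_support` methods. It does not compute closed or maximal frequent subtrees, which require a dedicated subtree miner. The labelling scheme (syntax type, plus callee name for calls) is an assumption.

```python
import ast
from collections import Counter


def label(node):
    # Simplified node label: the syntax type, plus the callee name for calls
    if isinstance(node, ast.Call):
        func = node.func
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", "?")
        return f"Call:{name}"
    return type(node).__name__


def depth_one_subtrees(method):
    """Distinct (parent label, ordered child labels) pairs inside a method body."""
    patterns = set()
    for stmt in method.body:
        for node in ast.walk(stmt):
            kids = tuple(label(c) for c in ast.iter_child_nodes(node)
                         if not isinstance(c, ast.expr_context))  # skip Load/Store markers
            if kids:
                patterns.add((label(node), kids))
    return patterns


def mine_frequent(sources, min_support):
    """Keep patterns that occur in at least min_support methods across the corpus."""
    support = Counter()
    for src in sources:
        for fn in ast.walk(ast.parse(src)):
            if isinstance(fn, ast.FunctionDef):
                support.update(depth_one_subtrees(fn))
    return {p: n for p, n in support.items() if n >= min_support}


corpus = [
    "def a(f):\n    with open(f) as h:\n        return h.read()\n",
    "def b(p):\n    with open(p) as fh:\n        data = fh.read()\n    return data\n",
    "def c(x):\n    return x + 1\n",
]
for pattern, count in mine_frequent(corpus, 2).items():
    print(count, pattern)
```

On this toy corpus the "open a file in a `with` statement" shape, `('withitem', ('Call:open', 'Name'))`, surfaces as a frequent pattern, which hints at how such fragments could feed an idiom service or code completion.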

Measuring code sharing of software modules based on fingerprinting of assembly code

A method includes obtaining assembly code of a first software module, the assembly code comprising one or more assembly functions each comprising at least one basic block. The method also includes computing fingerprints of the basic blocks of the first software module by application of a fuzzy hash function, generating a representation of the first software module as a set of assembly functions each represented as a sequence of fingerprints of its associated basic blocks, and determining a similarity score between the first software module and at least a second software module classified as a given software module type. The similarity score is based on distances between the fingerprints of the basic blocks of the assembly functions of the first software module and corresponding fingerprints of the second software module. The method further includes determining a measure of code sharing between the first and second software modules based on the similarity score.
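A rough sketch of the block-fingerprint, distance, and code-sharing pipeline. One substitution is made openly: instead of a dedicated fuzzy hash such as ssdeep or TLSH, each basic block is fingerprinted with a 64-bit SimHash over mnemonic bigrams, which is also a locality-sensitive hash, and Hamming distance serves as the fingerprint distance. The nested-list module layout, the best-match averaging, and the 0.9 sharing threshold are all assumptions.

```python
import hashlib

BITS = 64


def simhash(features):
    """64-bit SimHash: similar feature sets give fingerprints with a small Hamming distance."""
    weights = [0] * BITS
    for feat in features:
        h = int.from_bytes(hashlib.blake2b(feat.encode(), digest_size=8).digest(), "big")
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(BITS) if weights[i] > 0)


def block_fingerprint(instructions):
    # Keep only mnemonics so register and address changes do not alter the fingerprint
    ops = [ins.split()[0] for ins in instructions if ins.strip()]
    bigrams = [" ".join(ops[i:i + 2]) for i in range(max(1, len(ops) - 1))]
    return simhash(bigrams)


def fingerprint_module(module):
    """module: list of functions; function: list of basic blocks; block: list of instructions."""
    return [[block_fingerprint(block) for block in function] for function in module]


def block_similarity(a, b):
    return 1 - bin(a ^ b).count("1") / BITS


def function_similarity(f, g):
    # Average, over f's blocks, of the best-matching block in g
    return sum(max(block_similarity(x, y) for y in g) for x in f) / len(f)


def module_similarity(m1, m2, share_threshold=0.9):
    """Return (similarity score, share of m1's functions with a close match in m2)."""
    best = [max(function_similarity(f, g) for g in m2) for f in m1]
    return sum(best) / len(best), sum(b >= share_threshold for b in best) / len(best)


m1 = [[["push rbp", "mov rbp, rsp", "xor eax, eax", "pop rbp", "ret"]]]
m2 = [[["push rbx", "mov rbx, rdi", "xor ecx, ecx", "pop rbx", "ret"]]]
print(module_similarity(fingerprint_module(m1), fingerprint_module(m2)))  # (1.0, 1.0)
```

Register renaming leaves the mnemonic sequence unchanged, so the two modules above are reported as fully shared; real fuzzy hashes would also tolerate inserted or reordered instructions.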

SYSTEM FOR DISCOVERY AND ANALYSIS OF SOFTWARE DISTRIBUTED ACROSS AN ELECTRONIC NETWORK PLATFORM

Systems, computer program products, and methods are described herein for discovery and analysis of software distributed across an electronic network platform of an entity. The present invention is configured to continuously monitor one or more hardware devices associated with a technology environment; initiate an open source code discovery engine on the one or more hardware devices, wherein initiating further comprises automatically populating a first database with at least a portion of one or more applications that match attributes associated with open source code identifiers; and initiate an approval and enforcement engine on at least the portion of the one or more applications stored in the first database.
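The discovery and enforcement stages might be wired together as in this single-pass sketch, using an in-memory SQLite database as the "first database". Everything named here is hypothetical: the `App` inventory record, the small catalogue of open source identifiers (package name to license), and the policy that only Apache-2.0 and MIT are approved. Continuous monitoring is not shown.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class App:
    device: str
    name: str
    version: str


# Hypothetical catalogue of open source identifiers: package name -> license
OPEN_SOURCE_IDS = {"openssl": "Apache-2.0", "log4j-core": "Apache-2.0", "busybox": "GPL-2.0-only"}
APPROVED_LICENSES = {"Apache-2.0", "MIT"}


def discover(apps, db):
    """Discovery engine: store every application that matches an open source identifier."""
    db.execute("CREATE TABLE IF NOT EXISTS oss (device TEXT, name TEXT, version TEXT, license TEXT)")
    for app in apps:
        lic = OPEN_SOURCE_IDS.get(app.name.lower())
        if lic is not None:
            db.execute("INSERT INTO oss VALUES (?, ?, ?, ?)",
                       (app.device, app.name, app.version, lic))
    db.commit()


def enforce(db, approved=APPROVED_LICENSES):
    """Approval engine: return stored applications whose license is not approved."""
    rows = db.execute("SELECT device, name, license FROM oss ORDER BY device, name").fetchall()
    return [(d, n, lic) for d, n, lic in rows if lic not in approved]


db = sqlite3.connect(":memory:")
discover([App("host1", "openssl", "3.0.2"),
          App("host1", "busybox", "1.36"),
          App("host2", "in-house-tool", "0.1")], db)
print(enforce(db))  # [('host1', 'busybox', 'GPL-2.0-only')]
```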

SIMILAR CODE ANALYSIS AND TEMPLATE INDUCTION
20210182037 · 2021-06-17

Disclosed herein are system, method, and computer program product embodiments for mitigating similar and/or duplicate data. An embodiment operates by a computing device receiving a first and second code segment and parsing the first and second code segments into a first and second abstract syntax tree (AST), respectively. Thereafter, the computing device generates a first and a second normalized AST corresponding to the first and second ASTs, respectively, based on a normalization rule such that the first and second normalized ASTs correspond to a first and second normalized code segment. The computing device then derives analytical information of the first and second normalized ASTs based on the first and second normalized code segments.