Circuit, Chip, and Electronic Device
20230236727 · 2023-07-27
Abstract
This application provides a circuit, a chip, and an electronic device. The circuit includes a first processor and a first processing module connected to the first processor. The first processing module includes a second processor connected to a first memory. A transmission latency generated when the second processor performs read and write operations on the first memory is less than a transmission latency generated when the first processor communicates with the first processing module. Because the transmission latency generated when the second processor performs the read and write operations on the first memory is less than the transmission latency generated when the first processor communicates with the first processing module, a cost of a transmission latency of data in a bus can be reduced.
Claims
1. A circuit, wherein the circuit comprises a first processor and a first processing module connected to the first processor, the first processing module comprises a second processor connected to a first memory, and a transmission latency generated when the second processor performs read and write operations on the first memory is less than a transmission latency generated when the first processor communicates with the first processing module.
2. The circuit according to claim 1, wherein the second processor is a multi-core processor, and the transmission latency generated when the second processor performs the read and write operations on the first memory is a transmission latency generated when any core processor of the multi-core processor comprised in the second processor performs read and write operations on the first memory.
3. The circuit according to claim 1, wherein the first processor is connected to the first processing module through a first bus, and the second processor is connected to the first memory through a second bus, wherein a bus bit width of the second bus is greater than a bus bit width of the first bus, and/or a length of the second bus is less than a length of the first bus.
4. The circuit according to claim 1, wherein the first processing module further comprises a third processor connected to a second memory, and a transmission latency generated when the third processor performs read and write operations on the second memory is less than the transmission latency generated when the first processor communicates with the first processing module.
5. The circuit according to claim 4, wherein the first processor is connected to the first processing module through a first bus, the second processor is connected to the first memory through a second bus, the third processor is connected to the second memory through a third bus, and a sum of a bus bit width of the second bus and a bus bit width of the third bus is greater than a bus bit width of the first bus.
6. The circuit according to claim 1, wherein the first processing module further comprises a third processor connected to the first memory, and a transmission latency generated when the third processor performs read and write operations on the first memory is less than the transmission latency generated when the first processor communicates with the first processing module.
7. The circuit according to claim 6, wherein the first processor is connected to the first processing module through a first bus, the second processor is connected to the first memory through a second bus, the third processor is connected to the first memory through a third bus, and a sum of a bus bit width of the second bus and a bus bit width of the third bus is greater than a bus bit width of the first bus.
8. The circuit according to claim 4, wherein the second processor and the third processor are pipeline processors.
9. The circuit according to claim 1, wherein the circuit further comprises a fourth processor and a third memory connected to the fourth processor; or the circuit further comprises a fourth processor and a second processing module connected to the fourth processor; the second processing module comprises N fifth processors connected to M memories, wherein both N and M are integers greater than or equal to 1; and a transmission latency generated when any fifth processor performs read and write operations on the memory connected to the fifth processor is less than a transmission latency generated when the fourth processor communicates with the second processing module.
10. The circuit according to claim 4, wherein the circuit further comprises a fourth processor and a third memory connected to the fourth processor; or the circuit further comprises a fourth processor and a second processing module connected to the fourth processor; the second processing module comprises N fifth processors connected to M memories, wherein both N and M are integers greater than or equal to 1; and a transmission latency generated when any fifth processor performs read and write operations on the memory connected to the fifth processor is less than a transmission latency generated when the fourth processor communicates with the second processing module.
11. The circuit according to claim 10, wherein the second processor is connected to the third processor through a fourth bus, the fourth processor is connected to the first processor through a fifth bus, and a bus bit width of the fourth bus is less than a bus bit width of the fifth bus.
12. The circuit according to claim 9, wherein a quantity of processor cores comprised in the fourth processor is greater than or equal to a quantity of processor cores comprised in the first processor.
13. The circuit according to claim 9, wherein the fourth processor and the first processor are pipeline processors.
14. The circuit according to claim 1, wherein the first processing module further comprises the first memory.
15. A chip, wherein the chip comprises the circuit according to claim 1.
16. An electronic device, wherein the electronic device comprises the chip according to claim 15, and the electronic device further comprises a receiver and a transmitter, wherein the receiver is configured to receive a packet and send the packet to the chip; the chip is configured to process the packet; and the transmitter is configured to: obtain a packet processed by the chip, and send the processed packet to another electronic device.
Description
DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
[0035] The following describes technical solutions of this application with reference to accompanying drawings.
[0036] All aspects, embodiments, or features are presented in this application by describing a system that may include a plurality of devices, components, modules, and the like. It should be appreciated and understood that each system may include other devices, components, modules, and the like, and/or may not include all of the devices, components, modules, and the like discussed with reference to the accompanying drawings. A combination of these solutions may also be used.
[0037] In addition, in embodiments of this application, terms such as “example” and “for example” are used to give an example, an illustration, or a description. Any embodiment or design scheme described as an “example” in this application should not be construed as being preferred over, or more advantageous than, another embodiment or design scheme. Rather, the term “example” is used to present a concept in a concrete manner.
[0038] In embodiments of this application, the terms “corresponding” and “relevant” may sometimes be used interchangeably. It should be noted that the meanings expressed by the terms are consistent when differences are not emphasized.
[0039] In embodiments of this application, a subscript such as W₁ may sometimes be written in an incorrect form such as W1. The expressed meanings are consistent when differences are not emphasized.
[0040] Network architectures and service scenarios described in embodiments of this application are intended to describe the technical solutions in embodiments of this application more clearly, and do not constitute any limitation on the technical solutions according to embodiments of this application.
[0041] A person of ordinary skill in the art may learn that the technical solutions according to embodiments of this application are also applicable to a similar technical problem as a network architecture evolves and a new service scenario emerges.
[0042] Reference to “an embodiment”, “some embodiments”, or the like in this specification indicates that one or more embodiments of this application include a specific feature, structure, or characteristic described with reference to that embodiment. Therefore, statements such as “in an embodiment”, “in some embodiments”, “in some other embodiments”, and “in other embodiments” that appear at different places in this specification do not necessarily refer to the same embodiment. Instead, they mean “one or more but not all of the embodiments”, unless otherwise specifically emphasized. The terms “include”, “have”, and their variants all mean “include but are not limited to”, unless otherwise specifically emphasized.
[0043] In this application, “at least one” means one or more, and “a plurality of” means two or more. The term “and/or” describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may represent the following three cases: only A exists, both A and B exist, or only B exists. A and B each may be singular or plural. The character “/” generally indicates an “or” relationship between the associated objects. “At least one of the following items” or a similar expression refers to any combination of these items, including a single item or any combination of a plurality of items. For example, at least one of a, b, or c may indicate: a, b, c, a and b, a and c, b and c, or a, b, and c, where a, b, and c may be singular or plural.
[0045] The chip 100 processes a received packet in a pipeline manner. As shown in
[0046] As shown in
[0047] Based on different connected objects, the bus in the chip 100 includes a type 1 bus, a type 2 bus, a type 3 bus, a type 4 bus, a type 5 bus, and a type 6 bus. The type 1 bus is configured to connect the type 2 processor and a processing module corresponding to the type 2 processor. For example, the bus 11 configured to connect the processor 112 and the processing module 121 and the bus 12 configured to connect the processor 114 and the processing module 122 are both type 1 buses. The type 2 bus is configured to connect two type 3 processors. For example, the bus 21 configured to connect the processor 1211 and the processor 1212, the bus 22 configured to connect the processor 1213 and the processor 1214, and the bus 23 configured to connect the processor 1214 and the processor 1215 are all type 2 buses. The type 3 bus is configured to connect the type 3 processor and a memory corresponding to the type 3 processor. For example, the bus 31 configured to connect the processor 1211 and the memory 1221, the bus 33 configured to connect the processor 1213 and the memory 1223, and the like are all type 3 buses. The type 4 bus is configured to connect two processors in the first pipeline. For example, the bus 41 configured to connect the processor 111 and the processor 112, the bus 42 configured to connect the processor 112 and the processor 113, and the bus 43 configured to connect the processor 113 and the processor 114 are all type 4 buses. The type 5 bus is configured to connect the type 1 processor and a memory corresponding to the type 1 processor. For example, the bus 51 configured to connect the processor 111 and the memory 131 and the bus 52 configured to connect the processor 113 and the memory 132 are both type 5 buses. The type 6 bus is configured to connect the input/output interface 101 and a processor. 
For example, the bus 61 configured to connect the input/output interface 101 and the processor 111 and the bus 62 configured to connect the processor 114 and the input/output interface 101 are both type 6 buses.
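The six bus types in [0047] are defined only by what each bus connects, so they can be written as a small classification rule. The sketch below is a hypothetical illustration, not part of the described chip: the component-kind labels, the `KIND` table, and the `bus_type` function are our own names, the table holds only components mentioned in this paragraph, and it assumes the classification used in this description (processors 111 and 113 are type 1, processors 112 and 114 are type 2, and processors inside a processing module are type 3).

```python
# Hypothetical labels for the kinds of component in chip 100.
TYPE1, TYPE2, TYPE3 = "type1_processor", "type2_processor", "type3_processor"
MODULE, MEMORY, IO = "processing_module", "memory", "io_interface"

# Component kinds, limited to the components named in [0047].
KIND = {
    "111": TYPE1, "113": TYPE1,
    "112": TYPE2, "114": TYPE2,
    "1211": TYPE3, "1212": TYPE3, "1213": TYPE3, "1214": TYPE3, "1215": TYPE3,
    "121": MODULE, "122": MODULE,
    "131": MEMORY, "132": MEMORY, "1221": MEMORY, "1223": MEMORY,
    "101": IO,
}
PIPELINE = {TYPE1, TYPE2}  # processors in the first pipeline


def bus_type(a: str, b: str) -> int:
    """Return the bus type (1 to 6) of a bus joining components a and b."""
    kinds = {KIND[a], KIND[b]}
    if kinds == {TYPE2, MODULE}:
        return 1  # type 2 processor <-> its processing module
    if kinds == {TYPE3}:
        return 2  # two type 3 processors
    if kinds == {TYPE3, MEMORY}:
        return 3  # type 3 processor <-> its memory
    if kinds <= PIPELINE:
        return 4  # two processors in the first pipeline
    if kinds == {TYPE1, MEMORY}:
        return 5  # type 1 processor <-> its memory
    if IO in kinds and len(kinds) == 2 and (kinds - {IO}) <= PIPELINE | {TYPE3}:
        return 6  # input/output interface <-> a processor
    raise ValueError(f"no bus type defined for {a} <-> {b}")


# Buses named in [0047], printed with their classified types.
examples = {"11": ("112", "121"), "21": ("1211", "1212"), "31": ("1211", "1221"),
            "41": ("111", "112"), "51": ("111", "131"), "61": ("101", "111")}
for bus, ends in examples.items():
    print(bus, bus_type(*ends))
```

Checking rules in a fixed order keeps each type's definition on one line; a bus whose endpoints match none of the six definitions raises an error instead of being silently assigned a type.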
[0048] In some embodiments, each processor in the first pipeline is a multi-core processor. Each processor in the first pipeline may include a plurality of processor cores (which may also be referred to as cores). In some embodiments, different processors in the first pipeline may include a same quantity of processor cores. In other words, any two processors in the first pipeline include a same quantity of processor cores.
[0049] In other embodiments, some processors in the first pipeline include a same quantity of processor cores. For example, a quantity of processor cores included in the processor 111 is equal to a quantity of processor cores included in the processor 113, and a quantity of processor cores included in the processor 112 is equal to a quantity of processor cores included in the processor 114, but the quantity of processor cores included in the processor 111 is different from the quantity of processor cores included in the processor 112. As described above, based on different connected objects, processors in the first pipeline may be classified into two types: a type 1 processor (for example, the processor 111 and the processor 113) and a type 2 processor (for example, the processor 112 and the processor 114). In some embodiments, processors of a same type include a same quantity of processor cores, and processors of different types may include different quantities of processor cores. In some embodiments, a quantity of processor cores included in the type 1 processor may be greater than a quantity of processor cores in the type 2 processor. The type 2 processor communicates with a processing module, and a processor included in the processing module can perform some processing operations. In this way, the type 2 processor may be a single-core processor or a processor with a small quantity of cores, so that hardware costs can be further reduced. For example, the quantity of processor cores included in the type 2 processor may be ½, ⅓, ⅕, or ⅛ of the quantity of processor cores included in the type 1 processor.
[0050] In other embodiments, the type 1 processor may be a multi-core processor, and the type 2 processor may be a single-core processor. In some embodiments, the type 3 processor may also be a multi-core processor. In other words, the type 3 processor may also include a plurality of processor cores. In some embodiments, a quantity of processor cores included in the type 3 processor is less than both a quantity of processor cores included in the type 1 processor and a quantity of processor cores included in the type 2 processor. In other words, the quantity of processor cores included in the type 1 processor and the quantity of processor cores included in the type 2 processor are both greater than the quantity of processor cores included in the type 3 processor. For example, a quantity of processor cores included in the processor 1211 may be less than the quantity of processor cores included in the processor 111, and the quantity of processor cores included in the processor 1211 may also be less than the quantity of processor cores included in the processor 112. In other embodiments, a quantity of processor cores included in the type 3 processor may be less than a quantity of processor cores included in the type 1 processor, and the quantity of processor cores included in the type 3 processor may be equal to or greater than a quantity of processor cores included in the type 2 processor. For example, a quantity of processor cores included in the processor 1213 may be less than the quantity of processor cores included in the processor 111, and the quantity of processor cores included in the processor 1213 may be equal to or greater than the quantity of processor cores included in the processor 114. For example, in some embodiments, the quantity of processor cores included in the type 3 processor may be less than or equal to 1/10 of the quantity of processor cores included in the type 1 processor. 
For another example, in other embodiments, the quantity of processor cores included in the type 3 processor may be less than or equal to ½, ⅓, ⅕, ⅛, or the like of the quantity of processor cores included in the type 1 processor.
[0051] In other embodiments, a sum of a quantity of processor cores included in the type 2 processor and a quantity of processor cores included in one type 3 processor in the processing module corresponding to that type 2 processor is equal to a quantity of processor cores included in the type 1 processor. For example, a sum of the quantity of processor cores included in the processor 112 and a quantity of processor cores included in the processor 1212 is equal to the quantity of processor cores included in the processor 111. For another example, a sum of the quantity of processor cores included in the processor 114 and a quantity of processor cores included in the processor 1214 is equal to the quantity of processor cores included in the processor 113. In some embodiments, different type 3 processors may include a same quantity of processor cores. For example, the quantity of processor cores included in the processor 1211 is equal to the quantity of processor cores included in the processor 1212, and the quantity of processor cores included in the processor 1212 is equal to a quantity of processor cores included in the processor 1215.
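Paragraphs [0049] and [0051] describe two separate core-count options: the type 2 processor holding a fraction (½, ⅓, ⅕, or ⅛) of the type 1 processor's cores, and the type 2 processor plus one type 3 processor in its module together matching the type 1 core count. The hypothetical helper `split_cores` below combines the two only to show the arithmetic; the function name, the combination of the two embodiments, and the example core count of 120 are illustrative assumptions, not a configuration the description prescribes.

```python
from fractions import Fraction


def split_cores(n_type1: int, type2_ratio: Fraction) -> tuple[int, int]:
    """Hypothetical helper: derive (type 2 cores, type 3 cores) from n_type1.

    type2_ratio is the fraction of the type 1 core count given to the type 2
    processor ([0049]); one type 3 processor takes the remainder so that
    type 2 cores + type 3 cores == type 1 cores ([0051]).
    """
    n_type2 = type2_ratio * n_type1  # a Fraction
    if n_type2.denominator != 1 or not 1 <= n_type2 < n_type1:
        raise ValueError("ratio must give a whole core count between 1 and n_type1 - 1")
    n_type2 = int(n_type2)
    return n_type2, n_type1 - n_type2


# The ratios listed in [0049], applied to a hypothetical 120-core type 1 processor.
for ratio in (Fraction(1, 2), Fraction(1, 3), Fraction(1, 5), Fraction(1, 8)):
    print(ratio, split_cores(120, ratio))
```

Using `Fraction` rather than floating-point division makes it easy to reject ratios that would not yield a whole number of cores.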
[0052] In other embodiments, different type 3 processors may include different quantities of processor cores.
[0053] In other embodiments, any two processors belonging to a same processing module include a same quantity of processor cores, and two processors belonging to different processing modules include different quantities of processor cores. For example, the quantity of processor cores included in the processor 1211 is equal to the quantity of processor cores included in the processor 1212, and the quantity of processor cores included in the processor 1212 is not equal to the quantity of processor cores included in the processor 1213. In the chip shown in
[0054] In some embodiments, the type 3 processor may also be a single-core processor. If the type 3 processor is a single-core processor, the processing module including the processor may include at least two processors. In other words, if the processing module includes a plurality of processors, the plurality of processors may include at least one single-core processor. The processing module 121 is used as an example. The processor 1211 in the processing module 121 may be a single-core processor, and the processor 1212 may be a single-core processor or a multi-core processor.
[0055] In some embodiments, a length of the type 1 bus is greater than a length of the type 3 bus. For example, the length of the type 3 bus may be equal to ⅕, ⅛, 1/10, or the like of the length of the type 1 bus. For another example, the length of the type 3 bus may be less than 1/10, 1/15, 1/20, or the like of the length of the type 1 bus. In some embodiments, a sum of the length of the type 1 bus and the length of the type 3 bus is equal to a length of the type 5 bus.
[0056] In some embodiments, any two type 1 buses may have a same length. In some embodiments, any two type 2 buses may have a same length. In some embodiments, any two type 3 buses may have a same length. In some embodiments, any two type 4 buses may have a same length. In some embodiments, any two type 5 buses may have a same length. Due to limitations of the manufacturing process, it may be difficult to obtain buses of exactly the same length. Therefore, in embodiments of this application, lengths being “the same” may mean that the lengths are exactly equal, or that the difference between them is within an allowed error range. For example, that the sum of the length of the type 1 bus and the length of the type 3 bus is equal to the length of the type 5 bus may mean that the difference between that sum and the length of the type 5 bus is 0 or is less than or equal to a preset allowed error value. For another example, that the bus 51 and the bus 52 (two type 5 buses) have the same length may mean that the difference between their lengths is 0 or is less than or equal to a preset allowed error value.
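The tolerance-based notion of "the same length" in [0056] amounts to comparing an absolute difference against a preset allowed error. The sketch below shows that check; the function names, the arbitrary length units, and the allowed error of 0.1 are hypothetical.

```python
def lengths_equal(a: float, b: float, allowed_error: float = 0.0) -> bool:
    """Two bus lengths count as 'the same' if they differ by at most allowed_error."""
    if allowed_error < 0:
        raise ValueError("allowed_error must be non-negative")
    return abs(a - b) <= allowed_error


def type1_plus_type3_matches_type5(len1: float, len3: float, len5: float,
                                   allowed_error: float = 0.0) -> bool:
    """Check the [0055] relation: type 1 length + type 3 length == type 5 length."""
    return lengths_equal(len1 + len3, len5, allowed_error)


# Hypothetical lengths in arbitrary units, with a preset allowed error of 0.1.
print(lengths_equal(20.0, 20.05, allowed_error=0.1))                       # True
print(type1_plus_type3_matches_type5(18.0, 2.0, 20.3, allowed_error=0.1))  # False
```

With `allowed_error` left at 0, the check reduces to exact equality, which covers the "completely the same" reading as well.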
[0057] In some embodiments, a sum of widths of all type 3 buses in a same processing module is greater than a width of one type 1 bus. For example, a sum of a width of the bus 31 and a width of the bus 32 is greater than a width of the bus 11. For another example, a sum of a width of the bus 33, a width of the bus 34, and a width of the bus 35 is greater than a width of the bus 12. The quantity of bits of binary data that can be transmitted simultaneously through a bus is referred to as the width of the bus (which may also be referred to as a bit width), and the width is measured in bits. A greater bus width indicates better transmission performance, because more data can be transmitted within the same period. A bus bandwidth (a total amount of data that can be transmitted per unit time) is calculated as follows: Bus bandwidth=Frequency×Width. Because the width is measured in bits, this product is in bits per second; dividing it by 8 gives the bandwidth in bytes per second.
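The bandwidth formula and the width condition in [0057] can be checked with a few lines of arithmetic. The sketch below assumes one transfer of the full bus width per clock cycle and converts bits to bytes by dividing by 8; the function names, the 1 GHz clock, and the 64-bit widths are hypothetical values, not figures from the description.

```python
def bus_bandwidth_bytes_per_s(frequency_hz: float, width_bits: int) -> float:
    """Bus bandwidth = frequency x width, converted from bits/s to bytes/s.

    Assumes one transfer of `width_bits` bits per clock cycle.
    """
    return frequency_hz * width_bits / 8


def type3_widths_exceed_type1(type3_widths: list[int], type1_width: int) -> bool:
    """[0057] condition: type 3 bus widths in one module sum above one type 1 bus width."""
    return sum(type3_widths) > type1_width


# Hypothetical: buses 31 and 32 at 64 bits each, bus 11 at 64 bits, 1 GHz clock.
print(bus_bandwidth_bytes_per_s(1e9, 64))       # 8000000000.0
print(type3_widths_exceed_type1([64, 64], 64))  # True
```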
[0058] In some embodiments, a width of the type 2 bus may be less than a width of the type 4 bus.
[0059] The processor 211 is connected to the input/output interface 201 through a bus 2461. The processor 211 is connected to the processor 212 through a bus 2441. The processor 212 is connected to the processing module 221 through a bus 2411. The processor 212 is connected to the processor 213 through a bus 2442. The processor 213 is connected to the input/output interface 201 through a bus 2462. The processor 213 is connected to the processor 214 through a bus 2443. The processor 214 is connected to the processing module 222 through a bus 2412. The processing module 221 is connected to the input/output interface 201 through a bus 2431. The processing module 222 is connected to the input/output interface 201 through a bus 2433. The processor 2211 is connected to the processor 2212 through a bus 2421. The processor 2213 is connected to the processor 2214 through a bus 2422. The processor 2214 is connected to the processor 2215 through a bus 2423. The memory 231 to the memory 237 are located outside the chip 200. The chip 200 may access the memory 231 to the memory 237 through the input/output interface 201 and corresponding buses. Specifically, the memory 231 is connected to the chip 200 through a bus 2471, the memory 232 is connected to the chip 200 through a bus 2472, the memory 233 is connected to the chip 200 through a bus 2473, the memory 234 is connected to the chip 200 through a bus 2474, the memory 235 is connected to the chip 200 through a bus 2475, the memory 236 is connected to the chip 200 through a bus 2476, and the memory 237 is connected to the chip 200 through a bus 2477.
[0060] The chip 200 processes a received packet in a pipeline manner. As shown in
[0061] A plurality of processors included in each processing module may also belong to a single pipeline. For example, the processor 2211 and the processor 2212 belong to a same pipeline, and the processor 2213, the processor 2214, and the processor 2215 belong to a same pipeline. For ease of description, a pipeline in a processing module may be referred to as a second pipeline. A processor in the processing module may be referred to as a type 3 processor. To be specific, the processor 2211, the processor 2212, the processor 2213, the processor 2214, and the processor 2215 shown in
[0062] Each type 1 processor and each type 3 processor have one corresponding memory. A processor may read data stored in its corresponding memory and may also write data into its corresponding memory. In
[0063] A processing module connected to a processor through a bus may be referred to as a processing module corresponding to the processor. For example, the processing module 121 is a processing module corresponding to the processor 112.
[0064] Based on different connected objects, the bus in the chip 200 may include a type 1 bus, a type 2 bus, a type 3 bus, a type 4 bus, a type 5 bus, and a type 6 bus. The type 1 bus is configured to connect the type 2 processor and a processing module corresponding to the type 2 processor. For example, the bus 2411 configured to connect the processor 212 and the processing module 221 and the bus 2412 configured to connect the processor 214 and the processing module 222 are both type 1 buses. The type 2 bus is configured to connect two type 3 processors. For example, the bus 2421 configured to connect the processor 2211 and the processor 2212, the bus 2422 configured to connect the processor 2213 and the processor 2214, and the bus 2423 configured to connect the processor 2214 and the processor 2215 are all type 2 buses. The type 3 bus is configured to connect a processor in a processing module and the input/output interface. For example, the bus 2431, the bus 2432, the bus 2433, the bus 2434, and the bus 2435 are all type 3 buses. The bus 2431 is a type 3 bus configured to connect the processor 2211 and the input/output interface 201. The bus 2432 is a type 3 bus configured to connect the processor 2212 and the input/output interface 201. The bus 2433 is a type 3 bus configured to connect the processor 2213 and the input/output interface 201. The bus 2434 is a type 3 bus configured to connect the processor 2214 and the input/output interface 201. The bus 2435 is a type 3 bus configured to connect the processor 2215 and the input/output interface 201. The type 4 bus is configured to connect two processors in the first pipeline. For example, the bus 2441 configured to connect the processor 211 and the processor 212, the bus 2442 configured to connect the processor 212 and the processor 213, and the bus 2443 configured to connect the processor 213 and the processor 214 are all type 4 buses. 
The type 6 bus is configured to connect the type 1 processor and the input/output interface. For example, the bus 2461 and the bus 2462 are both type 6 buses.
[0065] In addition to the buses in the chip 200, the chip 200 is further connected to memories through buses. The bus 2471 to the bus 2477 connect the chip 200 to the memories, and these buses may be referred to as type 7 buses. The type 1 processor may access a corresponding memory through corresponding buses and the input/output interface. For example, the processor 211 may access the memory 231 through the bus 2461, the input/output interface 201, and the bus 2471. For another example, the processor 213 may access the memory 234 through the bus 2462, the input/output interface 201, and the bus 2474. The type 3 processor may access a corresponding memory through corresponding buses and the input/output interface. For example, the processor 2211 may access the memory 232 through the bus 2431, the input/output interface 201, and the bus 2472. For another example, the processor 2215 may access the memory 237 through the bus 2435, the input/output interface 201, and the bus 2477.
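The memory-access routes in [0065] are paths through the bus graph of the chip 200. The sketch below encodes the buses listed in [0059] to [0065] as an undirected graph and uses a breadth-first search to find a route with the fewest buses. Node labels such as `p211`, `io`, `mod221`, and `m231` are our own, and counting every bus as one hop is a simplifying assumption that ignores bus length and width.

```python
from collections import deque

# Buses of chip 200 from [0059] to [0065]: bus id -> (endpoint, endpoint).
# Hypothetical labels: "p..." = processor, "mod..." = processing module,
# "io" = input/output interface 201, "m..." = off-chip memory.
BUSES = {
    "2461": ("p211", "io"), "2462": ("p213", "io"),
    "2441": ("p211", "p212"), "2442": ("p212", "p213"), "2443": ("p213", "p214"),
    "2411": ("p212", "mod221"), "2412": ("p214", "mod222"),
    "2421": ("p2211", "p2212"), "2422": ("p2213", "p2214"),
    "2423": ("p2214", "p2215"),
    "2431": ("p2211", "io"), "2432": ("p2212", "io"), "2433": ("p2213", "io"),
    "2434": ("p2214", "io"), "2435": ("p2215", "io"),
    # Type 7 buses 2471 to 2477 join the interface to memories 231 to 237.
    **{f"247{i}": ("io", f"m23{i}") for i in range(1, 8)},
}


def shortest_bus_path(src: str, dst: str) -> list[str]:
    """Breadth-first search; returns the bus ids along a fewest-hop route."""
    adjacency: dict[str, list[tuple[str, str]]] = {}
    for bus, (a, b) in BUSES.items():
        adjacency.setdefault(a, []).append((b, bus))
        adjacency.setdefault(b, []).append((a, bus))

    previous = {src: None}  # node -> (parent node, bus used to reach it)
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for neighbour, bus in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = (node, bus)
                queue.append(neighbour)

    if dst not in previous:
        raise ValueError(f"no route from {src} to {dst}")
    path = []
    node = dst
    while previous[node] is not None:
        node, bus = previous[node]  # step back to the parent node
        path.append(bus)
    return path[::-1]


print(shortest_bus_path("p211", "m231"))   # ['2461', '2471']
print(shortest_bus_path("p2215", "m237"))  # ['2435', '2477']
```

For the four examples in [0065], the search returns exactly the bus pairs the text lists, since each route crosses the input/output interface once.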
[0066] Similar to the chip shown in
[0067] In some embodiments, a length of the type 1 bus is greater than a length of the type 3 bus. For example, the length of the type 3 bus may be equal to ⅕, ⅛, 1/10, or the like of the length of the type 1 bus. For another example, the length of the type 3 bus may be less than 1/10, 1/15, 1/20, or the like of the length of the type 1 bus. In some embodiments, a sum of the length of the type 1 bus and the length of the type 3 bus is equal to a length of the type 6 bus. In some embodiments, any two type 1 buses may have a same length. In some embodiments, any two type 2 buses may have a same length. In some embodiments, any two type 3 buses may have a same length. In some embodiments, any two type 4 buses may have a same length. In some embodiments, any two type 6 buses may have a same length.
[0068] In some embodiments, a sum of widths of the buses between the input/output interface and the processors in a same processing module is greater than a width of one type 1 bus. For example, a sum of a width of the bus 2431 and a width of the bus 2432 is greater than a width of the bus 2411. For another example, a sum of a width of the bus 2433, a width of the bus 2434, and a width of the bus 2435 is greater than a width of the bus 2412. In some embodiments, a width of the type 2 bus may be less than a width of the type 4 bus. In the embodiment shown in
[0069] In the embodiment shown in
[0070] For ease of description, the structure shown in
[0071] For ease of description, it is assumed that a length of the bus 333, a length of the bus 334, and a length of the bus 335 are the same. A letter L is used to represent the length of the bus 333, and a letter R is used to represent a length of the bus 331. As described in the foregoing embodiments, in some embodiments, L is less than R. In other embodiments, L may be far less than R. For example, L may be equal to one tenth of R, or L is less than one tenth of R. It is assumed that a letter A is used to represent a width of the bus 333, a letter B is used to represent a width of the bus 334, a letter C is used to represent a width of the bus 335, and a letter D is used to represent a width of the bus 331. In this case, A, B, C, and D meet the following relationship: D<A+B+C. In this way, a data transfer cost of the hybrid processor structure shown in
Cost_TX=L×(A+B+C)+R×D (formula 3.1)
[0072] Cost_TX represents the data transfer cost, L represents the length of the bus 333 (the length of the bus 333, the length of the bus 334, and the length of the bus 335 are equal), R represents the length of the bus 331, A represents the width of the bus 333, B represents the width of the bus 334, C represents the width of the bus 335, and D represents the width of the bus 331.
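Formula 3.1 can be read as adding up length × width over every bus involved: three short buses of length L (the bus 333, the bus 334, and the bus 335) and one long bus of length R (the bus 331). Treating the cost as a per-bus sum is our interpretation, and the lengths and widths below are hypothetical values chosen only to satisfy L < R and D < A + B + C.

```python
def data_transfer_cost(buses: list[tuple[float, float]]) -> float:
    """Sum of length x width over (length, width) pairs, one pair per bus."""
    return sum(length * width for length, width in buses)


L, R = 1.0, 10.0             # hypothetical bus lengths, L < R
A, B, C, D = 32, 32, 32, 64  # hypothetical widths in bits, D < A + B + C

hybrid = [(L, A), (L, B), (L, C), (R, D)]  # buses 333, 334, 335, and 331
cost = data_transfer_cost(hybrid)
assert cost == L * (A + B + C) + R * D  # agrees with formula 3.1
print(cost)  # 736.0
```

Writing the cost as a per-bus sum makes it straightforward to evaluate variants in which the short buses do not all share the length L.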
[0073] After receiving a packet, a chip (for example, the chip 100 shown in
[0074] Since the size of the second PS is less than the size of the first PS, a simpler processor may be used to process the second PS. Therefore, the structure of the processor inside the processing module (namely, the type 3 processor) may be simpler than the structure of a processor in the first pipeline. To be specific, a quantity of processor cores included in the type 3 processor may be less than a quantity of processor cores included in the type 1 processor and/or a quantity of processor cores included in the type 2 processor, and/or a quantity of transistors included in the type 3 processor may be less than a quantity of transistors included in the type 1 processor and/or a quantity of transistors included in the type 2 processor. The greater the difference between the size of the first PS and the size of the second PS, the simpler the structure of the type 3 processor can be.
[0075] In some embodiments, the quantity of processor cores included in the type 3 processor may be less than the quantity of processor cores included in the type 1 processor, and/or the quantity of transistors included in the type 3 processor may be less than the quantity of transistors included in the type 1 processor. In other embodiments, the quantity of processor cores included in the type 3 processor may be less than the quantity of processor cores included in the type 2 processor, and/or the quantity of transistors included in the type 3 processor may be less than the quantity of transistors included in the type 2 processor.
[0076] A quantity of processor cores is used as an example. N_Little may be used to represent the quantity of processor cores included in the type 3 processor, N_Big2 may be used to represent the quantity of processor cores included in the type 2 processor, and N_Big1 may be used to represent the quantity of processor cores included in the type 1 processor.
[0077] In this way, a processor cost of the hybrid processor structure shown in
Cost_Proc=PS_Little_Size×N_Little+PS_Full_Size×N_Big2 (formula 3.2)
[0078] Cost_Proc represents the processor cost, and meanings of PS_Little_Size, N_Little, PS_Full_Size, and N_Big2 are described above. For brevity, details are not described herein again.
[0079] In some embodiments, N_Little, N_Big2, and N_Big1 may meet the following relationship: N_Big1=N_Little+N_Big2.
[0080] If Latency_L is used to represent an input/output (I/O) latency of a bus whose length is L, and Latency_R is used to represent an I/O latency of a bus whose length is R, a latency cost generated when using the hybrid processor structure shown in
Cost_LAT=Latency_L×3+Latency_R×1 (formula 3.3)
[0081] Cost_LAT represents the latency cost, Latency_L represents the I/O latency of the bus whose length is L, and Latency_R represents the I/O latency of the bus whose length is R.
[0082] If processors in one pipeline implement functions implemented by the hybrid processing structure shown in
[0084] The bus 421, the bus 422, and the bus 423 have the same length. The length of the bus 421 may be equal to L+R, namely, the sum of the length of the bus 333 and the length of the bus 331 shown in
Cost_TX=(L+R)×(A+B+C) (formula 4.1)
[0085] Cost_TX represents the data transfer cost, L+R is the length of the bus 421 (the length of the bus 422 is equal to the length of the bus 421, and the length of the bus 423 is equal to the length of the bus 421), A represents the width of the bus 421, B represents the width of the bus 422, and C represents the width of the bus 423.
[0086] Through comparison between formula 4.1 and formula 3.1, it can be found that, in a case in which L is less than R and D is less than A+B+C, the data transfer cost generated when using the structure shown in
[0087] In some embodiments, a greater difference between R and L indicates a lower data transfer cost of the structure shown in
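Formula 3.1 is not reproduced in this section. Assuming, from the definitions in [0072], that it sums length × width over the buses, that is, Cost_TX=L×(A+B+C)+R×D, the difference between formula 4.1 and formula 3.1 is R×(A+B+C-D). This is positive whenever D is less than A+B+C and grows with R. The sketch below evaluates both formulas under that assumption; the function names and the bus widths and lengths are hypothetical.

```python
def cost_tx_hybrid(L, R, A, B, C, D):
    # Assumed form of formula 3.1: three buses of length L (widths A, B, C)
    # plus one bus of length R and width D.
    return L * (A + B + C) + R * D

def cost_tx_pipeline(L, R, A, B, C):
    # Formula 4.1: three buses of length L+R with widths A, B, C.
    return (L + R) * (A + B + C)

# Illustrative values: lengths in arbitrary units, widths in bits.
L, R = 1, 4
A = B = C = 256
D = 128

hybrid = cost_tx_hybrid(L, R, A, B, C, D)
pipeline = cost_tx_pipeline(L, R, A, B, C)
print(hybrid, pipeline)  # 1280 3840
```

With these numbers the saving, 2560, equals R×(A+B+C-D) = 4×640.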
[0088] As described above, because the processor 401 to the processor 403 are all type 1 processors, the PS passing through the processor 401 to the processor 403 is PS_Full. Correspondingly, the size of PS_Full is PS_Full_Size, and the quantity of processor cores included in the type 1 processor is N_Big1. In this case, a processor cost generated when using the structure shown in
Cost_Proc=PS_Full_Size×N_Big1 (formula 4.2)
[0089] Cost_Proc represents the processor cost, PS_Full_Size is the size of the PS that passes through the processor 401, and N_Big1 is the quantity of processor cores included in the processor 401.
[0090] If N_Big1=N_Little+N_Big2, compared with the structure shown in
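As a worked check of this comparison: substituting N_Big1=N_Little+N_Big2 into formula 4.2 and subtracting formula 3.2 leaves (PS_Full_Size-PS_Little_Size)×N_Little, which is positive whenever the second PS is smaller than the first. The sketch below evaluates both formulas; the PS sizes and core counts are illustrative values only.

```python
def cost_proc_hybrid(ps_little_size, n_little, ps_full_size, n_big2):
    # Formula 3.2: small PS through type 3 cores, full PS through type 2 cores.
    return ps_little_size * n_little + ps_full_size * n_big2

def cost_proc_pipeline(ps_full_size, n_big1):
    # Formula 4.2: full PS through all type 1 cores.
    return ps_full_size * n_big1

ps_full, ps_little = 64, 16   # hypothetical PS sizes
n_little, n_big2 = 6, 2       # hypothetical core counts
n_big1 = n_little + n_big2    # relationship from [0079]

saving = cost_proc_pipeline(ps_full, n_big1) - cost_proc_hybrid(ps_little, n_little, ps_full, n_big2)
print(saving)  # 288, equal to (64 - 16) * 6
```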
[0091] If Latency_L is used to represent an I/O latency of a bus whose length is L, and Latency_R is used to represent an I/O latency of a bus whose length is R, a latency cost generated when using the structure shown in
Cost_LAT=(Latency_L+Latency_R)×3 (formula 4.3)
[0092] Cost_LAT represents the latency cost, Latency_L represents the I/O latency of the bus whose length is L, and Latency_R represents the I/O latency of the bus whose length is R.
[0093] It can be learned that, compared with the structure shown in
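Likewise, subtracting formula 3.3 from formula 4.3 gives (Latency_L+Latency_R)×3-(Latency_L×3+Latency_R×1) = Latency_R×2, so the saving grows with the latency of the bus of length R. The latency values below are hypothetical.

```python
def cost_lat_hybrid(latency_l, latency_r):
    # Formula 3.3: Latency_L counted three times, Latency_R once.
    return latency_l * 3 + latency_r * 1

def cost_lat_pipeline(latency_l, latency_r):
    # Formula 4.3: three buses of length L+R (buses 421 to 423).
    return (latency_l + latency_r) * 3

latency_l, latency_r = 2, 10  # hypothetical latencies, e.g. in ns
print(cost_lat_pipeline(latency_l, latency_r) - cost_lat_hybrid(latency_l, latency_r))  # 20 == 2 * latency_r
```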
[0094] In conclusion, with the technical solutions according to embodiments of this application, corresponding functions can be implemented at lower cost (a lower data transfer cost, a lower processor cost, and a lower latency cost). In addition, because the buses required inside a processing module are short and the bus between a type 2 processor and the processing module is narrow, a chip using the technical solutions of this application can occupy a smaller area than another chip that implements the same function.
[0095] The following describes two structures in
[0096] The basic process by which ECMP determines a next-hop port is as follows: a hash value is determined based on flow identifier information (for example, a 5-tuple or a flow label) of a packet, and then an entry is determined based on an ECMP routing table and the hash value, where the port included in the entry is the next-hop port for sending the packet.
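A minimal sketch of this hash-then-select step, assuming a 5-tuple flow identifier, CRC32 (Python's `zlib.crc32`) as the hash function, and a single flat list of equal-cost ports as the ECMP routing table. The application does not specify the hash function or the table layout, so these choices and all names below are illustrative.

```python
import zlib
from typing import NamedTuple, Sequence

class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int

def flow_hash(flow: FiveTuple) -> int:
    # Hypothetical hash: CRC32 over a canonical string form of the 5-tuple.
    key = f"{flow.src_ip}|{flow.dst_ip}|{flow.src_port}|{flow.dst_port}|{flow.protocol}"
    return zlib.crc32(key.encode())

def next_hop_port(flow: FiveTuple, ecmp_ports: Sequence[int]) -> int:
    # The hash selects one equal-cost entry, so all packets of a flow
    # leave through the same port.
    return ecmp_ports[flow_hash(flow) % len(ecmp_ports)]

flow = FiveTuple("10.0.0.1", "10.0.0.2", 12345, 80, 6)
ports = [1, 2, 3, 4]
print(next_hop_port(flow, ports))
```

Hashing on the flow identifier rather than choosing ports per packet keeps packets of one flow in order on a single path.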
[0097] In some cases, to reduce the number of entries stored in the ECMP routing table and improve lookup efficiency, the ECMP routing table may be divided into a plurality of tables, for example, three tables, referred to as a routing entry table 1, a routing entry table 2, and a routing entry table 3. First, an entry corresponding to the flow identifier information of the packet is determined from the routing entry table 1, where the entry includes one base address and an index of one routing entry table. Then, the routing entry table 2 is determined based on that index, and an entry corresponding to the base address and the hash value determined based on the flow identifier information of the packet is queried from the routing entry table 2, where the entry includes one port index and an index of one routing entry table. Finally, the routing entry table 3 is determined based on that index, and an entry corresponding to the port index is queried from the routing entry table 3, where the entry includes the next-hop port for the packet.
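The three-table lookup can be sketched as follows. The table layout, the entry classes, and in particular the way the base address and hash value are combined (here, the base address plus the hash modulo a hypothetical `GROUP_SIZE`) are assumptions; the text states only that the table 2 entry corresponds to both values.

```python
from dataclasses import dataclass

GROUP_SIZE = 4  # hypothetical number of equal-cost members per group

@dataclass
class Entry1:
    base_address: int   # used to locate the entry in routing entry table 2
    table2_index: int   # index of routing entry table 2

@dataclass
class Entry2:
    port_index: int     # key into routing entry table 3
    table3_index: int   # index of routing entry table 3

def lookup_next_hop(tables: dict, table1_index: int, flow_id: str, hash_value: int) -> str:
    # Stage 1: flow identifier -> base address and index of table 2.
    e1 = tables[table1_index][flow_id]
    # Stage 2: base address and hash -> port index and index of table 3
    # (assumed combination: base address + hash % GROUP_SIZE).
    e2 = tables[e1.table2_index][e1.base_address + hash_value % GROUP_SIZE]
    # Stage 3: port index -> next-hop port.
    return tables[e2.table3_index][e2.port_index]

# Illustrative tables keyed by routing table index; port names are made up.
tables = {
    1: {"flowA": Entry1(base_address=100, table2_index=2)},
    2: {100 + i: Entry2(port_index=i, table3_index=3) for i in range(GROUP_SIZE)},
    3: {0: "eth0", 1: "eth1", 2: "eth2", 3: "eth3"},
}
print(lookup_next_hop(tables, 1, "flowA", 6))  # eth2
```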
[0099] 501: The processor 401 obtains an index (referred to as a routing table index 1 below) of one routing entry table from a received PS.
[0100] 502: The processor 401 sends the routing table index 1 to the memory 411.
[0101] 503: The processor 401 receives, from the memory 411, a routing entry table 1 corresponding to the routing table index 1.
[0102] 504: The processor 401 determines, from the routing entry table 1, an entry corresponding to flow identifier information of a packet. The entry includes an index (referred to as a routing table index 2 below) of one routing entry table and one base address, and the routing table index 2 and the base address are written into the PS.
[0103] 505: The processor 401 sends the PS (namely, the PS into which the routing table index 2 and the base address are written) to the processor 402.
[0104] 506: The processor 402 obtains the routing table index 2, the base address, and one hash value from the received PS. The hash value is determined based on the flow identifier information of the packet. The hash value may be determined by an upstream node of the processor 401 and written into the PS.
[0105] 507: The processor 402 sends the routing table index 2 to the memory 412.
[0106] 508: The processor 402 receives, from the memory 412, a routing entry table 2 corresponding to the routing table index 2.
[0107] 509: The processor 402 queries, from the routing entry table 2, an entry corresponding to the base address and the hash value, where the entry includes one port index and an index (referred to as a routing table index 3 below) of one routing entry table, and writes the port index and the routing table index 3 into the PS.
[0108] 510: The processor 402 sends the PS (namely, the PS into which the port index and the routing table index 3 are written) to the processor 403.
[0109] 511: The processor 403 obtains the routing table index 3 and the port index from the received PS.
[0110] 512: The processor 403 sends the routing table index 3 to the memory 413.
[0111] 513: The processor 403 receives, from the memory 413, a routing entry table 3 corresponding to the routing table index 3.
[0112] 514: The processor 403 queries, from the routing entry table 3, an entry corresponding to the port index, where content included in the entry is a next-hop port for the packet.
[0113] 515: The processor 403 writes the next-hop port for the packet into the PS, and sends the PS to a next node in a pipeline, so that the next node continues to process the packet.
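Steps 501 to 515 can be sketched as three pipeline stages, each of which reads only the memory attached to it and writes its results into the PS before forwarding it. The memory contents, PS field names, and the `GROUP_SIZE` indexing rule are hypothetical; routing entry table 3 is read from memory 413, the memory attached to processor 403.

```python
GROUP_SIZE = 4  # hypothetical

# Hypothetical memory contents: each memory maps a routing table index to a table.
memory_411 = {1: {"flowA": {"table2_index": 2, "base_address": 100}}}
memory_412 = {2: {100 + i: {"port_index": i, "table3_index": 3} for i in range(GROUP_SIZE)}}
memory_413 = {3: {0: "eth0", 1: "eth1", 2: "eth2", 3: "eth3"}}

def processor_401(ps: dict) -> dict:
    entry = memory_411[ps["table1_index"]][ps["flow_id"]]      # 501-504
    ps["table2_index"] = entry["table2_index"]
    ps["base_address"] = entry["base_address"]
    return ps                                                   # 505: PS to processor 402

def processor_402(ps: dict) -> dict:
    table2 = memory_412[ps["table2_index"]]                     # 506-508
    entry = table2[ps["base_address"] + ps["hash"] % GROUP_SIZE]  # 509 (assumed indexing)
    ps["port_index"] = entry["port_index"]
    ps["table3_index"] = entry["table3_index"]
    return ps                                                   # 510: PS to processor 403

def processor_403(ps: dict) -> dict:
    table3 = memory_413[ps["table3_index"]]                     # 511-513
    ps["next_hop_port"] = table3[ps["port_index"]]              # 514-515
    return ps

# The full PS (including a hypothetical 64-byte remainder) passes through every stage.
ps = {"table1_index": 1, "flow_id": "flowA", "hash": 6, "payload": bytes(64)}
for stage in (processor_401, processor_402, processor_403):
    ps = stage(ps)
print(ps["next_hop_port"])  # eth2
```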
[0115] 601: The processor 301 obtains, from a received PS, an index (referred to as a routing table index 1 below) of one routing entry table, flow identifier information of a packet, and a hash value determined based on the flow identifier information of the packet.
[0116] 602: The processor 301 sends, to the processor 311, the routing table index 1, the flow identifier information of the packet, and the hash value determined based on the flow identifier information of the packet.
[0117] 603: The processor 311 sends the routing table index 1 to the memory 321.
[0118] 604: The processor 311 receives, from the memory 321, a routing entry table 1 corresponding to the routing table index 1.
[0119] 605: The processor 311 determines, from the routing entry table 1, an entry corresponding to the flow identifier information of the packet. The entry includes an index (referred to as a routing table index 2 below) of one routing entry table and one base address, and the routing table index 2 and the base address are written into the PS. The PS may further include the hash value determined based on the flow identifier information of the packet.
[0120] 606: The processor 311 sends the PS (namely, the PS into which the routing table index 2 and the base address are written) to the processor 312.
[0121] 607: The processor 312 obtains the routing table index 2, the base address, and the hash value from the received PS.
[0122] 608: The processor 312 sends the routing table index 2 to the memory 322.
[0123] 609: The processor 312 receives, from the memory 322, a routing entry table 2 corresponding to the routing table index 2.
[0124] 610: The processor 312 queries, from the routing entry table 2, an entry corresponding to the base address and the hash value, where the entry includes one port index and an index (referred to as a routing table index 3 below) of one routing entry table, and writes the port index and the routing table index 3 into the PS.
[0125] 611: The processor 312 sends the PS (namely, the PS into which the port index and the routing table index 3 are written) to the processor 313.
[0126] 612: The processor 313 obtains the routing table index 3 and the port index from the received PS.
[0127] 613: The processor 313 sends the routing table index 3 to the memory 323.
[0128] 614: The processor 313 receives, from the memory 323, a routing entry table 3 corresponding to the routing table index 3.
[0129] 615: The processor 313 queries, from the routing entry table 3, an entry corresponding to the port index, where content included in the entry is a next-hop port for the packet.
[0130] 616: The processor 313 sends the next-hop port for the packet to the processor 301.
[0131] 617: The processor 301 writes the next-hop port for the packet into the PS, and sends the PS to a next node in a pipeline, so that the next node continues to process the packet.
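A comparable sketch of steps 601 to 617, using the same hypothetical tables and indexing rule. It assumes, in line with [0074], that processor 301 sends only the fields needed for the lookup into the processing module and keeps the full PS itself, so the intermediate fields written inside the module never appear in the full PS.

```python
GROUP_SIZE = 4  # hypothetical

# Hypothetical contents of memories 321 to 323 inside the processing module.
memory_321 = {1: {"flowA": {"table2_index": 2, "base_address": 100}}}
memory_322 = {2: {100 + i: {"port_index": i, "table3_index": 3} for i in range(GROUP_SIZE)}}
memory_323 = {3: {0: "eth0", 1: "eth1", 2: "eth2", 3: "eth3"}}

def processing_module(small_ps: dict) -> str:
    # Processor 311 (603-606): routing entry table 1 from memory 321.
    e1 = memory_321[small_ps["table1_index"]][small_ps["flow_id"]]
    small_ps.update(table2_index=e1["table2_index"], base_address=e1["base_address"])
    # Processor 312 (607-611): routing entry table 2 from memory 322 (assumed indexing).
    key = small_ps["base_address"] + small_ps["hash"] % GROUP_SIZE
    e2 = memory_322[small_ps["table2_index"]][key]
    small_ps.update(port_index=e2["port_index"], table3_index=e2["table3_index"])
    # Processor 313 (612-616): routing entry table 3 from memory 323; result returned to processor 301.
    return memory_323[small_ps["table3_index"]][small_ps["port_index"]]

def processor_301(full_ps: dict) -> dict:
    # 601-602: extract only the lookup fields and send them into the module.
    small_ps = {k: full_ps[k] for k in ("table1_index", "flow_id", "hash")}
    # 617: write the next-hop port into the full PS and pass it downstream.
    full_ps["next_hop_port"] = processing_module(small_ps)
    return full_ps

full_ps = {"table1_index": 1, "flow_id": "flowA", "hash": 6, "payload": bytes(64)}
print(processor_301(full_ps)["next_hop_port"])  # eth2
```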
[0132] In a procedure shown in
[0133] However, in a procedure shown in
[0134] An embodiment of this application further provides a circuit. The circuit includes a first processor and a first processing module connected to the first processor. The first processing module includes a second processor connected to a first memory. A transmission latency generated when the second processor performs read and write operations on the first memory is less than a transmission latency generated when the first processor communicates with the first processing module.
[0135] For example, it is assumed that the processing module 121 shown in
[0136] For another example, it is assumed that the processing module 221 shown in
[0137] Optionally, in some embodiments, the second processor is a multi-core processor, and the transmission latency generated when the second processor performs the read and write operations on the first memory is a transmission latency generated when any core processor of the multi-core processor included in the second processor performs read and write operations on the first memory.
[0138] Optionally, in some embodiments, the first processor is connected to the first processing module through a first bus, and the second processor is connected to the first memory through a second bus, where a bus bit width of the second bus is greater than a bus bit width of the first bus, and/or a length of the second bus is less than a length of the first bus.
[0139] For example, it is still assumed that the processing module 121 shown in
[0140] For another example, it is still assumed that the processing module 221 shown in
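A minimal predicate for the bus condition in [0138], with hypothetical field names and an arbitrary length unit. Here "and/or" is read as inclusive: the condition holds when the second bus is wider than the first bus, shorter than it, or both.

```python
from dataclasses import dataclass

@dataclass
class Bus:
    bit_width: int   # bus bit width
    length: float    # physical length, arbitrary unit (hypothetical)

def meets_bus_condition(first_bus: Bus, second_bus: Bus) -> bool:
    wider = second_bus.bit_width > first_bus.bit_width
    shorter = second_bus.length < first_bus.length
    return wider or shorter  # inclusive "and/or"

external = Bus(bit_width=128, length=10.0)   # e.g. first bus, processor to module
internal = Bus(bit_width=512, length=1.0)    # e.g. second bus, processor to memory
print(meets_bus_condition(external, internal))  # True
```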
[0141] Optionally, in some embodiments, the first processing module further includes a third processor connected to a second memory, and a transmission latency generated when the third processor performs read and write operations on the second memory is less than the transmission latency generated when the first processor communicates with the first processing module.
[0144] Optionally, in some embodiments, the first processor is connected to the first processing module through a first bus, the second processor is connected to the first memory through a second bus, the third processor is connected to the second memory through a third bus, and a sum of a bus bit width of the second bus and a bus bit width of the third bus is greater than a bus bit width of the first bus.
[0147] Optionally, in some embodiments, the first processing module further includes a third processor connected to the first memory, and a transmission latency generated when the third processor performs read and write operations on the first memory is less than the transmission latency generated when the first processor communicates with the first processing module.
[0148] Optionally, in some embodiments, the first processor is connected to the first processing module through a first bus, the second processor is connected to the first memory through a second bus, the third processor is connected to the first memory through a third bus, and a sum of a bus bit width of the second bus and a bus bit width of the third bus is greater than a bus bit width of the first bus.
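The width condition in [0144] and [0148] concerns the combined width of the second and third buses, so each of them may individually be narrower than the first bus. A one-line sketch with hypothetical names:

```python
def meets_sum_width_condition(first_width: int, second_width: int, third_width: int) -> bool:
    # Combined bit width of the two internal buses must exceed the first bus.
    return second_width + third_width > first_width

# The second bus (128 bits) alone is narrower than the first bus (256 bits),
# but together with the third bus (256 bits) the condition holds.
print(meets_sum_width_condition(256, 128, 256))  # True
```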
[0149] Optionally, in some embodiments, the second processor and the third processor are pipeline processors.
[0150] Optionally, in some embodiments, the circuit further includes a fourth processor and a third memory connected to the fourth processor.
[0153] Optionally, in some embodiments, the circuit further includes a fourth processor and a second processing module connected to the fourth processor. The second processing module includes N fifth processors connected to M memories, where both N and M are integers greater than or equal to 1. A transmission latency generated when any fifth processor performs read and write operations on the memory connected to the fifth processor is less than a transmission latency generated when the fourth processor communicates with the second processing module.
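The latency condition of [0153] can be checked for any N and M. The sketch below, with hypothetical names and latency units, treats the module as a mapping from (fifth processor, connected memory) pairs to read/write latencies and checks that every one is below the latency between the fourth processor and the second processing module.

```python
def module_latency_ok(external_latency: float, internal_latencies: dict) -> bool:
    # internal_latencies: {(processor_id, memory_id): read/write latency}.
    # N and M are at least 1, so an empty module does not satisfy the condition.
    return bool(internal_latencies) and all(
        lat < external_latency for lat in internal_latencies.values()
    )

# N = 2 fifth processors sharing M = 1 memory; latencies in hypothetical ns.
internal = {(0, 0): 2.0, (1, 0): 3.0}
print(module_latency_ok(10.0, internal))  # True
```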
[0156] Optionally, in some embodiments, the second processor is connected to the third processor through a fourth bus, the fourth processor is connected to the first processor through a fifth bus, and a bus bit width of the fourth bus is less than a bus bit width of the fifth bus.
[0159] Optionally, in some embodiments, a quantity of processor cores included in the fourth processor is greater than or equal to a quantity of processor cores included in the first processor.
[0160] Optionally, in some embodiments, the fourth processor and the first processor are pipeline processors.
[0161] Optionally, in some embodiments, the first processing module further includes the first memory.
[0162] An embodiment of this application further provides an electronic device. The electronic device includes the chip according to embodiments of this application, and the electronic device further includes a receiver and a transmitter. The receiver is configured to receive a packet and send the packet to the chip. The chip is configured to process the packet. The transmitter is configured to: obtain a packet processed by the chip, and send the processed packet to another electronic device. The electronic device may be a switch, a router, or any other electronic device on which the foregoing chip can be disposed.
[0163] The chip in embodiments of this application may be a system on chip (SoC), a network processor (NP), or the like.
[0164] The memory in embodiments of this application may be a volatile memory or a nonvolatile memory, or may include both a volatile memory and a nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), used as an external cache. By way of example and not limitation, many forms of RAMs may be used, for example, a static random access memory (SRAM), a dynamic random access memory (DRAM), a synchronous dynamic random access memory (SDRAM), a double data rate synchronous dynamic random access memory (DDR SDRAM), an enhanced synchronous dynamic random access memory (ESDRAM), a synchlink dynamic random access memory (SLDRAM), and a direct rambus dynamic random access memory (DR RAM). It should be noted that the memory of the systems and methods described in this specification is intended to include, but is not limited to, these and any other suitable type of memory.
[0165] It should be noted that the processor in embodiments of this application may be an integrated circuit chip with a signal processing capability. The processor may be a microprocessor, or the processor may be any conventional processor, or the like.
[0166] In an implementation process, steps in the foregoing methods can be implemented by using an integrated logic circuit of hardware in the processor, or by using instructions in a form of software. The steps of the method disclosed with reference to embodiments of this application may be directly performed by a hardware processor, or may be performed by using a combination of hardware in the processor and a software module. The software module may be located in a storage medium well established in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and a processor reads information in the memory and completes the steps in the foregoing methods in combination with hardware of the processor. To avoid repetition, details are not described herein again.
[0167] A person of ordinary skill in the art may be aware that, in combination with the examples described in embodiments disclosed in this specification, units and algorithm steps may be implemented by using electronic hardware or a combination of computer software and electronic hardware. Whether the functions are performed by using hardware or software depends on particular applications and design constraint conditions of the technical solutions. A person skilled in the art may use different methods to implement the described functions for each particular application, but it should not be considered that the implementation goes beyond the scope of this application.
[0168] It may be clearly understood by a person skilled in the art that, for the purpose of convenient and brief description, for a detailed working process of the foregoing system, apparatus, and unit, refer to a corresponding process in the foregoing method embodiments. Details are not described herein again.
[0169] In several embodiments according to this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described apparatus embodiments are merely examples. For example, division into the units is merely logical function division and there may be other division during actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented by using some interfaces. The indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.
[0170] The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual requirements to achieve the objectives of the solutions of embodiments.
[0171] In addition, functional units in embodiments of this application may be integrated into one processing unit, each of the units may exist alone physically, or two or more units may be integrated into one unit.
[0172] When the functions are implemented in a form of a software functional unit and sold or used as an independent product, the functions may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the conventional technology, or some of the technical solutions may be implemented in a form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, or a network device) to perform all or some of the steps of the methods described in embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
[0173] The foregoing descriptions are merely specific implementations of this application, but are not intended to limit the protection scope of this application. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.