Video Playing Method, Apparatus, and System, and Computer Storage Medium
20230045876 · 2023-02-16
CPC classification: G06F3/04847 (Physics); H04N21/21805, H04N7/181, H04N21/4728, H04N21/6587, H04N21/2665 (Electricity)
Abstract
This application discloses a video playing method, apparatus, and system, and a computer storage medium, and belongs to the field of video processing technologies. In this application, after receiving the rotation fragment, the terminal decodes the rotation fragment, so that surround playing of a video picture can be implemented, and resolution of a played video picture can be the same as resolution of the video picture in the rotation fragment. This application is not limited by a quantity of cameras used for front-end shooting, and can be widely applied.
Claims
1. A video playing method, wherein the method comprises: receiving, by an upper-layer device, a surround playing request sent by a terminal, wherein the surround playing request comprises rotation shooting location information, and the rotation shooting location information is used to indicate a rotation range; determining, by the upper-layer device, playing time information based on the surround playing request; generating, by the upper-layer device, a rotation fragment based on the rotation shooting location information and the playing time information, wherein the rotation fragment comprises groups of pictures (GOPs) corresponding to a plurality of shooting locations in the rotation range, and the GOP comprises one or more frames of video pictures; and sending, by the upper-layer device, the rotation fragment to the terminal.
2. The method according to claim 1, wherein the playing time information comprises a playing start moment and a playing end moment, and the generating, by the upper-layer device, a rotation fragment based on the rotation shooting location information and the playing time information comprises: obtaining, by the upper-layer device, m video fragments that correspond to each of the plurality of shooting locations and that are obtained from the playing start moment to the playing end moment, wherein m is a positive integer; extracting, by the upper-layer device based on the playing time information, one or more GOPs from the m video fragments corresponding to each shooting location; and assembling, by the upper-layer device, extracted GOPs, to obtain the rotation fragment.
3. The method according to claim 2, wherein the extracting, by the upper-layer device based on the playing time information, one or more GOPs from the m video fragments corresponding to each shooting location comprises: determining, by the upper-layer device based on surround playing duration and a quantity of the plurality of shooting locations, a quantity of extracted GOPs and a GOP extraction moment that correspond to each shooting location, wherein the surround playing duration is equal to a difference between the playing end moment and the playing start moment; and extracting, by the upper-layer device based on the quantity of extracted GOPs and the GOP extraction moment that correspond to each shooting location, the GOP from the m video fragments corresponding to each shooting location.
4. The method according to claim 1, wherein the playing time information comprises a target playing moment, and the generating, by the upper-layer device, a rotation fragment based on the rotation shooting location information and the playing time information comprises: obtaining, by the upper-layer device, a target video fragment corresponding to each of the plurality of shooting locations, wherein a time period corresponding to the target video fragment comprises the target playing moment; extracting, by the upper-layer device from the target video fragment corresponding to each shooting location, one GOP corresponding to the target playing moment, wherein the GOP comprises one frame of video picture; and assembling, by the upper-layer device, extracted GOPs, to obtain the rotation fragment.
5. The method according to claim 2, wherein the method further comprises: determining, by the upper-layer device, a start shooting location, an end shooting location, and a rotation direction based on the rotation shooting location information; and determining, by the upper-layer device, the plurality of shooting locations from shooting locations from the start shooting location to the end shooting location in the rotation direction; and the assembling, by the upper-layer device, extracted GOPs, to obtain the rotation fragment comprises: sequentially assembling, by the upper-layer device, the extracted GOPs in the rotation direction, to obtain the rotation fragment.
6. The method according to claim 2, wherein the determining, by the upper-layer device, playing time information based on the surround playing request comprises: determining, by the upper-layer device, the playing start moment and the playing end moment based on a moment at which the surround playing request is received and a preset policy, wherein the preset policy comprises preset surround playing duration; or the surround playing request comprises the playing start moment and the playing end moment, and the determining, by the upper-layer device, playing time information based on the surround playing request comprises: identifying, by the upper-layer device, the playing start moment and the playing end moment from the surround playing request; or the surround playing request comprises the playing start moment, and the determining, by the upper-layer device, playing time information based on the surround playing request comprises: determining, by the upper-layer device, the playing end moment based on the playing start moment and preset surround playing duration; or the surround playing request comprises the surround playing duration, and the determining, by the upper-layer device, playing time information based on the surround playing request comprises: determining, by the upper-layer device, the playing start moment and the playing end moment based on a moment at which the surround playing request is received and the surround playing duration; or the surround playing request comprises the playing start moment and the surround playing duration, and the determining, by the upper-layer device, playing time information based on the surround playing request comprises: determining, by the upper-layer device, the playing end moment based on the playing start moment and the surround playing duration.
7. The method according to claim 1, wherein the GOP is encoded in an encapsulation manner for independent transmission.
8. A video playing method, wherein the method comprises: when a terminal receives a rotation instruction, sending, by the terminal to an upper-layer device, a surround playing request generated based on the rotation instruction, wherein the surround playing request comprises rotation shooting location information, and the rotation shooting location information is used to indicate a rotation range; receiving, by the terminal, a rotation fragment sent by the upper-layer device, wherein the rotation fragment comprises groups of pictures (GOPs) corresponding to a plurality of shooting locations in the rotation range, and the GOP comprises one or more frames of video pictures; and decoding and playing, by the terminal, the rotation fragment.
9. The method according to claim 8, wherein the method further comprises: when the terminal detects a sliding operation in a video playing interface, determining, by the terminal, that the rotation instruction is received; determining, by the terminal, the rotation shooting location information based on sliding information of the sliding operation, wherein the sliding information comprises one or more of a sliding start location, a sliding length, a sliding direction, or a sliding angle; and generating, by the terminal, the surround playing request based on the rotation shooting location information.
10. The method according to claim 8, wherein the method further comprises: when the terminal receives a target remote control instruction sent by a remote control device, determining, by the terminal, that the rotation instruction is received, wherein the target remote control instruction comprises remote control button information, and the remote control button information comprises a button identifier and/or a quantity of times of pressing a button; determining, by the terminal, the rotation shooting location information based on the remote control button information; and generating, by the terminal, the surround playing request based on the rotation shooting location information.
11. A video playing apparatus, applied to an upper-layer device, wherein the apparatus comprises: a receiving module, configured to receive a surround playing request sent by a terminal, wherein the surround playing request comprises rotation shooting location information, and the rotation shooting location information is used to indicate a rotation range; a first determining module, configured to determine playing time information based on the surround playing request; a generation module, configured to generate a rotation fragment based on the rotation shooting location information and the playing time information, wherein the rotation fragment comprises groups of pictures (GOPs) corresponding to a plurality of shooting locations in the rotation range, and the GOP comprises one or more frames of video pictures; and a sending module, configured to send the rotation fragment to the terminal.
12. The apparatus according to claim 11, wherein the playing time information comprises a playing start moment and a playing end moment, and the generation module is configured to: obtain m video fragments that correspond to each of the plurality of shooting locations and that are obtained from the playing start moment to the playing end moment, wherein m is a positive integer; extract, based on the playing time information, one or more GOPs from the m video fragments corresponding to each shooting location; and assemble extracted GOPs, to obtain the rotation fragment.
13. The apparatus according to claim 12, wherein the generation module is configured to: determine, based on surround playing duration and a quantity of the plurality of shooting locations, a quantity of extracted GOPs and a GOP extraction moment that correspond to each shooting location, wherein the surround playing duration is equal to a difference between the playing end moment and the playing start moment; and extract, based on the quantity of extracted GOPs and the GOP extraction moment that correspond to each shooting location, the GOP from the m video fragments corresponding to each shooting location.
14. The apparatus according to claim 11, wherein the playing time information comprises a target playing moment, and the generation module is configured to: obtain a target video fragment corresponding to each of the plurality of shooting locations, wherein a time period corresponding to the target video fragment comprises the target playing moment; extract, from the target video fragment corresponding to each shooting location, one GOP corresponding to the target playing moment, wherein the GOP comprises one frame of video picture; and assemble extracted GOPs, to obtain the rotation fragment.
15. The apparatus according to claim 12, wherein the apparatus further comprises: a second determining module, configured to determine a start shooting location, an end shooting location, and a rotation direction based on the rotation shooting location information; and a third determining module, configured to determine the plurality of shooting locations from shooting locations from the start shooting location to the end shooting location in the rotation direction, wherein the generation module is configured to sequentially assemble the extracted GOPs in the rotation direction, to obtain the rotation fragment.
16. The apparatus according to claim 12, wherein the first determining module is configured to determine the playing start moment and the playing end moment based on a moment at which the surround playing request is received and a preset policy, wherein the preset policy comprises preset surround playing duration; or the surround playing request comprises the playing start moment and the playing end moment, and the first determining module is configured to identify the playing start moment and the playing end moment from the surround playing request; or the surround playing request comprises the playing start moment, and the first determining module is configured to determine the playing end moment based on the playing start moment and preset surround playing duration; or the surround playing request comprises the surround playing duration, and the first determining module is configured to determine the playing start moment and the playing end moment based on a moment at which the surround playing request is received and the surround playing duration; or the surround playing request comprises the playing start moment and the surround playing duration, and the first determining module is configured to determine the playing end moment based on the playing start moment and the surround playing duration.
17. The apparatus according to claim 11, wherein the GOP is encoded in an encapsulation manner for independent transmission.
18. A video playing apparatus, applied to a terminal, wherein the apparatus comprises: a sending module, configured to: when the terminal receives a rotation instruction, send, to an upper-layer device, a surround playing request generated based on the rotation instruction, wherein the surround playing request comprises rotation shooting location information, and the rotation shooting location information is used to indicate a rotation range; a receiving module, configured to receive a rotation fragment sent by the upper-layer device, wherein the rotation fragment comprises groups of pictures (GOPs) corresponding to a plurality of shooting locations in the rotation range, and the GOP comprises one or more frames of video pictures; and a playing module, configured to decode and play the rotation fragment.
19. The apparatus according to claim 18, wherein the apparatus further comprises: a first determining module, configured to: when the terminal detects a sliding operation in a video playing interface, determine that the rotation instruction is received; a second determining module, configured to determine the rotation shooting location information based on sliding information of the sliding operation, wherein the sliding information comprises one or more of a sliding start location, a sliding length, a sliding direction, or a sliding angle; and a generation module, configured to generate the surround playing request based on the rotation shooting location information.
20. The apparatus according to claim 18, wherein the apparatus further comprises: a third determining module, configured to: when the terminal receives a target remote control instruction sent by a remote control device, determine that the rotation instruction is received, wherein the target remote control instruction comprises remote control button information, and the remote control button information comprises a button identifier and/or a quantity of times of pressing a button; a fourth determining module, configured to determine the rotation shooting location information based on the remote control button information; and a generation module, configured to generate the surround playing request based on the rotation shooting location information.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
[0045] To make the objectives, technical solutions, and advantages of this application clearer, the following further describes the implementations of this application in detail with reference to the accompanying drawings.
[0047] The media source 101 is configured to provide a plurality of video streams. Refer to
[0048] The video server 102 is configured to: process, by using an OTT (over the top) technology, the video stream sent by the media source 101, and distribute the processed video stream to the terminal through a content delivery network (content delivery network, CDN). The CDN is an intelligent virtual network built based on an existing network, and relies on edge servers deployed at various locations. Optionally, referring to
[0049] The terminal 103, namely, a video playing end, is configured to decode and play the video stream sent by the video server 102. Optionally, the terminal 103 can change a playing angle in one or more of control manners such as a touch control manner, a voice control manner, a gesture control manner, or a remote control manner. A control manner for triggering the terminal to change the playing angle is not limited in this embodiment of this application. For example, the terminal 103 may be a device that can change the playing angle in the touch control manner or the voice control manner, for example, a mobile phone, a tablet computer, or an intelligent wearable device. Alternatively, the terminal 103 may be a device that can change the playing angle through control performed by using a remote control, for example, a set top box (set top box, STB).
[0050] In this embodiment of this application, a video stream is transmitted between the video server 102 and the terminal 103 based on a hypertext transfer protocol (hypertext transfer protocol, HTTP). Optionally, after obtaining the plurality of video streams, the front-end encoder 1012 on a side of the media source 101 or the video processing server 1021 on a side of the video server 102 re-encodes (which may also be referred to as transcoding) each video stream, to obtain a GOP, and generates a video fragment based on the GOP, to transmit the GOP. A plurality of GOPs are usually encapsulated in the video fragment, and each GOP includes one or more frames of video pictures. The GOP includes a group of video pictures that are consecutive in time. A time stamp of a GOP obtained by re-encoding a video stream corresponds to a moment at which a camera collects a video picture in the GOP. For example, the time stamp of the GOP may be set to a moment at which a last frame of video picture in the GOP is collected. For another example, when the GOP includes a plurality of frames of video pictures, the GOP corresponds to a start time stamp and an end time stamp, the start time stamp is a moment at which a first frame of video picture in the GOP is collected, and the end time stamp is a moment at which a last frame of video picture in the GOP is collected.
[0051] Optionally, a time length of the GOP is less than or equal to 100 milliseconds. The time length of the GOP may be set by an administrator. In a case of a fixed time length, the quantity of frames of video pictures included in each GOP is positively related to a shooting frame rate of a camera. In other words, a higher shooting frame rate of the camera indicates a larger quantity of frames of video pictures included in each GOP. For example, the GOP may include two frames of video pictures (corresponding to a frame rate of 25 frames per second (frame per second, FPS), 25 FPS for short), three frames of video pictures (corresponding to 30 FPS), five frames of video pictures (corresponding to 50 FPS), or six frames of video pictures (corresponding to 60 FPS). Certainly, the GOP may alternatively include only one frame of video picture or include more frames of video pictures. This is not limited in this embodiment of this application.
[0052] In this embodiment of this application, the GOP in the video fragment is encoded in an encapsulation manner for independent transmission, so that each GOP can be used as a separate chunk (chunk) for independent transmission and use. For example, the video fragment may be encapsulated in a fragmented mp4 (fragmented mp4, fmp4) format. The fmp4 format is a streaming media format defined in an MPEG-4 standard provided by the moving picture experts group (moving picture experts group, MPEG).
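As a rough illustration of the chunked encapsulation, an fmp4 file consists of an initialization section (`ftyp` and `moov` boxes) followed by movie fragments, each a `moof`/`mdat` pair; mapping exactly one GOP to one such pair is an assumption consistent with "each GOP as a separate chunk", not something the text states:

```python
def fmp4_layout(num_gops: int) -> list:
    """Sketch of an fmp4 video fragment: one initialization section followed
    by one (moof, mdat) pair per GOP, so each GOP can be transferred and
    used independently."""
    boxes = ["ftyp", "moov"]       # initialization metadata
    for _ in range(num_gops):
        boxes += ["moof", "mdat"]  # per-GOP movie fragment header + media data
    return boxes
```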
[0053] The video processing server 1021 on the side of the video server 102 generates a media content index (which may also be referred to as an OTT index) based on externally specified data. The media content index is used to describe information about each video stream, and the media content index is essentially a file that describes the information about the video stream. The information about the video stream includes address information of the video stream, time information of the video stream, or the like. The address information of the video stream is used to indicate an obtaining address of the video stream. For example, the address information of the video stream may be a uniform resource locator (uniform resource locator, URL) address corresponding to the video stream. The time information of the video stream is used to indicate a start moment and an end moment of each video fragment in the video stream. Optionally, the media content index may further include shooting location information. The shooting location information includes a quantity of shooting locations (namely, a quantity of cameras on the side of the media source) and a shooting location angle corresponding to each video stream. The shooting location angle corresponding to the video stream is a shooting location angle corresponding to a camera.
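The information carried by the media content index described above might be organized as in the following sketch; the field names, the example URL, and the three-camera setup are all hypothetical:

```python
# Hypothetical media content index for a system with 3 shooting locations.
media_content_index = {
    "shooting_locations": 3,  # quantity of cameras on the media source side
    "streams": [
        {
            "camera_id": 0,
            "shooting_angle_deg": 0,  # shooting location angle of this camera
            "url": "https://example.com/cam0/index.m3u8",  # obtaining address
            "fragments": [            # start/end moment of each video fragment
                {"start": "00:19:34", "end": "00:19:36"},
                {"start": "00:19:36", "end": "00:19:38"},
            ],
        },
        # ... one entry per camera / shooting location
    ],
}
```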
[0054] For example,
[0055] Optionally, the media content index in this embodiment of this application may be an m3u8 file (which may be referred to as an HLS index) or a media presentation description (media presentation description, MPD) file (which may be referred to as a DASH index). The m3u8 file is an m3u file in a UTF-8 encoding format.
[0056] A process in which the terminal obtains video content in the video server includes: The terminal first downloads the media content index from the video server, and parses the media content index, to obtain information about the video stream. The terminal selects a video stream that needs to be played currently, extracts a URL address of the video stream from the media content index, and then sends a media content request to the video server based on the URL address of the video stream. After receiving the media content request, the video server sends the corresponding video stream to the terminal.
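The index-then-content flow above can be sketched as follows; the `download_index()`/`get()` server interface and the index layout are hypothetical stand-ins for the HTTP exchanges described in the text:

```python
def parse_index(index: dict) -> dict:
    """Parse the media content index to obtain per-stream info (here, the URL)."""
    return {s["name"]: s["url"] for s in index["streams"]}

def request_stream(server, stream_name: str):
    """Download the media content index, parse it, pick the stream that needs
    to be played, extract its URL, and send the media content request.
    `server` is any object exposing download_index() and get(url)."""
    urls = parse_index(server.download_index())
    return server.get(urls[stream_name])
```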
[0057] Optionally, referring to
[0059] Step 401: When a terminal receives a rotation instruction, the terminal generates a surround playing request.
[0060] The surround playing request includes rotation shooting location information, and the rotation shooting location information is used to indicate a rotation range. Optionally, when a media content index obtained by the terminal includes shooting location information, after the terminal receives the rotation instruction, the terminal may determine a start shooting location, an end shooting location, and a rotation direction based on the rotation instruction and the shooting location information. In this case, the rotation shooting location information may include a start shooting location identifier, an end shooting location identifier, and the rotation direction. Alternatively, after receiving the rotation instruction, the terminal may determine a rotation angle based on the rotation instruction. In this case, the rotation shooting location information may include the rotation angle.
[0061] Optionally, when the terminal receives the rotation instruction in a video playing state, the surround playing request generated by the terminal is used to request to perform dynamic surround playing of video content. In this case, the surround playing request is further used to determine a playing start moment and a playing end moment. Optionally, the surround playing request further includes playing time information, and the playing time information includes one or more of the playing start moment, the playing end moment, or surround playing duration.
[0062] Optionally, when the terminal receives the rotation instruction in a state in which playing of a video is paused, the surround playing request generated by the terminal is used to request to perform static surround playing of video content. In this case, the surround playing request is further used to determine a target playing moment. Optionally, the surround playing request includes the target playing moment, and the target playing moment may be a video pausing moment. Static surround playing of the video content is to perform surround playing of video pictures that correspond to the target playing moment and that are provided by a plurality of shooting locations.
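The static surround assembly described above can be sketched as follows; the per-location GOP lookup table is a hypothetical structure standing in for the stored video fragments:

```python
def build_static_rotation_fragment(gops_by_location: dict, locations: list,
                                   target_moment: str) -> list:
    """Static surround playing: from each shooting location, in rotation
    order, take the single-frame GOP corresponding to the target playing
    moment (e.g. the video pausing moment) and assemble the GOPs into one
    rotation fragment."""
    return [gops_by_location[loc][target_moment] for loc in locations]
```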
[0063] In a possible implementation, when the terminal detects a sliding operation in a video playing interface, the terminal determines that the rotation instruction is received. The terminal determines the rotation shooting location information based on sliding information of the sliding operation. The sliding information includes one or more of a sliding start location, a sliding length, a sliding direction, or a sliding angle. Then, the terminal generates the surround playing request based on the rotation shooting location information. The sliding start location, the sliding length, and the sliding direction may be used to determine the start shooting location, the end shooting location, and the rotation direction. The sliding angle may be used to determine the rotation angle.
[0064] Optionally, the sliding start location corresponds to the start shooting location, the sliding direction corresponds to the rotation direction, and the sliding length is used to define a quantity of to-be-switched shooting locations. A leftward sliding direction indicates counterclockwise rotation, and a rightward sliding direction indicates clockwise rotation. Each time the sliding length reaches a unit length, it indicates to switch one shooting location. For example, the unit length may be set to one centimeter. When the sliding length reaches three centimeters, it indicates to switch three shooting locations. Sliding sensitivity is negatively related to the specified value of the unit length; in other words, a smaller specified value of the unit length indicates higher sliding sensitivity. The sliding sensitivity may be set based on an actual requirement.
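The sliding-to-rotation mapping above can be sketched as follows; the function names and the string direction labels are illustrative:

```python
def locations_to_switch(sliding_length_cm: float, unit_length_cm: float = 1.0) -> int:
    """Each full unit length of sliding switches one shooting location;
    a smaller unit length means higher sliding sensitivity."""
    return int(sliding_length_cm // unit_length_cm)

def direction_from_sliding(sliding_direction: str) -> str:
    """Leftward sliding maps to counterclockwise rotation, rightward to clockwise."""
    return "counterclockwise" if sliding_direction == "left" else "clockwise"
```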
[0065] For example, it is assumed that the sliding direction is rightward, the sliding length is five centimeters, and the unit length is one centimeter. In this case, it indicates to switch five shooting locations through clockwise rotation. Refer to
[0066] Optionally, when the surround playing request is used to request to perform dynamic surround playing of the video content, the surround playing duration may be further defined based on sliding duration. For example, the surround playing duration may be equal to the sliding duration.
[0067] Optionally, the sliding angle is used to determine the rotation angle. It may be set that the rotation angle and the sliding angle meet a specific relationship. For example, the rotation angle is equal to the sliding angle, or the rotation angle is equal to twice the sliding angle. When the rotation shooting location information includes the rotation angle, the sign of the rotation angle may be further used to indicate the rotation direction. For example, a positive rotation angle indicates clockwise rotation, and a negative rotation angle indicates counterclockwise rotation.
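The angle-based variant above can be sketched as follows; the `scale` parameter and the direction labels are illustrative:

```python
def rotation_angle_from_sliding(sliding_angle: float, scale: float = 1.0) -> float:
    """The rotation angle meets a preset relationship with the sliding angle,
    e.g. equal to it (scale=1) or twice it (scale=2)."""
    return scale * sliding_angle

def direction_from_angle(rotation_angle: float) -> str:
    """A positive rotation angle indicates clockwise rotation;
    a negative one, counterclockwise."""
    return "clockwise" if rotation_angle > 0 else "counterclockwise"
```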
[0068] In another possible implementation, when the terminal receives a target remote control instruction sent by a remote control device, the terminal determines that the rotation instruction is received. The target remote control instruction includes remote control button information, and the remote control button information includes a button identifier and/or a quantity of times of pressing a button. The terminal determines the rotation shooting location information based on the remote control button information. Then, the terminal generates the surround playing request based on the rotation shooting location information. The button identifier may be used to determine the rotation direction. The quantity of times of pressing a button may be used to determine a quantity of to-be-switched shooting locations.
[0069] Optionally, the rotation direction is determined based on the button identifier. For example, when the remote control button information includes an identifier of a left button, it indicates that the rotation direction is a counterclockwise direction, and when the remote control button information includes an identifier of a right button, it indicates that the rotation direction is a clockwise direction. Certainly, another button may be disposed on the remote control device, to control the rotation direction. This is not limited in this embodiment of this application. The quantity of times of pressing a button is used to define the quantity of to-be-switched shooting locations. For example, if the quantity of times of pressing a button is 1, it indicates to switch one shooting location.
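The remote-control mapping above can be sketched as follows; the 'left'/'right' button identifiers are illustrative, as in the example in the text:

```python
def rotation_from_button_info(button_id: str, press_count: int) -> tuple:
    """Map remote control button information (button identifier and quantity
    of times of pressing) to a rotation direction and a quantity of
    shooting locations to switch."""
    direction = "counterclockwise" if button_id == "left" else "clockwise"
    return direction, press_count
```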
[0070] For example, it is assumed that the remote control button information includes the identifier of the left button, and the quantity of times of pressing a button is 3. In this case, it indicates to switch three shooting locations through counterclockwise rotation. Refer to
[0071] Optionally, when the surround playing request is used to request to perform dynamic surround playing of the video content, the surround playing duration may be further defined based on button pressing duration. For example, the surround playing duration may be equal to the button pressing duration.
[0072] Step 402: The terminal sends the surround playing request to an upper-layer device.
[0073] The upper-layer device is an upstream device of the terminal. Optionally, the upper-layer device may be a video server (which may be specifically a video distribution server) or a network device in the video playing system shown in
[0074] Step 403: The upper-layer device determines the playing time information based on the surround playing request.
[0075] In an optional embodiment of this application, the surround playing request is used to request to perform dynamic surround playing of the video content, and the playing time information includes the playing start moment and the playing end moment. There are five implementations in which the upper-layer device determines the playing time information based on the surround playing request:
[0076] In a first implementation, an implementation process of step 403 includes: The upper-layer device determines the playing start moment and the playing end moment based on a moment at which the surround playing request is received and a preset policy. The preset policy includes preset surround playing duration.
[0077] Optionally, it is defined in the preset policy that the video playing moment at which the upper-layer device receives the surround playing request is used as the playing start moment, and that the duration of the interval between the playing end moment and the playing start moment is equal to the preset surround playing duration. For example, the video playing moment at which the upper-layer device receives the surround playing request is 00:19:35, and the preset surround playing duration is two seconds. In this case, the upper-layer device determines that the playing start moment is 00:19:35 and the playing end moment is 00:19:37. Alternatively, it may be defined in the preset policy that a video playing moment separated by a specific duration from the receiving moment (corresponding to a video playing moment) of the surround playing request is used as the playing start moment; the playing start moment may be before or after the receiving moment of the surround playing request in a time sequence. For example, if the receiving moment of the surround playing request is 00:19:35, the playing start moment may be 00:19:34 or 00:19:36.
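The first implementation above amounts to simple clock arithmetic, sketched below; the wall-clock string format and the optional start offset are illustrative:

```python
from datetime import datetime, timedelta

def playing_window(receive_moment: str, preset_duration_s: int,
                   start_offset_s: int = 0) -> tuple:
    """The playing start moment is the video playing moment at which the
    request is received, optionally shifted by a preset interval (before or
    after); the playing end moment follows from the preset surround playing
    duration."""
    fmt = "%H:%M:%S"
    start = datetime.strptime(receive_moment, fmt) + timedelta(seconds=start_offset_s)
    end = start + timedelta(seconds=preset_duration_s)
    return start.strftime(fmt), end.strftime(fmt)

# Matching the example in the text:
# playing_window("00:19:35", 2) -> ("00:19:35", "00:19:37")
```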
[0078] In a second implementation, the surround playing request includes the playing start moment and the playing end moment. An implementation process of step 403 includes: The upper-layer device identifies the playing start moment and the playing end moment from the surround playing request.
[0079] Optionally, a specified field of the surround playing request is predefined or preconfigured to carry the playing start moment and the playing end moment. That the specified field of the surround playing request is predefined may be that the specified field of the surround playing request is defined in a standard or a protocol. That the specified field of the surround playing request is preconfigured may be that the upper-layer device and the terminal pre-negotiate the specified field of the surround playing request. After receiving the surround playing request, the upper-layer device may identify the playing start moment and the playing end moment from the specified field.
[0080] For example, the specified field of the surround playing request carries two moments, and the two moments are respectively 00:19:35 and 00:19:37. In this case, the upper-layer device determines that the playing start moment is 00:19:35, and the playing end moment is 00:19:37.
[0081] In a third implementation, the surround playing request includes the playing start moment. An implementation process of step 403 includes: The upper-layer device determines the playing end moment based on the playing start moment and preset surround playing duration.
[0082] For example, the playing start moment carried in the surround playing request is 00:19:35, and the preset surround playing duration is two seconds. In this case, the upper-layer device determines that the playing end moment is 00:19:37.
[0083] In a fourth implementation, the surround playing request includes the surround playing duration. An implementation process of step 403 includes: The upper-layer device determines the playing start moment and the playing end moment based on a moment at which the surround playing request is received and the surround playing duration.
[0084] Optionally, for a manner in which the upper-layer device determines the playing start moment and the playing end moment, refer to the first implementation. Details are not described herein again in this embodiment of this application.
[0085] In a fifth implementation, the surround playing request includes the playing start moment and the surround playing duration. An implementation process of step 403 includes: The upper-layer device determines the playing end moment based on the playing start moment and the surround playing duration.
[0086] For example, the playing start moment carried in the surround playing request is 00:19:35, and the surround playing duration is two seconds. In this case, the upper-layer device determines that the playing end moment is 00:19:37.
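The five implementations above can be summarized with a short illustrative sketch. This is not the patented implementation; the dict-based `request`, its field names, and the use of seconds for video playing moments are assumptions made purely for illustration.

```python
def resolve_play_window(receive_moment, request, preset_duration=2.0):
    # Illustrative sketch of the five ways the upper-layer device may derive
    # (playing start moment, playing end moment). Moments and durations are in
    # seconds of video playing time; `request` is a hypothetical dict carrying
    # whichever optional fields the surround playing request includes.
    start = request.get("start")        # playing start moment, if carried
    end = request.get("end")            # playing end moment, if carried
    duration = request.get("duration")  # surround playing duration, if carried

    if start is not None and end is not None:       # second implementation
        return start, end
    if start is not None and duration is not None:  # fifth implementation
        return start, start + duration
    if start is not None:                           # third implementation
        return start, start + preset_duration
    if duration is not None:                        # fourth implementation
        return receive_moment, receive_moment + duration
    # first implementation: receiving moment plus the preset policy
    return receive_moment, receive_moment + preset_duration
```

With the running example (request received at video playing moment 00:19:35, i.e. 1175 seconds, and a two-second duration), each branch yields the window 00:19:35 to 00:19:37.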
[0087] In another optional embodiment of this application, the surround playing request is used to request to perform static surround playing of the video content, and the playing time information includes the target playing moment. Optionally, the surround playing request includes the target playing moment. Alternatively, the surround playing request does not include the target playing moment, and the upper-layer device determines the target playing moment based on a moment at which the surround playing request is received. For a manner in which the upper-layer device determines the target playing moment, refer to a manner in which the upper-layer device determines the playing start moment in the first implementation. Details are not described herein again in this embodiment of this application.
[0088] Step 404: The upper-layer device determines the start shooting location, the end shooting location, and the rotation direction based on the rotation shooting location information.
[0089] Optionally, when the rotation shooting location information includes the start shooting location identifier, the end shooting location identifier, and the rotation direction, after the upper-layer device receives the surround playing request, the upper-layer device may determine the start shooting location, the end shooting location, and the rotation direction based on content in the rotation shooting location information.
[0090] Optionally, when the rotation shooting location information includes the rotation angle, after the upper-layer device receives the surround playing request, the upper-layer device determines the end shooting location and the rotation direction based on the start shooting location and the rotation angle. For example, referring to
[0091] Step 405: The upper-layer device determines the plurality of shooting locations from shooting locations from the start shooting location to the end shooting location in the rotation direction.
[0092] Optionally, the plurality of shooting locations determined by the upper-layer device may include all shooting locations from the start shooting location to the end shooting location in the rotation direction. For example, referring to
[0093] Step 406: The upper-layer device generates a rotation fragment based on the rotation shooting location information and the playing time information.
[0094] The rotation fragment includes GOPs corresponding to the plurality of shooting locations in the rotation range. Optionally, the rotation fragment sequentially includes the GOPs corresponding to the plurality of shooting locations from the start shooting location to the end shooting location in the rotation direction.
[0095] In an optional embodiment of this application, the surround playing request is used to request to perform dynamic surround playing of the video content, and each GOP in the rotation fragment includes one or more frames of video pictures. An implementation process of step 406 includes the following steps.
[0096] Step 4061A: The upper-layer device obtains m video fragments that correspond to each of the plurality of shooting locations and that are obtained from the playing start moment to the playing end moment, where m is a positive integer.
[0097] For example, it is assumed that the plurality of shooting locations sequentially include q shooting locations in the rotation direction, the playing start moment is T1, the playing end moment is T2, q is an integer greater than 0, T2>T1, and each video stream corresponding to each shooting location includes m video fragments in a time period (T1, T2). In this case, the upper-layer device separately obtains m video fragments corresponding to the q shooting locations in the time period (T1, T2).
[0098] Step 4062A: The upper-layer device extracts, based on the playing time information, one or more GOPs from the m video fragments corresponding to each shooting location.
[0099] Optionally, the upper-layer device determines, based on the surround playing duration and a quantity of the plurality of shooting locations, a quantity of extracted GOPs and a GOP extraction moment that correspond to each shooting location. The surround playing duration is equal to a difference between the playing end moment and the playing start moment. The upper-layer device extracts, based on the quantity of extracted GOPs and the GOP extraction moment that correspond to each shooting location, the GOP from the m video fragments corresponding to each shooting location.
[0100] Optionally, in two shooting locations arranged in the rotation direction, a GOP extraction moment corresponding to a previous shooting location is located, in a time sequence, before a GOP extraction moment corresponding to a current shooting location. The quantity of extracted GOPs corresponding to each shooting location is equal to a ratio of the surround playing duration to a product of a time length of the GOP and the quantity of the plurality of shooting locations (the ratio may be rounded up or rounded down).
[0101] For example, referring to the example in step 4061A, it is assumed that a time length of each GOP is t, and the quantity of extracted GOPs corresponding to each shooting location is equal to (T2−T1)/(q×t).
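The per-location GOP quantity described above can be sketched as follows. The function and parameter names are hypothetical, and moments are taken in seconds; the formula itself, surround playing duration divided by the product of GOP length and the quantity of shooting locations with the ratio rounded up or down, is as stated in the embodiment.

```python
import math

def gops_per_location(play_start_s, play_end_s, num_locations, gop_len_s,
                      round_up=True):
    # Quantity of GOPs extracted per shooting location:
    # (T2 - T1) / (q x t), rounded up or down as the embodiment allows.
    ratio = (play_end_s - play_start_s) / (num_locations * gop_len_s)
    return math.ceil(ratio) if round_up else math.floor(ratio)
```

For instance, with a two-second surround playing duration, q=5 shooting locations, and a GOP length of 0.4 seconds, one GOP is extracted per location.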
[0102] Step 4063A: The upper-layer device assembles extracted GOPs, to obtain the rotation fragment.
[0103] Optionally, the upper-layer device sequentially assembles the extracted GOPs in the rotation direction, to obtain the rotation fragment. The rotation fragment is a dynamic rotation fragment.
[0104] For example, referring to the example in step 4061A, it is assumed that q=5, m=1, each video fragment includes five GOPs, and the quantity of extracted GOPs corresponding to each shooting location is 1.
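Steps 4062A and 4063A can be illustrated with the assumptions of the example above (q=5, m=1, five GOPs per video fragment, one extracted GOP per shooting location). The data layout and names below are illustrative only: successive shooting locations in the rotation direction take successively later GOPs, and the extracted GOPs are concatenated in rotation order.

```python
def assemble_dynamic_rotation_fragment(gops_by_location, rotation_order,
                                       per_location):
    # gops_by_location maps a shooting-location id to its GOPs in time order.
    # A previous location's extraction moment precedes the current location's,
    # so a time cursor advances as we move through the rotation direction.
    fragment = []
    cursor = 0
    for loc in rotation_order:
        fragment.extend(gops_by_location[loc][cursor:cursor + per_location])
        cursor += per_location
    return fragment
```

With five locations each holding GOPs g0..g4 and one extracted GOP per location, the dynamic rotation fragment takes g0 from the first location, g1 from the second, and so on.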
[0105] In another optional embodiment of this application, the surround playing request is used to request to perform static surround playing of the video content, and each GOP in the rotation fragment includes one frame of video picture. An implementation process of step 406 includes the following steps.
[0106] Step 4061B: The upper-layer device obtains a target video fragment corresponding to each of the plurality of shooting locations, where a time period corresponding to the target video fragment includes the target playing moment.
[0107] That the time period corresponding to the target video fragment includes the target playing moment means that the target playing moment is located between a start moment and an end moment of the target video fragment.
[0108] Step 4062B: The upper-layer device extracts, from the target video fragment corresponding to each shooting location, one GOP corresponding to the target playing moment.
[0109] That a GOP corresponds to the target playing moment means that the moment at which the video picture in the GOP is collected is the target playing moment.
[0110] Step 4063B: The upper-layer device assembles extracted GOPs, to obtain the rotation fragment.
[0111] Optionally, the upper-layer device sequentially assembles the extracted GOPs in the rotation direction, to obtain the rotation fragment. The rotation fragment is a static rotation fragment.
[0112] For example, it is assumed that the quantity of the plurality of shooting locations is 5, and each video fragment includes five GOPs.
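Steps 4061B to 4063B can be sketched as follows, assuming a hypothetical representation in which each shooting location's target video fragment is a list of (collection moment, GOP) pairs; from each location, the single one-frame GOP collected at the target playing moment is taken, and the results are concatenated in rotation order.

```python
def assemble_static_rotation_fragment(target_fragment_by_location,
                                      rotation_order, target_moment):
    # Extract, from each shooting location's target video fragment, the one
    # GOP whose collection moment is the target playing moment, then
    # concatenate the extracted GOPs in the rotation direction.
    fragment = []
    for loc in rotation_order:
        gops = target_fragment_by_location[loc]  # [(collection_moment, gop)]
        gop = next(g for (moment, g) in gops if moment == target_moment)
        fragment.append(gop)
    return fragment
```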
[0113] Optionally, a quantity of GOPs included in the rotation fragment may be the same as or different from a quantity of GOPs included in another video fragment. For example, the quantity of GOPs included in the rotation fragment may be less than the quantity of GOPs included in another video fragment. This is not limited in this embodiment of this application.
[0114] Optionally, when the upper-layer device is a network device, after the upper-layer device receives the surround playing request, the upper-layer device first downloads the media content index from the video server, and parses the media content index, to obtain information about a video stream. The upper-layer device extracts, from the media content index, a URL address of a video stream corresponding to each of the plurality of shooting locations, and then separately obtains a corresponding video fragment based on the URL address of the video stream.
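The index lookup described above might be sketched as follows; the index layout and URL strings are illustrative assumptions, not the format of any particular media content index.

```python
def fetch_rotation_sources(media_content_index, shooting_locations):
    # Hypothetical sketch: after parsing the media content index, the
    # upper-layer device resolves the URL address of the video stream for
    # each shooting location in the rotation range, so that the corresponding
    # video fragments can then be obtained from those URLs.
    return {loc: media_content_index[loc]["url"] for loc in shooting_locations}
```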
[0115] Step 407: The upper-layer device sends the rotation fragment to the terminal.
[0116] Optionally, when the surround playing request is used to request to perform dynamic surround playing of the video content, after the upper-layer device sends the rotation fragment to the terminal, the upper-layer device continues to send, to the terminal, a video stream corresponding to the end shooting location, so that the terminal can be smoothly switched from a played picture corresponding to the start shooting location to a played picture corresponding to the end shooting location. When the surround playing request is used to request to perform static surround playing of the video content, after the upper-layer device sends the rotation fragment to the terminal, the upper-layer device stops sending video data to the terminal.
[0117] Step 408: The terminal decodes and plays the rotation fragment.
[0118] The terminal decodes and plays the rotation fragment, to implement surround playing of video pictures corresponding to the plurality of shooting locations from the start shooting location to the end shooting location in the rotation direction. Resolution of a video picture played by the terminal may be the same as resolution of the video picture in the rotation fragment.
[0119] A sequence of the steps of the method embodiment provided in this embodiment of this application can be appropriately adjusted, and the steps can also be correspondingly added or deleted based on a situation. Any method that can be readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, details are not described again.
[0120] In conclusion, in the video playing method provided in this embodiment of this application, the upper-layer device determines the playing time information based on the surround playing request sent by the terminal, and then generates the rotation fragment based on the playing time information and the rotation shooting location information in the surround playing request. Because the rotation fragment includes the GOPs corresponding to the plurality of shooting locations in the rotation range indicated by the rotation shooting location information, after the terminal receives the rotation fragment, the terminal decodes the rotation fragment, so that surround playing of the video picture can be implemented, and the resolution of the played video picture can be the same as the resolution of the video picture in the rotation fragment. Therefore, the video playing method provided in this embodiment of this application is not limited by a quantity of cameras used for front-end shooting, and is widely applied. In addition, the upper-layer device may be a video distribution server or a network device, to reduce a requirement for processing performance of a video processing server. There is high implementation reliability.
[0121]
[0122] Optionally, the playing time information includes a playing start moment and a playing end moment, and the generation module 703 is configured to: obtain m video fragments that correspond to each of the plurality of shooting locations and that are obtained from the playing start moment to the playing end moment, where m is a positive integer; extract, based on the playing time information, one or more GOPs from the m video fragments corresponding to each shooting location; and assemble extracted GOPs, to obtain the rotation fragment.
[0123] Optionally, the generation module 703 is specifically configured to: determine, based on surround playing duration and a quantity of the plurality of shooting locations, a quantity of extracted GOPs and a GOP extraction moment that correspond to each shooting location, where the surround playing duration is equal to a difference between the playing end moment and the playing start moment; and extract, based on the quantity of extracted GOPs and the GOP extraction moment that correspond to each shooting location, the GOP from the m video fragments corresponding to each shooting location.
[0124] Optionally, the playing time information includes a target playing moment, and the generation module 703 is configured to: obtain a target video fragment corresponding to each of the plurality of shooting locations, where a time period corresponding to the target video fragment includes the target playing moment; extract, from the target video fragment corresponding to each shooting location, one GOP corresponding to the target playing moment, where the GOP includes one frame of video picture; and assemble extracted GOPs, to obtain the rotation fragment.
[0125] Optionally, as shown in
[0126] The generation module 703 is configured to sequentially assemble the extracted GOPs in the rotation direction, to obtain the rotation fragment.
[0127] Optionally, the first determining module 702 is configured to determine the playing start moment and the playing end moment based on a moment at which the surround playing request is received and a preset policy, where the preset policy includes preset surround playing duration; or the surround playing request includes the playing start moment and the playing end moment, and the first determining module 702 is configured to identify the playing start moment and the playing end moment from the surround playing request; or the surround playing request includes the playing start moment, and the first determining module 702 is configured to determine the playing end moment based on the playing start moment and preset surround playing duration; or the surround playing request includes the surround playing duration, and the first determining module 702 is configured to determine the playing start moment and the playing end moment based on a moment at which the surround playing request is received and the surround playing duration; or the surround playing request includes the playing start moment and the surround playing duration, and the first determining module 702 is configured to determine the playing end moment based on the playing start moment and the surround playing duration.
[0128] Optionally, the GOP is encoded in an encapsulation manner for independent transmission.
[0129] In conclusion, in the video playing apparatus provided in this embodiment of this application, the upper-layer device determines, by using the first determining module, the playing time information based on the surround playing request sent by the terminal, and then generates, by using the generation module, the rotation fragment based on the playing time information and the rotation shooting location information in the surround playing request. Because the rotation fragment includes the GOPs corresponding to the plurality of shooting locations in the rotation range indicated by the rotation shooting location information, after the terminal receives the rotation fragment, the terminal decodes the rotation fragment, so that surround playing of the video picture can be implemented, and resolution of a played video picture can be the same as resolution of the video picture in the rotation fragment. Therefore, the video playing apparatus provided in this embodiment of this application is not limited by a quantity of cameras used for front-end shooting, and is widely applied. In addition, the upper-layer device may be a video distribution server or a network device, to reduce a requirement for processing performance of a video processing server. There is high implementation reliability.
[0130]
[0131] Optionally, as shown in
[0132] Optionally, as shown in
[0133] The generation module 906 is configured to generate the surround playing request based on the rotation shooting location information.
[0134] In conclusion, in the video playing apparatus provided in this embodiment of this application, after receiving the rotation instruction, the terminal sends the surround playing request to the upper-layer device by using the sending module, and then receives, by using the receiving module, the rotation fragment sent by the upper-layer device. Because the rotation fragment includes the GOPs corresponding to the plurality of shooting locations in the rotation range indicated by the rotation shooting location information, after the terminal receives the rotation fragment, the terminal decodes the rotation fragment by using the playing module, so that surround playing of the video picture can be implemented, and resolution of a played video picture can be the same as resolution of the video picture in the rotation fragment. Therefore, the video playing apparatus provided in this embodiment of this application is not limited by a quantity of cameras used for front-end shooting, and is widely applied. In addition, the upper-layer device may be a video distribution server or a network device, to reduce a requirement for processing performance of a video processing server. There is high implementation reliability.
[0135] For the apparatus in the foregoing embodiment, a specific manner of executing an operation by each module is described in detail in a related method embodiment, and details are not described herein.
[0136] An embodiment of this application further provides a video playing system. The system includes an upper-layer device and a terminal. The upper-layer device includes the video playing apparatus shown in
[0137]
[0138] The memory 1202 is configured to store a computer program, where the computer program includes program instructions.
[0139] The processor 1201 is configured to invoke the computer program, to implement an action performed by an upper-layer device or an action performed by a terminal in the video playing method shown in
[0140] Optionally, the video playing apparatus 120 further includes a communication bus 1203 and a communication interface 1204.
[0141] The processor 1201 includes one or more processing cores, and the processor 1201 executes various functional applications and performs data processing by running the computer program.
[0142] The memory 1202 may be configured to store the computer program. Optionally, the memory may store an operating system and an application program unit that is required by at least one function. The operating system may be an operating system such as a real time operating system (Real Time eXecutive, RTX), LINUX, UNIX, WINDOWS, or OS X.
[0143] There may be a plurality of communication interfaces 1204. The communication interfaces 1204 are configured to communicate with another storage device or a network device. For example, in this embodiment of this application, a communication interface of the upper-layer device may be configured to send a rotation fragment to the terminal, and a communication interface of the terminal may be configured to send a surround playing request to the upper-layer device. The network device may be a switch, a router, or the like.
[0144] The memory 1202 and the communication interface 1204 are separately connected to the processor 1201 through the communication bus 1203.
[0145] An embodiment of this application further provides a computer storage medium. The computer storage medium stores instructions. When the instructions are executed by a processor of a computer device, an action performed by an upper-layer device or an action performed by a terminal in the video playing method in the foregoing method embodiment is implemented.
[0146] A person of ordinary skill in the art may understand that all or some of the steps of embodiments may be implemented by hardware or a program instructing related hardware. The program may be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic disk, a compact disc, or the like.
[0147] In embodiments of this application, the terms “first”, “second”, and “third” are merely used for a purpose of description, and shall not be understood as an indication or implication of relative importance.
[0148] The term “and/or” in this application describes only an association relationship for describing associated objects and represents that three relationships may exist. For example, A and/or B may represent the following three cases: Only A exists, both A and B exist, and only B exists. In addition, the character “/” in this specification generally indicates an “or” relationship between the associated objects.
[0149] The foregoing descriptions are merely optional embodiments of this application, but are not intended to limit this application. Any modification, equivalent replacement, improvement, or the like made without departing from the concept and principle of this application should fall within the protection scope of this application.