Server device, method of transmission processing of server device, client device, method of reception processing of client device, and server system

10965971 · 2021-03-30

Abstract

An imaging unit captures an image of a subject to obtain image data. A transmission unit transmits, to a client device, a container in a predetermined format including a video stream configured by the image data obtained by imaging. Capture information indicating an imaging state in the imaging unit is inserted in a layer of the video stream and/or a layer of the container. The capture information is, for example, information regarding an imaging direction of the imaging unit, information regarding a focus distance of the imaging unit, and information regarding a zoom ratio of the imaging unit. The client device can control an imaging state of a server device.

Claims

1. A server device, comprising: a camera configured to capture an image of a subject to obtain image data; and circuitry configured to transmit a container in a predetermined format including a video stream of the image data to a client device, capture information indicating an imaging state in the camera being inserted in a layer of the video stream and/or a layer of the container, receive control information that is obtained in response to a user operation via a user interface, the user interface being displayed according to the imaging state of the camera that is indicated by the capture information inserted in the layer of the video stream and/or the layer of the container, and change a setting of the imaging state of the camera based on the received control information.

2. The server device according to claim 1, wherein the capture information includes information regarding an imaging direction of the camera.

3. The server device according to claim 1, wherein the capture information includes information regarding a focus distance of the camera.

4. The server device according to claim 1, wherein the capture information includes information regarding a zoom ratio of the camera.

5. The server device according to claim 1, wherein position information indicating a position of the server device is further inserted in the layer of the video stream and/or the layer of the container.

6. The server device according to claim 5, wherein the position information is GPS data.

7. The server device according to claim 5, wherein identification information for identifying a combination of the capture information and the position information and identification information for identifying the server device are further inserted in the layer of the video stream and/or the layer of the container.

8. The server device according to claim 1, wherein the capture information indicating the imaging state is inserted in a supplemental enhancement information message of the video stream and/or a user data box of the container.

9. A client device, comprising: reception circuitry configured to receive a container in a predetermined format including a video stream from a server device, the video stream including image data obtained by imaging a subject with a camera, and capture information indicating an imaging state of the camera being inserted in a layer of the video stream and/or a layer of the container; and processing circuitry configured to obtain display image data from the video stream, display a user interface according to the imaging state of the camera that is indicated by the capture information inserted in the layer of the video stream and/or the layer of the container, obtain control information to change a setting of the imaging state of the camera in response to a user operation via the user interface, and transmit the control information to the server device.

10. The client device according to claim 9, wherein position information indicating a position of the server device is further inserted in the layer of the video stream and/or the layer of the container, and the processing circuitry is configured to obtain access information for accessing another server device different from the server device based on the user operation and perform processing for accessing the other server device based on the access information.

11. The client device according to claim 10, wherein the access information is a URL of the other server device.

12. The client device according to claim 10, wherein the access information is the position information and a URL of a conversion server that converts the position information into a URL.

13. A client device, comprising: circuitry configured to transmit, to a server device, a reproduction request in which a first video stream to be reproduced is specified by capture device identification information for identifying an imaging device and capture view identification information selected by a user for identifying a combination of position information of the imaging device and imaging state information of the imaging device, the capture view identification information being inserted in a layer of a second video stream and/or a layer of a container of the second video stream, and the imaging state information indicating a plurality of imaging state settings for the imaging device; receive a container in a predetermined format including a video stream corresponding to the reproduction request from the server device; and obtain display image data from the video stream included in the container.

Description

BRIEF DESCRIPTION OF DRAWINGS

(1) FIG. 1 is a diagram illustrating an exemplary configuration of a server system according to a first embodiment.

(2) FIG. 2 is a diagram illustrating an exemplary stream of MP4.

(3) FIG. 3 is a diagram illustrating exemplary information stored in an lcib box.

(4) FIG. 4 is an explanatory diagram of direction information, focus distance information, horizontal angle information, and vertical angle information as capture information.

(5) FIG. 5 is a diagram illustrating an exemplary structure of a Location Capture information SEI message.

(6) FIG. 6 is a diagram illustrating an exemplary structure of control information (control meta information).

(7) FIG. 7 is a diagram illustrating contents of main information in the exemplary structure of the control information (control meta information).

(8) FIG. 8 is an explanatory diagram of operation of tracking (automatic adaptation).

(9) FIG. 9 is an explanatory diagram of control of an imaging state.

(10) FIG. 10 is a diagram illustrating an exemplary description of an MPD file including URL information of a media capture/location server to be accessed first.

(11) FIG. 11 is a diagram illustrating an exemplary user interface screen (UI screen) that shows a current imaging state and has a changing operation unit of the imaging state.

(12) FIG. 12 is a diagram illustrating an example of specifying, using information associated with a direction and a distance, another media capture/location server with respect to a certain media capture/location server.

(13) FIG. 13 is a block diagram illustrating an exemplary configuration of the media capture/location server.

(14) FIG. 14 is a block diagram illustrating an exemplary configuration of a client device.

(15) FIG. 15 is a diagram illustrating an exemplary structure of a server access message as a first message including URL information.

(16) FIG. 16 is a diagram illustrating an exemplary structure of the server access message as a second message including URL information of a GPS URL mapping server and GPS data.

(17) FIG. 17 is a sequence diagram illustrating exemplary communication between the client device and each server.

(18) FIG. 18 is a diagram illustrating an exemplary configuration of a server system according to a second embodiment.

(19) FIG. 19 is a diagram illustrating an exemplary description of an MPD file including information regarding a plurality of video streams stored in a storage server.

(20) FIG. 20 is a block diagram illustrating an exemplary configuration of the storage server.

(21) FIG. 21 is another sequence diagram illustrating exemplary communication between the client device and each server.

MODE FOR CARRYING OUT THE INVENTION

(22) Hereinafter, embodiments for carrying out the present invention (hereinafter referred to as embodiments) will be described. Note that descriptions will be made in the following order.

(23) 1. First Embodiment

(24) 2. Second Embodiment

(25) 3. Variation

1. First Embodiment

(26) [Exemplary Configuration of Server System]

(27) FIG. 1 illustrates an exemplary configuration of a server system 10 according to a first embodiment. The server system 10 includes a plurality of server devices and a client device 200, which are mutually connected via a network.

(28) The plurality of server devices include a plurality of media capture/location servers 101, an initial access server 102, and a GPS URL mapping server 103. In a case where there is an access from the client device 200, the media capture/location server 101 transmits, to the client device 200 via the network, a container in a predetermined format including a video stream configured by image data obtained by imaging using a camera (imaging unit), that is, an MP4 distribution stream in the present embodiment.

(29) In this case, the following information is inserted in a layer of the video stream and/or a layer of the container: capture information indicating an imaging state (e.g., imaging direction, focus distance, and zoom ratio) of the camera (imaging unit); position information indicating a position of its own server device, that is, global positioning system (GPS) data in the present embodiment; identification information for identifying a combination of the capture information and the position information (capture view ID); identification information for identifying its own server device (capture device ID); and a uniform resource locator (URL) of its own server device.

(30) For inserting information into the layer of the container, for example, the user data box (udta box) of MP4 is used. FIG. 2 illustrates an exemplary stream of MP4. The entire service stream is fragmented and transmitted so that image and sound can be reproduced from the middle of transmission, as in general broadcasting or the like. In this case, as illustrated in the drawing, the stream starts with an initialization segment (IS), followed by the boxes styp, segment index box (sidx), sub-segment index box (ssix), movie fragment box (moof), and media data box (mdat).

(31) The udta box is defined in the initialization segment (IS) or the moof box as illustrated in the drawing. In other words, there is a moov box in the initialization segment (IS), there is a trak box in the moov box, and the udta box is defined in the trak box.

(32) In this case, a location capture information box (lcib box), which is to be newly defined, is placed in the udta box. FIG. 3 illustrates exemplary information stored in the lcib box. The GPS data includes a latitude, longitude, elevation, and time. The latitude is latitude data, the longitude is longitude data, the elevation is altitude data, and the time is time data.

(33) The capture information includes a direction, H_angle, V_angle, focus_distance, and zoom_ratio. As illustrated in FIG. 4(a), the direction is direction information such as north, south, east, and west of the camera (imaging unit). As illustrated in FIG. 4(b), the H_angle is horizontal angle information in a case where the above-described direction of the camera (imaging unit) is set as a reference (zero degrees). As illustrated in FIG. 4(c), the V_angle is vertical angle information in a case where the horizontal position of the camera (imaging unit) is set as a reference (zero degrees). As illustrated in FIG. 4(d), the focus_distance is focus distance information of the camera (imaging unit). The zoom_ratio is zoom ratio information of the camera (imaging unit).

(34) A capture_device_id is a capture device ID. A capture_view_id is a capture view ID. A Server_url is the URL of its own server device.
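The lcib fields above can be sketched as a simple binary payload. A minimal sketch, assuming field widths the description does not fix for the box itself (32-bit floats for the measured values, 16-bit unsigned integers for the two IDs, direction expressed as a compass bearing in degrees; the Server_url is omitted for brevity):

```python
import struct

# Assumed lcib payload layout: GPS data, capture information, then the two
# IDs. The field widths here are illustrative, not normative.
LCIB_FIELDS = ("latitude", "longitude", "elevation", "time", "direction",
               "H_angle", "V_angle", "focus_distance", "zoom_ratio",
               "capture_device_id", "capture_view_id")
LCIB_FORMAT = ">9f2H"  # nine 32-bit floats followed by two 16-bit IDs

def pack_lcib(values):
    """Serialize an lcib field dict into a big-endian binary payload."""
    return struct.pack(LCIB_FORMAT, *(values[name] for name in LCIB_FIELDS))

def parse_lcib(payload):
    """Deserialize the payload back into a field dict."""
    return dict(zip(LCIB_FIELDS, struct.unpack(LCIB_FORMAT, payload)))
```

A round trip through these two helpers preserves every field, which is the property a client relies on when reading the box out of the received container.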

(35) Furthermore, an SEI message is used for inserting information into the layer of the video stream, for example. A Location_Capture_information SEI message, which is to be newly defined, is placed in an SEI message group.

(36) FIG. 5 illustrates an exemplary structure (Syntax) of the Location_Capture_information SEI message. The eight-bit field of a byte_length indicates a subsequent size (number of bytes) of this SEI message.

(37) Each 16-bit field of the latitude, longitude, elevation, and time indicates the GPS data. Here, the latitude field indicates latitude, the longitude field indicates longitude, the elevation field indicates altitude, and the time field indicates time.

(38) Each 16-bit field of the direction, H_angle, V_angle, focus_distance, and zoom_ratio indicates capture information. Here, the direction field indicates a direction such as north, south, east, and west of the camera (imaging unit). The H_angle field indicates a horizontal angle in a case where the above-described direction of the camera (imaging unit) is set as a reference (zero degrees). The V_angle field indicates a vertical angle in a case where the horizontal position of the camera (imaging unit) is set as a reference (zero degrees). The focus_distance field indicates the focus distance of the camera (imaging unit). The zoom_ratio field indicates a zoom ratio of the camera (imaging unit).

(39) The 16-bit field of the capture_device_id indicates the capture device ID. The 16-bit field of the capture_view_id indicates the capture view ID. The eight-bit field of a server_url_text_length indicates a length (size) of a character code portion indicating a character string of a subsequent URL in the number of bytes. The character code is placed in a url_text field.
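The FIG. 5 structure can be sketched as follows, using the field widths given above (16-bit values, 8-bit length fields). How real-valued quantities such as latitude are scaled into 16 bits is not specified here and is assumed to be handled by the caller, which passes integer field values:

```python
import struct

def build_lci_sei(gps, capture, device_id, view_id, server_url):
    """Pack a Location_Capture_information SEI payload per FIG. 5."""
    url_bytes = server_url.encode("utf-8")
    body = struct.pack(">4H", *gps)        # latitude, longitude, elevation, time
    body += struct.pack(">5H", *capture)   # direction, H_angle, V_angle,
                                           # focus_distance, zoom_ratio
    body += struct.pack(">2H", device_id, view_id)
    body += struct.pack(">B", len(url_bytes)) + url_bytes  # server_url_text_length
    return struct.pack(">B", len(body)) + body             # byte_length prefix

def parse_lci_sei(payload):
    """Unpack the payload; returns (gps, capture, device_id, view_id, url)."""
    assert payload[0] == len(payload) - 1  # byte_length covers the remainder
    gps = struct.unpack_from(">4H", payload, 1)
    capture = struct.unpack_from(">5H", payload, 9)
    device_id, view_id = struct.unpack_from(">2H", payload, 19)
    url_len = payload[23]
    url = payload[24:24 + url_len].decode("utf-8")
    return gps, capture, device_id, view_id, url
```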

(40) Furthermore, the media capture/location server 101 receives, from the client device 200, control information (control meta information) for controlling the imaging state of the camera (imaging unit), and controls the imaging state of the camera (imaging unit) on the basis of the control information. Accordingly, the imaging state of the camera (imaging unit) of the media capture/location server 101 can be controlled with the client device.

(41) FIG. 6 illustrates an exemplary structure of the control information (control meta information), and FIG. 7 illustrates contents (semantics) of main information in the exemplary structure. The eight-bit field of the byte_length indicates a subsequent size (number of bytes) of this control information.

(42) The one-bit field of a ROI_pointing_flag is flag information indicating that information regarding a rectangular area of an object to be tracked (region of interest: ROI) exists. The one-bit field of a view_angle_shift_flag is flag information indicating that information regarding a view angle shift exists. The one-bit field of a focus_distance_flag is flag information indicating that information regarding the focus distance exists. The one-bit field of a switch_view_flag is flag information indicating that the capture view ID exists. The one-bit field of a zoom_control_flag is flag information indicating that zoom control information exists.

(43) The eight-bit fields of the capture_device_ID and the capture_view_ID indicate the capture device ID and the capture view ID, respectively, which are identified by the client device 200 on the basis of insertion information of the layer of the video stream and/or the layer of the container (lcib box information and Location_Capture_information SEI message).

(44) In a case where the ROI_pointing flag is 1, each 16-bit field of a ROI_area_top_left_x, ROI_area_top_left_y, ROI_area_bottom_right_x, and ROI_area_bottom_right_y exists. The ROI_area_top_left_x field indicates an x-coordinate of a top-left of the rectangular area of the object to be tracked (ROI). The ROI_area_top_left_y field indicates a y-coordinate of the top-left of the rectangular area of the object to be tracked (ROI). The ROI_area_bottom_right_x field indicates an x-coordinate of a bottom-right of the rectangular area of the object to be tracked (ROI). The ROI_area_bottom_right_y field indicates a y-coordinate of the bottom-right of the rectangular area of the object to be tracked (ROI).
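The ROI branch of the control information can be sketched as follows. FIG. 6 fixes the field widths and flag semantics; the bit positions of the one-bit flags within a single flags byte and the byte_length framing are assumptions for illustration:

```python
import struct

# Assumed bit positions for the one-bit flags of FIG. 6 within one byte.
ROI_POINTING_FLAG = 0x10
VIEW_ANGLE_SHIFT_FLAG = 0x08
FOCUS_DISTANCE_FLAG = 0x04
SWITCH_VIEW_FLAG = 0x02
ZOOM_CONTROL_FLAG = 0x01

def build_roi_control(device_id, view_id, top_left, bottom_right):
    """Build control meta information asking the server to track the ROI
    rectangle given by its top-left and bottom-right pixel coordinates."""
    body = struct.pack(">B", ROI_POINTING_FLAG)
    body += struct.pack(">2B", device_id, view_id)  # 8-bit IDs per FIG. 7
    body += struct.pack(">4H", top_left[0], top_left[1],
                        bottom_right[0], bottom_right[1])
    return struct.pack(">B", len(body)) + body      # byte_length prefix
```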

(45) In a case where information regarding the rectangular area of the object to be tracked (ROI) exists in the control information (control meta information) received from the client device 200, the media capture/location server 101 controls the imaging state (mainly, direction and angle state) such that the camera (imaging unit) performs tracking (automatic adaptation) of the object (ROI).

(46) FIG. 8 illustrates an exemplary tracking. FIG. 8(a) schematically illustrates an image displayed on a display of the client device 200. In the client device 200, in a case where a user specifies a rectangular area ER surrounding the object to be tracked (ROI) on the display as illustrated in the drawing, the control information (control meta information) having information regarding the rectangular area ER is transmitted from the client device 200 to the media capture/location server 101.

(47) It is assumed that the object to be tracked (ROI) is moving in, for example, the moving direction illustrated in the drawing. In this case, as illustrated in FIG. 8(b), the media capture/location server 101 divides the object to be tracked (ROI) after the movement into blocks, performs motion determination by motion prediction in each block, obtains a global motion vector (moving direction and moving amount) from the motion determination result in each block, and corrects an orientation of the camera (imaging unit) on the basis of the motion vector information.

(48) The media capture/location server 101 repeatedly performs such operation of correcting the orientation of the camera (imaging unit). As a result, the orientation of the camera (imaging unit) is corrected as necessary as the object to be tracked (ROI) moves so that the object to be tracked (ROI) is constantly included in the image, thereby performing the tracking (automatic adaptation).
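One iteration of the tracking loop above can be sketched as follows. The description states only that a global motion vector is obtained from the per-block results, so the component-wise median used here (robust to a few mis-estimated blocks) is an assumed aggregation, and the degrees-per-pixel conversion factor is hypothetical:

```python
def global_motion_vector(block_vectors):
    """Aggregate per-block (dx, dy) motion vectors into one global vector
    using a component-wise median."""
    xs = sorted(dx for dx, _ in block_vectors)
    ys = sorted(dy for _, dy in block_vectors)
    mid = len(block_vectors) // 2
    return xs[mid], ys[mid]

def corrected_orientation(h_angle, v_angle, motion, degrees_per_pixel=0.05):
    """Convert the image-plane motion into pan/tilt corrections so the ROI
    stays in frame; positive dy (downward in image coordinates) tilts the
    camera down, i.e. decreases the vertical angle."""
    dx, dy = motion
    return h_angle + dx * degrees_per_pixel, v_angle - dy * degrees_per_pixel
```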

(49) Returning to FIG. 6, in a case where the view_angle_shift_flag is 1, 16-bit fields of a Right_shift_degree, Left_shift_degree, Upper_shift_degree, and Lower_shift_degree exist. The Right_shift_degree field indicates the degree by which the orientation of the camera (imaging unit) is changed rightward as viewed from the camera position. The Left_shift_degree field indicates the degree by which the orientation of the camera (imaging unit) is changed leftward as viewed from the camera position. The Upper_shift_degree field indicates the degree by which the orientation of the camera (imaging unit) is changed upward as viewed from the camera position. The Lower_shift_degree field indicates the degree by which the orientation of the camera (imaging unit) is changed downward as viewed from the camera position.

(50) In a case where information regarding the view angle shift exists in the control information (control meta information) received from the client device 200, the media capture/location server 101 controls the angle of the camera (imaging unit) on the basis of the information. For example, FIG. 9 illustrates a case where a value X is provided as the Right_shift_degree while the camera (imaging unit) is capturing an image of the Shot 1 portion of the entire subject in a certain imaging state, so that the orientation of the camera (imaging unit) is changed rightward by X degrees and is controlled to capture an image of the Shot 2 portion.
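The FIG. 9 behavior reduces to simple angle arithmetic. A minimal sketch, with sign conventions matching FIG. 4 (H_angle grows rightward, V_angle grows upward):

```python
def apply_view_angle_shift(h_angle, v_angle, right=0, left=0, upper=0, lower=0):
    """Apply the four *_shift_degree fields of the control information to
    the current camera angles (all values in degrees)."""
    return h_angle + right - left, v_angle + upper - lower
```

For instance, a Right_shift_degree of X simply yields a new horizontal angle of H_angle + X, which is the Shot 1 to Shot 2 transition of FIG. 9.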

(51) In a case where the focus_distance_flag is 1, a 16-bit field of the Focus_distance exists. The Focus_distance field indicates the focus distance in a meter unit. In a case where information regarding the focus distance exists in the control information (control meta information) received from the client device 200, the media capture/location server 101 controls the focus distance of the camera (imaging unit) to correspond to the information.

(52) In a case where the switch_view_flag is 1, a 16-bit field of a switch_view exists. The switch_view field indicates the capture view ID for changing the imaging state of the camera (imaging unit). As described above, the capture view ID is the identification information for identifying a combination of the capture information and the position information. The switch_view can be specified in a case where the client device 200 explicitly indicates the capture view ID (capture_view_id) after switching. In a case where the capture view ID exists in the control information (control meta information) received from the client device 200, the media capture/location server 101 controls the imaging state of the camera (imaging unit) to correspond to the capture information indicated by the capture view ID.

(53) In a case where the zoom_control_flag is 1, 16-bit fields of a zoom_in_degree and zoom_out_degree exist. The zoom_in_degree field indicates an x value of an enlargement ratio (1+x/100). The zoom_out_degree field indicates an x value of a reduction ratio (1−x/100). In a case where enlargement/reduction information regarding the zoom ratio exists in the control information (control meta information) received from the client device 200, the media capture/location server 101 controls the zoom ratio of the camera (imaging unit) to correspond to the enlargement/reduction information regarding the zoom ratio.
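The zoom arithmetic above can be sketched directly from the two ratio formulas (applying the factor multiplicatively to the current zoom ratio is an assumption; the description does not state how the server combines the factor with its current state):

```python
def apply_zoom(current_ratio, zoom_in_degree=None, zoom_out_degree=None):
    """Apply the enlargement factor (1 + x/100) or reduction factor
    (1 - x/100) carried in the control information to the current
    zoom ratio of the camera."""
    if zoom_in_degree is not None:
        current_ratio *= 1 + zoom_in_degree / 100
    if zoom_out_degree is not None:
        current_ratio *= 1 - zoom_out_degree / 100
    return current_ratio
```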

(54) Returning to FIG. 1, the initial access server 102 is the server device that the client device 200 accesses first, and is used to obtain the URL of the media capture/location server 101 that the client device 200 should access first. Note that, in a case where the client device 200 already knows the URL of the media capture/location server 101 to be accessed first, the initial access server 102 may not be provided.

(55) In a case where there is an access from the client device 200, the initial access server 102 transmits, to the client device 200, a media presentation description (MPD) file including the URL information regarding the media capture/location server 101 to be accessed first.

(56) FIG. 10 illustrates an exemplary description of the MPD file. In this case, an exemplary case where only information associated with the video stream is written is illustrated to simplify descriptions. However, in practice, information associated with media streams other than the video stream is also written.

(57) By the description of <AdaptationSet mimeType="video/mp4" codecs="hev1.xx.xx.L153">, it is indicated that an adaptation set (AdaptationSet) for the video stream exists, the video stream is supplied in the MP4 file structure, and HEVC coded image data of level 153 exists.

(58) By the description of <SupplementaryDescriptor schemeIdUri="urn:rmtcst:video:captureinformation" direction="SE" H_angle="10" V_angle="15" focus_distance="5"/>, it is indicated that the direction is SE, the horizontal angle is 10 degrees, the vertical angle is 15 degrees, and the focus distance is 5 m.

(59) By the description of <SupplementaryDescriptor schemeIdUri="urn:rmtcst:video:captureserve" capturedevice_id="1" captureview_id="1"/>, it is indicated that the capture device ID is 1, and the capture view ID is 1. By the description of <SupplementaryDescriptor schemeIdUri="urn:rmtcst:video:captureserverurl" serverurl/>, the URL of the media capture/location server 101 to be accessed first is indicated by the serverurl.

(60) In this MPD file, a Representation of the video stream indicated by the representation ID (Representation id=1) exists. By the description of <BaseURL>videostream.mp4</BaseURL>, a location of the video stream is indicated as videostream.mp4.

(61) In the SubRepresentation in this Representation, the descriptions width="3840" height="2160" frameRate="60", codecs="hev1.xx.xx.L153,xx", and level="0" exist. With these descriptions, it is indicated that a stream of 4K 60P is implemented, level 0 is applied as tag information, and the level of the coded image data of this image is 153.
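A client-side lookup of the first server URL from such an MPD file can be sketched as follows. The sample fragment is modeled on FIG. 10; the element nesting and the use of a value attribute to carry the URL (the figure shows only a bare serverurl token) are assumptions:

```python
import xml.etree.ElementTree as ET

# Hypothetical MPD fragment modeled on FIG. 10.
MPD_SAMPLE = """<MPD><Period>
  <AdaptationSet mimeType="video/mp4" codecs="hev1.xx.xx.L153">
    <SupplementaryDescriptor schemeIdUri="urn:rmtcst:video:captureserverurl"
                             value="http://example.com/capture1"/>
    <Representation id="1"><BaseURL>videostream.mp4</BaseURL></Representation>
  </AdaptationSet>
</Period></MPD>"""

def first_capture_server_url(mpd_text):
    """Return the URL of the media capture/location server to be accessed
    first, read from the captureserverurl supplementary descriptor."""
    root = ET.fromstring(mpd_text)
    for desc in root.iter("SupplementaryDescriptor"):
        if desc.get("schemeIdUri") == "urn:rmtcst:video:captureserverurl":
            return desc.get("value")
    return None
```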

(62) Returning to FIG. 1, the GPS URL mapping server 103 converts the GPS data transmitted from the client device 200 into the URL of the corresponding media capture/location server 101, and accesses, on behalf of the client device 200, the media capture/location server 101 to which the client device 200 is to be connected, using the URL.

(63) As described later, in a case where the client device 200 attempts to switch its connection to another media capture/location server 101 located in a desired direction and at a desired distance from the current position while being connected to a certain media capture/location server 101, GPS data in the vicinity of the position of the other media capture/location server 101 is calculated, and the GPS data is transmitted to the GPS URL mapping server 103.

(64) Note that, in a case where the client device 200 has a function of converting the GPS data into the URL of the corresponding media capture/location server 101, the GPS URL mapping server 103 may not be provided. In this case, the client device 200 directly accesses the media capture/location server 101 to which the client device 200 is connected using the URL obtained by the conversion.

(65) The client device 200 accesses the initial access server 102, and obtains the URL of the media capture/location server 101 to be accessed first. For example, the client device 200 receives, from the initial access server 102, an MPD file (see FIG. 10) in which the URL of the media capture/location server 101 to be accessed first is written.

(66) The client device 200 accesses the media capture/location server 101 on the basis of the URL of the media capture/location server 101 to be accessed first. Then, the client device 200 receives, from the media capture/location server 101 via the network, the container in the predetermined format including the video stream configured by the image data obtained by the imaging using the imaging unit, that is, the MP4 distribution stream in the present embodiment.

(67) The client device 200 extracts the video stream from the MP4 distribution stream, processes the video stream to obtain display image data, and displays an image based on the display image data (captured image by the media capture/location server 101) on the display.

(68) The client device 200 identifies, from the insertion information of the layer of the video stream and/or the layer of the container (lcib box information and Location_Capture_information SEI message), the imaging state (e.g., imaging direction, focus distance, and zoom ratio) of the camera (imaging unit) in the media capture/location server 101. Then, the client device 200 can display, on the basis of a user operation, a user interface screen (UI screen) that shows a current imaging state and has a changing operation unit of the imaging state on the display.

(69) FIG. 11 illustrates an exemplary UI screen. The arrow P indicates a zoom ratio display/adjustment unit for displaying and adjusting the zoom ratio (Zoom_ratio). In the example in the drawing, the current zoom ratio is indicated at the boundary between black and white. For example, the user can perform a zoom-in operation by tapping the + portion with a finger, and can perform a zoom-out operation by tapping the − portion with the finger. Furthermore, for example, the user can perform a zoom ratio adjustment operation by dragging the finger at the boundary position to change it.

(70) The arrow Q indicates an adjustment unit for adjusting the horizontal angle (H_degree), the vertical angle (V_degree), and the focus distance (Focus_distance). In the example in the drawing, with respect to the horizontal angle (H_degree), the user can change the orientation of the camera (imaging unit) rightward as viewed from the camera position by tapping the + portion with the finger, and can change the orientation of the camera (imaging unit) leftward as viewed from the camera position by tapping the − portion with the finger.

(71) Further, with respect to the vertical angle (V_degree), the user can change the orientation of the camera (imaging unit) upward as viewed from the camera position by tapping the + portion with the finger, and can change the orientation of the camera (imaging unit) downward as viewed from the camera position by tapping the − portion with the finger. Furthermore, with respect to the focus distance (Focus_distance), the user can lengthen the focus distance by tapping the + portion with the finger, and can shorten the focus distance by tapping the − portion with the finger.

(72) The arrow R indicates a display unit for displaying the horizontal angle (H_degree), the vertical angle (V_degree), and the focus distance (Focus_distance). In the example in the drawing, the orientation of the camera (imaging unit) is illustrated by the direction of the thick black arrow: the horizontal angle is in a direction deviated rightward from the north (N) direction, and the vertical angle is in a direction deviated upward from the horizontal position. Furthermore, in the example in the drawing, the focus distance of the camera (imaging unit) is indicated by the length of the thick black arrow.

(73) Note that, in the exemplary UI screen in FIG. 11, only the portion for displaying and adjusting the imaging state is illustrated to simplify the drawing. In response to the selection operation made by the user, this display may be in a state of being superimposed on the image (captured image by the media capture/location server 101).

(74) In a case where the changing operation of the zoom ratio (Zoom_ratio) is performed, the zoom_control_flag is set to 1, and 16-bit fields of the zoom_in_degree and the zoom_out_degree exist in the control information (control meta information) transmitted from the client device 200 to the media capture/location server 101 (see FIG. 6).

(75) Further, in a case where the changing operation of the horizontal angle (H_degree) and the vertical angle (V_degree) is performed, the view_angle_shift_flag is set to 1, and 16-bit fields of the Right_shift_degree, Left_shift_degree, Upper_shift_degree, and Lower_shift_degree exist in the control information (control meta information) transmitted from the client device 200 to the media capture/location server 101 (see FIG. 6).

(76) Furthermore, in a case where the changing operation of the focus distance (Focus_distance) is performed, the focus_distance_flag is set to 1, and the 16-bit field of the Focus_distance exists in the control information (control meta information) transmitted from the client device 200 to the media capture/location server 101 (see FIG. 6).

(77) Note that, in the descriptions above, it is indicated that the user can change each element of the imaging state of the camera (imaging unit) with reference to the UI screen. However, with the client device 200, the user can also collectively change the respective elements of the imaging state by specifying the capture view ID.

(78) For example, in a case where the client device 200 accesses the media capture/location server 101 to receive the MP4 distribution stream including the video stream via the network, the capture view ID indicating the imaging state can, in response to the user operation, be stored together with the capture device ID, with, for example, text information or thumbnail information for identifying the imaging contents added thereto. In a case where the user specifies the capture view ID to collectively change the respective elements of the imaging state, specifying the capture view ID indicating the desired imaging state can be performed efficiently by referring to the text information, the thumbnail information, and the like.

(79) In a case where the changing operation of the imaging state is performed using the capture view ID, the switch_view_flag is set to 1, and the 16-bit field of the switch_view exists in the control information (control meta information) (see FIG. 6) transmitted from the client device 200 to the media capture/location server 101.

(80) In a case where the user performs the operation for changing the connection to another media capture/location server 101 while being connected to a certain media capture/location server 101, the client device 200 accesses the other media capture/location server 101. The user can specify the other media capture/location server 101 with the capture device ID. In this case, the client device 200 accesses the other media capture/location server 101 using the URL associated with the capture device ID on a one-to-one basis.

(81) Furthermore, the user can specify another media capture/location server 101 with information regarding a direction and a distance relative to a certain media capture/location server 101. In this case, the client device 200 calculates GPS data in the vicinity of the position of the other media capture/location server 101 on the basis of GPS data of a certain media capture/location server 101 and the information regarding the direction and the distance, and transmits the GPS data to the GPS_URL mapping server 103.

(82) In this case, the GPS_URL mapping server 103 converts the GPS data transmitted from the client device 200 into the URL of the corresponding media capture/location server 101 and, on behalf of the client device 200, uses the URL to access the media capture/location server 101 to which the client device 200 is to be connected.

(83) Note that, in a case where the client device 200 has the function of converting the calculated GPS data into the URL of the corresponding media capture/location server 101, the client device 200 itself converts the GPS data into the URL of the corresponding media capture/location server 101, and directly accesses the media capture/location server 101.
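
The GPS-to-URL conversion performed by the GPS_URL mapping server 103 (or by the client device 200 itself) can be sketched as a nearest-server lookup. The registration table, its contents, and the function name below are purely illustrative assumptions; the patent does not specify how the mapping is stored.

```python
import math

# Hypothetical registration of media capture/location servers by position.
SERVERS = {
    (35.000, 139.000): "http://capture-server-a.example",
    (35.100, 139.200): "http://capture-server-b.example",
}

def gps_to_url(lat, lon):
    """Return the URL of the registered server closest to the given GPS data."""
    key = min(SERVERS, key=lambda p: math.hypot(p[0] - lat, p[1] - lon))
    return SERVERS[key]
```

Because the client calculates GPS data only "in the vicinity of" the other server's position, a nearest-match lookup rather than an exact-match lookup is the natural design choice here.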

(84) The user can also give information regarding a current orientation of the camera (imaging unit) as direction information and give focus distance information as distance information. FIG. 12 illustrates a state where the camera (imaging unit) is capturing an image of the Shot 2 portion. In this imaging state, the horizontal angle is deviated rightward by X1 degrees with the direction (Direction) of the camera (imaging unit) as a reference. Note that the vertical angle is at the horizontal position, for example.

(85) In this case, in a case where GPS information of a current point A (latitude 1, longitude 1, elevation 1, time 1) is obtained, GPS information of a point B is calculated as (latitude_new, longitude_new, elevation_new) from the information regarding the direction and the distance. Here, latitude_new=latitude 1+delta_x, longitude_new=longitude 1+delta_y, and elevation_new=elevation 1+delta_e.
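
The calculation of point B in paragraph (85) can be sketched as below. The degree-to-metre conversion factors are standard small-offset approximations and are assumptions not taken from the patent text; the function name is illustrative.

```python
import math

def point_b(lat1, lon1, elev1, direction_deg, x1_deg, distance_m):
    """Derive (latitude_new, longitude_new, elevation_new) of point B from
    point A, the camera direction plus the rightward offset X1, and the
    focus distance used as the distance."""
    bearing = math.radians(direction_deg + x1_deg)
    delta_north = distance_m * math.cos(bearing)
    delta_east = distance_m * math.sin(bearing)
    # Roughly 111,320 m per degree of latitude; a degree of longitude
    # shrinks with cos(latitude).
    delta_x = delta_north / 111_320.0
    delta_y = delta_east / (111_320.0 * math.cos(math.radians(lat1)))
    delta_e = 0.0  # vertical angle assumed at the horizontal position, as in FIG. 12
    return lat1 + delta_x, lon1 + delta_y, elev1 + delta_e
```

The result is GPS data "in the vicinity of" the other server's position, which is then handed to the GPS_URL mapping server 103.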

(86) FIG. 13 illustrates an exemplary configuration of the media capture/location server 101. The media capture/location server 101 includes a controller 111, a locator 112, a capture 113, a video encoder 114, a container encoder 115, and a network interface 116. Respective units are connected by a bus 117.

(87) The controller 111 controls operation of each part of the media capture/location server 101. The locator 112 receives radio waves from a GPS satellite and obtains position information (longitude/latitude/altitude). The capture 113 is a camera (imaging unit) that captures an image of a subject and obtains image data. The video encoder 114 performs coding such as HEVC on the image data obtained by the capture 113 to obtain coded image data, and generates a video stream including the coded image data. In this case, the video encoder 114 places the Location_Capture_information SEI message (see FIG. 5) in an SEI message group of the SEIs of an access unit (AU).

(88) In the SEI message, the capture information indicating the imaging state (e.g., imaging direction, focus distance, and zoom ratio) of the camera (imaging unit), the position information (GPS data) indicating the position of its own server device, the identification information for identifying the combination of the capture information and the position information (capture view ID) and the identification information for identifying its own server device (capture device ID), the URL of its own server device, and the like are inserted.

(89) The container encoder 115 generates a container including the video stream generated by the video encoder 114, that is, an MP4 stream in this case, as a distribution stream. In this case, the container encoder 115 defines the location capture information box lcib box (see FIG. 3) in the udta box defined in the initialization segment (IS) or the moof box.

(90) In this box, the capture information indicating the imaging state (e.g., imaging direction, focus distance, and zoom ratio) of the camera (imaging unit), the position information (GPS data) indicating the position of its own server device, the identification information for identifying the combination of the capture information and the position information (capture view ID) and the identification information for identifying its own server device (capture device ID), the URL of its own server device, and the like are inserted.
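
The nesting of the location capture information box inside the udta box, as performed by the container encoder 115, can be sketched with the generic ISOBMFF box framing (4-byte size followed by a 4-character type). The payload layout below is illustrative only and is not the actual field layout of FIG. 3.

```python
import struct

def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Frame a payload as an ISOBMFF box: 32-bit size, 4-char type, payload."""
    return struct.pack(">I", 8 + len(payload)) + box_type + payload

def make_lcib(capture_view_id: int, capture_device_id: int, url: str) -> bytes:
    """Wrap illustrative capture identification data in an lcib box,
    nested inside a udta box."""
    payload = struct.pack(">HH", capture_view_id, capture_device_id) + url.encode("utf-8")
    lcib = make_box(b"lcib", payload)
    return make_box(b"udta", lcib)
```

A decoder walking the MP4 structure finds the udta box in the initialization segment (IS) or the moof box and descends into the contained lcib box, exactly as the container decoder 215 does in paragraph (97).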

(91) Note that it is not necessarily required to place both the Location_Capture_information SEI message and the location capture information box; either one of them may be adopted.

(92) The network interface 116 communicates with the client device 200 via the network, and also with another server as necessary. For example, the network interface 116 transmits the distribution stream obtained by the container encoder 115 to the client device 200. Furthermore, for example, the network interface 116 receives the control information (control meta information) (see FIG. 6) transmitted from the client device 200. In this case, the controller 111 controls a state of the capture 113 according to the contents of the control information (control meta information).

(93) FIG. 14 illustrates an exemplary configuration of the client device 200. The client device 200 includes a controller 211, a user operation unit 212, a message converter 213, a network interface 214, a container decoder 215, a video decoder 216, a video processor 217, and a display 218. Respective units are connected by a bus 219.

(94) The controller 211 controls operation of each part of the client device 200. The user operation unit 212 constitutes a user interface, such as keys, a touch panel, or a remote controller, with which the user performs various operations. The network interface 214 communicates with a server such as the media capture/location server 101 via the network.

(95) For example, the network interface 214 receives, from the media capture/location server 101, the MP4 stream including the video stream as a distribution stream. Further, in a case where the user performs operation to change the imaging state of the camera (imaging unit) in the currently connected media capture/location server 101, for example, the network interface 214 transmits the control information (control meta information) (see FIG. 6) to the media capture/location server 101.

(96) Furthermore, in a case where the user performs operation to switch the connection from the currently connected media capture/location server 101 to another media capture/location server 101, the network interface 214 transmits the GPS data to the GPS_URL mapping server 103, or directly accesses the other media capture/location server 101.

(97) The container decoder 215 extracts the video stream from the distribution stream (MP4 stream) received by the network interface 214. In this case, the container decoder 215 extracts the location capture information box lcib box existing in the udta box defined in the initialization segment (IS) or the moof box, and transmits it to the controller 211. With this operation, the controller 211 identifies the imaging state (e.g., imaging direction, focus distance, and zoom ratio) and the like of the camera (imaging unit).

(98) The video decoder 216 decodes the video stream extracted by the container decoder 215 to obtain display image data. In this case, the video decoder 216 extracts a parameter set and an SEI message inserted in the video stream, and transmits them to the controller 211. This extracted information also includes the Location_Capture_information SEI message. With this operation, the controller 211 identifies the imaging state (e.g., imaging direction, focus distance, and zoom ratio) and the like of the camera (imaging unit).

(99) The video processor 217 performs processing such as resolution conversion on the display image data obtained by the video decoder 216, and then supplies it to the display 218. The display 218 displays an image (captured image by the media capture/location server 101) based on the display image data obtained by the video decoder 216. The image may also be displayed on another display monitor connected via an interface.

(100) As described above, the controller 211 identifies, for example, the imaging state (e.g., imaging direction, focus distance, and zoom ratio) of the camera (imaging unit) from the location capture information box and the Location_Capture_information SEI message, whereby the UI screen (see FIG. 11) showing the current imaging state can be displayed on the display 218 under the control of the controller 211.

(101) The message converter 213 generates, under the control of the controller 211, a message in response to the operation of the user operation unit 212, for example, and transmits it to the network interface 214. For example, in a case where the user performs operation to change the imaging state of the camera (imaging unit) in the currently connected media capture/location server 101, for example, the message converter 213 generates, under the control of the controller 211, the control information (control meta information) (see FIG. 6), and transmits it to the network interface 214.

(102) Furthermore, for example, in a case where the user performs operation to switch the connection from the currently connected media capture/location server 101 to another media capture/location server 101, the message converter 213 generates a first message including the URL information regarding the other media capture/location server 101, or a second message including GPS data for obtaining the URL of the other media capture/location server 101 together with the URL information regarding the GPS_URL mapping server 103, and transmits it to the network interface 214.

(103) FIG. 15 illustrates an exemplary structure of a server access message (Server_access_message) as a first message. The eight-bit field of a byte_length indicates a subsequent size (number of bytes) of this message. The eight-bit field of a server_url_text_length indicates, in the number of bytes, a length (size) of a character code portion indicating a character string of a subsequent URL. The character code is placed in a url_text field.

(104) FIG. 16 illustrates an exemplary structure of the server access message (Server_access_message) as a second message. The eight-bit field of a byte_length indicates a subsequent size (number of bytes) of this message. The GPS data includes latitude_new, longitude_new, and elevation_new fields. The latitude_new is latitude data, the longitude_new is longitude data, and the elevation_new is altitude data. The eight-bit field of a server_url_text_length indicates, in the number of bytes, a length (size) of a character code portion indicating a character string of a subsequent URL. The character code is placed in a url_text field.
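
The two message structures can be sketched as below. The exact bit layout of the GPS data fields in FIG. 16 is not specified in the text, so 32-bit floats are an assumption here, as are the function names; the byte_length and server_url_text_length fields follow the eight-bit widths stated above.

```python
import struct

def first_message(url: str) -> bytes:
    """Server access message of FIG. 15: byte_length,
    server_url_text_length, url_text."""
    text = url.encode("utf-8")
    body = bytes([len(text)]) + text
    return bytes([len(body)]) + body  # byte_length counts the bytes that follow

def second_message(lat: float, lon: float, elev: float, mapping_url: str) -> bytes:
    """Server access message of FIG. 16: byte_length, GPS data
    (latitude_new, longitude_new, elevation_new), then the URL fields."""
    text = mapping_url.encode("utf-8")
    body = struct.pack(">fff", lat, lon, elev)  # illustrative 32-bit encoding
    body += bytes([len(text)]) + text
    return bytes([len(body)]) + body
```

In the second message, the URL carried is that of the GPS_URL mapping server 103, which then resolves the GPS data to the target server's URL.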

(105) A sequence diagram in FIG. 17 illustrates exemplary communication between the client device 200 and each server. (1) First, the client device 200 accesses the initial access server 102. Note that the client device 200 is assumed to identify the URL of the initial access server 102 in advance. (2) The initial access server 102 transmits, to the client device 200, an MPD file including the URL information regarding the media capture/location server 101 to be accessed first.

(106) (3) The client device 200 accesses the media capture/location server (capture server) 101 to be accessed first on the basis of the URL information included in the MPD file, and enters a connected state. (4) The media capture/location server 101 transmits, to the client device 200, the distribution stream including the video stream of the captured image data. With this operation, the client device 200 enters a state of displaying the captured image in the media capture/location server 101.

(107) (5) The client device 200 transmits the control information (control meta information) to the media capture/location server 101 in response to an imaging state changing operation made by the user. Accordingly, the imaging state in the media capture/location server 101 is changed. (6) The media capture/location server 101 transmits, to the client device 200, the distribution stream including the video stream of the captured image data after the imaging state change. With this operation, the client device 200 enters a state of displaying the captured image after the imaging state change in the media capture/location server 101.

(108) (7) The client device 200 accesses another media capture/location server 101 directly or via the GPS URL mapping server 103 in response to an operation for changing the connection to the other media capture/location server 101 made by the user, and enters the connected state. (8) The media capture/location server 101 transmits, to the client device 200, the distribution stream including the video stream of the captured image data. With this operation, the client device 200 enters a state of displaying the captured image in the other media capture/location server 101.

(109) (9) The client device 200 transmits the control information (control meta information) to the other media capture/location server 101 in response to the imaging state changing operation made by the user. Accordingly, the imaging state in the other media capture/location server 101 is changed. (10) The media capture/location server 101 transmits, to the client device 200, the distribution stream including the video stream of the captured image data after the imaging state change. With this operation, the client device 200 enters a state of displaying the captured image after the imaging state change in the other media capture/location server 101.

(110) As described above, in the server system 10 illustrated in FIG. 1, the media capture/location server 101 inserts the capture information indicating the imaging state of the camera (imaging unit) into the layer of the video stream and/or the layer of the container, and transmits the same. Therefore, the client device 200 can easily grasp the imaging state (e.g., imaging direction, focus distance, and zoom ratio) of the camera (imaging unit) in the media capture/location server 101 on the basis of the capture information, whereby the imaging state of the media capture/location server 101 can be appropriately controlled.

(111) Furthermore, in the server system 10 illustrated in FIG. 1, the user can specify another media capture/location server 101 with information regarding a direction and a distance relative to the currently connected media capture/location server 101 using the client device 200. Therefore, even in a case where the URL of the media capture/location server 101 to be switched is unknown, switching to another media capture/location server 101 can be appropriately performed.

2. Second Embodiment

(112) [Exemplary Configuration of Server System]

(113) FIG. 18 illustrates an exemplary configuration of a server system 30 according to a second embodiment. The server system 30 includes a plurality of server devices and a client device 400, which are mutually connected via a network.

(114) The plurality of server devices include a plurality of media capture/location servers 301, an initial access server 302, a thumbnail server 303, and a storage server 304. In response to, for example, an access from the storage server 304, or automatically, the media capture/location server 301 transmits to the storage server 304, via the network, a container in a predetermined format including a video stream configured by image data obtained by imaging using a camera (imaging unit), that is, an MP4 distribution stream in the present embodiment. Although a case of one storage server 304 is illustrated in the example in the drawing, a plurality of storage servers 304 may exist.

(115) Although detailed descriptions are omitted, the media capture/location server 301 is configured in a similar manner to the media capture/location server 101 (see FIG. 13) according to the first embodiment described above. The storage server 304 stores a video stream (video file) transmitted from each media capture/location server 301 with a combination of a capture view ID and a capture device ID serving as identification information.

(116) The initial access server 302 is a server device for obtaining information regarding a plurality of video streams stored in the storage server 304. In a case where there is an access from the client device 400, the initial access server 302 transmits, to the client device 400, a media presentation description (MPD) file including the information regarding the plurality of video streams.

(117) This MPD file includes, as the information regarding the plurality of video streams, the combination of the capture device ID and the capture view ID as identification information of each video stream, and an URL of a thumbnail server for obtaining thumbnail image data of each video stream.

(118) FIG. 19 illustrates an exemplary description of the MPD file. In this case, an exemplary case where only information associated with the video stream is written is illustrated to simplify descriptions. However, in practice, information associated with media streams other than the video stream is also written.

(119) There are information regarding a first period (Period 1) and information regarding a second period (Period 2). In the first period (Period 1), by the description of <AdaptationSet mimeType="video/mp4" codecs="hev1.xx.xx.L153">, it is indicated that an adaptation set (AdaptationSet) for the video stream exists, the video stream is supplied in the MP4 file structure, and HEVC coded image data of level 153 exists.

(120) By the description of <BaseURL>thumbnail_server_URL1 </BaseURL>, it is indicated that a location of thumbnail image data of each video stream stored in the storage server 304 identified by the following combination of the capture device ID and the capture view ID is thumbnail_server_URL1. In the thumbnail server indicated by the thumbnail_server_URL1, individual thumbnail images are managed to correspond to the capture device IDs and the capture view IDs.

(121) Furthermore, in the second period (Period 2), by the description of <AdaptationSet mimeType="video/mp4" codecs="hev1.xx.xx.L153">, it is indicated that the adaptation set (AdaptationSet) for the video stream exists, the video stream is supplied in the MP4 file structure, and the HEVC coded image data of level 153 exists.

(122) By the description of <BaseURL>thumbnail_server_URL2 </BaseURL>, it is indicated that the location of the thumbnail image data of each video stream stored in the storage server 304 identified by the following combination of the capture device ID and the capture view ID is thumbnail_server_URL2. In the thumbnail server indicated by the thumbnail_server_URL2, individual thumbnail images are managed to correspond to the capture device IDs and the capture view IDs. Note that the location of the thumbnail image data in the first period (Period 1) and the second period (Period 2) may be the same.
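
Reading the per-period thumbnail-server BaseURL out of an MPD like the one in FIG. 19 can be sketched as follows. The inline MPD is a simplified stand-in (namespaces and most attributes omitted; a real DASH MPD uses the MPEG-DASH namespace), and the function name is illustrative.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the MPD of FIG. 19.
MPD = """<MPD>
  <Period id="1">
    <AdaptationSet mimeType="video/mp4" codecs="hev1.xx.xx.L153">
      <BaseURL>thumbnail_server_URL1</BaseURL>
    </AdaptationSet>
  </Period>
  <Period id="2">
    <AdaptationSet mimeType="video/mp4" codecs="hev1.xx.xx.L153">
      <BaseURL>thumbnail_server_URL2</BaseURL>
    </AdaptationSet>
  </Period>
</MPD>"""

def thumbnail_urls(mpd_text: str) -> dict:
    """Map each period id to the BaseURL of its thumbnail server."""
    root = ET.fromstring(mpd_text)
    return {p.get("id"): p.findtext("./AdaptationSet/BaseURL")
            for p in root.findall("Period")}
```

The client device 400 performs this kind of extraction before accessing the thumbnail server 303 on the basis of the obtained URL.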

(123) Returning to FIG. 18, the thumbnail server 303 holds one or more thumbnail image data of each video stream stored in the storage server 304. In a case where there is an access from the client device 400, the thumbnail server 303 transmits the held thumbnail image data to the client device 400. Note that, although a case of one thumbnail server 303 is illustrated in the example in the drawing, a plurality of thumbnail servers 303 may exist.

(124) Although detailed descriptions are omitted, the client device 400 is configured in a similar manner to the client device 200 (see FIG. 14) according to the first embodiment described above. The client device 400 accesses the initial access server 302, and obtains the information regarding the plurality of video streams stored in the storage server 304. This client device 400 receives, from the initial access server 302, for example, the MPD file in which the combination of the capture device ID and the capture view ID as the identification information of each video stream and the URL of the thumbnail server for obtaining the thumbnail image data of each video stream are written (see FIG. 19).

(125) The client device 400 accesses the thumbnail server 303 on the basis of the URL of the thumbnail server. Then, the client device 400 receives, from the thumbnail server 303 via the network, the thumbnail image data of the plurality of video streams stored in the storage server 304. Note that information regarding an imaging start time and an imaging end time of each video stream may be added to each thumbnail image data.

(126) The client device 400 displays a thumbnail image for convenience of selection of a reproduction video stream by the user. At this time, the imaging start time and the imaging end time may be displayed in association with each thumbnail image so that the user can not only select the reproduction video stream but also specify a reproduction start position thereof.

(127) In a case where the reproduction video stream is selected by a user operation, the client device 400 transmits, to the storage server 304, a reproduction request including the combination of the capture device ID and the capture view ID for identifying the reproduction video stream. Then, the client device 400 receives, from the storage server 304, an MP4 distribution stream including the selected reproduction video stream via the network. Note that, in a case where the reproduction start position is also specified by the user operation, the reception from the storage server 304 starts at the reproduction start position.

(128) The client device 400 extracts the video stream from the MP4 distribution stream, processes the video stream to obtain display image data, and displays an image based on the display image data (captured image by the media capture/location server 301) on the display.

(129) FIG. 20 illustrates an exemplary configuration of the storage server 304. This storage server 304 includes a controller 311, a storage 312, a container encoder 313, and a network interface 314. Respective units are connected by a bus 315.

(130) The controller 311 controls operation of each part of the storage server 304. The storage 312 includes a hard disk drive (HDD), a semiconductor memory, and the like. The storage 312 stores a video stream (video file) transmitted from each media capture/location server 301 with a combination of a capture view ID and a capture device ID serving as identification information.

(131) In a case where the reproduction request in which the video stream to be reproduced is specified by the combination of the capture device ID and the capture view ID is received from the client device 400, the controller 311 reads the video stream (video file) identified by the combination of the capture device ID and the capture view ID from the storage 312, and transmits it to the container encoder 313. Note that, in a case where the specification of the reproduction start position is also included in the reproduction request, the reading starts at the reproduction start position.
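
The storage lookup of paragraph (131) can be sketched as a table keyed by the (capture device ID, capture view ID) pair, with an optional reproduction start position selecting where reading begins. The class and method names, and the representation of a stream as a list of samples, are illustrative assumptions.

```python
class Storage:
    """Minimal stand-in for the storage 312: video streams keyed by the
    combination of capture device ID and capture view ID."""

    def __init__(self):
        self._streams = {}

    def store(self, device_id, view_id, samples):
        # Store a video stream (video file) under its identification pair.
        self._streams[(device_id, view_id)] = list(samples)

    def read(self, device_id, view_id, start=0):
        # Read the identified stream; when a reproduction start position
        # is included in the request, reading starts there.
        samples = self._streams[(device_id, view_id)]
        return samples[start:]
```

The controller 311 would pass the read result to the container encoder 313 to be wrapped as the MP4 distribution stream.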

(132) The container encoder 313 generates a container including the video stream read from the storage 312, that is, an MP4 stream in this case, as a distribution stream. The network interface 314 communicates with the client device 400 via the network, and also with another server as necessary.

(133) For example, the network interface 314 receives a video stream transmitted from each media capture/location server 301. Further, for example, the network interface 314 transmits the distribution stream obtained by the container encoder 313 to the client device 400. Furthermore, for example, the network interface 314 receives the reproduction request transmitted from the client device 400.

(134) A sequence diagram in FIG. 21 illustrates exemplary communication between the client device 400 and each server. (11) First, the client device 400 accesses the initial access server 302. Note that the client device 400 is assumed to identify the URL of the initial access server 302 in advance. (12) In a case where there is an access from the client device 400, the initial access server 302 transmits, to the client device 400, the MPD file including the information regarding the plurality of video streams.

(135) (13) The client device 400 accesses the thumbnail server 303 on the basis of the URL information regarding the thumbnail server included in the MPD file, and enters the connected state. (14) The thumbnail server 303 transmits the thumbnail image data to the client device 400. Accordingly, the client device 400 enters a state of displaying the thumbnail image data of the plurality of video streams stored in the storage server 304 on the display for convenience of selection of the reproduction video stream.

(136) (15) In a case where the reproduction video stream is selected by the user operation, the client device 400 transmits, to the storage server 304, the combination of the capture device ID and the capture view ID for identifying the reproduction video stream. (16) Then, the client device 400 receives, from the storage server 304, the MP4 distribution stream including the selected reproduction video stream via the network. Accordingly, the client device 400 enters a state of displaying the image based on the reproduction video stream.

(137) As described above, in the server system 30 illustrated in FIG. 18, in a case where the reproduction video stream is selected by the user operation, the client device 400 transmits, to the storage server 304, the reproduction request including the combination of the capture device ID and the capture view ID for identifying the reproduction video stream. Therefore, the client device 400 can appropriately receive a desired video stream from the storage server 304.

(138) Furthermore, in the server system 30 illustrated in FIG. 18, the user can select the reproduction video stream on the basis of the thumbnail image using the client device 400, and the selection of the reproduction video stream can be easily and appropriately performed.

3. Variation

(139) Note that the example in which the container is the MP4 (ISOBMFF) has been described in the embodiments described above. However, the container is not limited to the MP4 in the present technology, and the present technology can be similarly applied to a case where the container is in other formats such as MPEG-2 TS and MMT.

(140) Furthermore, the present technology can also take the following configurations.

(141) (1) A server device, including:

(142) an imaging unit that captures an image of a subject to obtain image data; and

(143) a transmission unit that transmits a container in a predetermined format including a video stream configured by the image data to a client device, in which

(144) capture information indicating an imaging state in the imaging unit is inserted in a layer of the video stream and/or a layer of the container.

(145) (2) The server device according to (1), in which

(146) the capture information includes information regarding an imaging direction of the imaging unit.

(147) (3) The server device according to (1) or (2), in which

(148) the capture information includes information regarding a focus distance of the imaging unit.

(149) (4) The server device according to any one of (1) to (3), in which

(150) the capture information includes information regarding a zoom ratio of the imaging unit.

(151) (5) The server device according to any one of (1) to (4), in which

(152) position information indicating a position of the server device of its own is further inserted in the layer of the video stream and/or the layer of the container.

(153) (6) The server device according to (5), in which the position information is GPS data.

(154) (7) The server device according to (5) or (6), in which

(155) identification information for identifying a combination of the capture information and the position information and identification information for identifying the server device of its own are further inserted in the layer of the video stream and/or the layer of the container.

(156) (8) The server device according to any one of (1) to (7), further including:

(157) a reception unit that receives control information for controlling the imaging state in the imaging unit from the client device; and

(158) a control unit that controls the imaging state of the imaging unit on the basis of the control information.

(159) (9) A method of transmission processing of a server device, including:

(160) an imaging step of capturing an image of a subject to obtain image data with an imaging unit; and

(161) a transmitting step of transmitting, to a client device, a container in a predetermined format including a video stream configured by the image data using a transmission unit, in which

(162) capture information indicating an imaging state in the imaging unit is inserted in a layer of the video stream and/or a layer of the container.

(163) (10) A client device, including:

(164) a reception unit that receives a container in a predetermined format including a video stream from a server device,

(165) the video stream including image data obtained by imaging a subject with an imaging unit,

(166) capture information indicating an imaging state of the imaging unit being inserted in a layer of the video stream and/or a layer of the container,

(167) the client device further including:

(168) a control unit that controls processing for obtaining display image data from the video stream, processing for displaying the imaging state of the imaging unit on the basis of the capture information, processing for obtaining control information regarding the imaging state of the imaging unit in response to a user operation using a display of the imaging state, and processing for transmitting the control information to the server device.

(169) (11) The client device according to (10), in which

(170) position information indicating a position of the server device is further inserted in the layer of the video stream and/or the layer of the container, and

(171) the control unit further controls processing for obtaining access information for accessing another server device different from the server device on the basis of the user operation and processing for accessing the other server device on the basis of the access information.

(172) (12) The client device according to (11), in which

(173) the access information is an URL of the other server device.

(174) (13) The client device according to (11), in which

(175) the access information is the position information and an URL of a conversion server that converts the position information into an URL.

(176) (14) A method of reception processing of a client device, the method including:

(177) a receiving step of receiving, from a server device, a container in a predetermined format including a video stream using a reception unit,

(178) the video stream including image data obtained by imaging a subject with an imaging unit,

(179) capture information indicating an imaging state of the imaging unit being inserted in a layer of the video stream and/or a layer of the container,

(180) the method further including:

(181) a controlling step of controlling, with a control unit, processing for obtaining display image data from the video stream, processing for displaying the imaging state of the imaging unit on the basis of the capture information, processing for obtaining control information regarding the imaging state of the imaging unit in response to a user operation using a display of the imaging state, and processing for transmitting the control information to the server device.

(182) (15) A server system in which a plurality of server devices and client devices are mutually connected via a network, in which

(183) the server device includes:

(184) an imaging unit that captures an image of a subject to obtain image data; and

(185) a transmission unit that transmits a container in a predetermined format including a video stream configured by the image data to the client device,

(186) capture information indicating an imaging state in the imaging unit being inserted in a layer of the video stream and/or a layer of the container,

(187) the client device includes:

(188) a reception unit that receives the container in the predetermined format from the server device, and

(189) the client device further includes:

(190) a control unit that controls processing for obtaining display image data from the video stream, processing for displaying the imaging state of the imaging unit on the basis of the capture information, processing for obtaining control information regarding the imaging state of the imaging unit in response to a user operation using a display of the imaging state, and processing for transmitting the control information to the server device.

(191) (16) The server system according to (15), in which

(192) position information indicating a position of the server device is further inserted in the layer of the video stream and/or the layer of the container, and

(193) the control unit further controls processing for obtaining access information for accessing another server device different from the server device on the basis of the user operation and processing for accessing the other server device on the basis of the access information.

(194) (17) A server device, including:

(195) a storage that stores a plurality of video streams identified by capture device identification information for identifying an imaging device and capture view identification information for identifying a combination of position information of the imaging device and imaging state information of the imaging device;

(196) a reception unit that receives, from a client device, a reproduction request in which a video stream to be reproduced is specified by the capture device identification information and the capture view identification information; and

(197) a transmission unit that reads the specified video stream from the storage and transmits a container in a predetermined format including the video stream to the client device.
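The storage, reception, and transmission units of item (17) can be sketched as a store keyed by the pair of identifiers; the class and method names and the container layout are illustrative assumptions:

```python
class StreamStorage:
    """Sketch of the server device of item (17): video streams are
    identified by capture device identification information and capture
    view identification information (the latter identifying a combination
    of position information and imaging state information)."""

    def __init__(self):
        # (capture_device_id, capture_view_id) -> stored video stream
        self._streams = {}

    def store(self, device_id, view_id, stream):
        self._streams[(device_id, view_id)] = stream

    def handle_reproduction_request(self, device_id, view_id):
        # Reception unit: the reproduction request specifies the stream
        # by both identifiers.
        stream = self._streams[(device_id, view_id)]
        # Transmission unit: wrap the stream in a container in a
        # predetermined format (a plain dict stands in for e.g. MP4).
        return {"format": "MP4", "video_stream": stream}
```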

(198) (18) A client device, including:

(199) a transmission unit that transmits, to a server device, a reproduction request in which a video stream to be reproduced is specified by capture device identification information for identifying an imaging device and capture view identification information for identifying a combination of position information of the imaging device and imaging state information of the imaging device;

(200) a reception unit that receives a container in a predetermined format including a video stream corresponding to the reproduction request from the server device; and

(201) a processing unit that obtains display image data from the video stream included in the container.

(202) The main feature of the present technology is that the media capture/location server inserts the capture information indicating the imaging state of the camera (imaging unit) into the layer of the video stream and/or the layer of the container and transmits it, so that the client device can easily grasp the imaging state (e.g., imaging direction, focus distance, and zoom ratio) of the camera (imaging unit) in the media capture/location server on the basis of the capture information. The imaging state of the media capture/location server can thereby be appropriately controlled (see FIGS. 1, 11, and 13).
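The main feature above amounts to carrying the capture information as metadata alongside the video stream. A minimal round-trip sketch follows; the dict-based container and JSON serialization are illustrative stand-ins for the actual mechanisms (e.g., a descriptor in the container layer or an SEI-style message in the video-stream layer), and the field names are assumptions:

```python
import json

def build_container(image_data: bytes, capture_info: dict) -> dict:
    """Server side: insert capture information (imaging direction,
    focus distance, zoom ratio) into a layer of the container alongside
    the video stream of the image data."""
    return {
        "video_stream": image_data,
        # Capture-information layer, serialized as metadata.
        "capture_info": json.dumps(capture_info),
    }

def read_capture_info(container: dict) -> dict:
    """Client side: recover the imaging state from the container layer,
    for display and for generating control information."""
    return json.loads(container["capture_info"])
```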

REFERENCE SIGNS LIST

(203)
10, 30 Server system
101 Media capture/location server
102 Initial access server
103 GPS_URL mapping server
111 Controller
112 Locator
113 Capture
114 Video encoder
115 Container encoder
116 Network interface
117 Bus
200 Client device
211 Controller
212 User operation unit
213 Message converter
214 Network interface
215 Container decoder
216 Video decoder
217 Video processor
218 Display
219 Bus
301 Media capture/location server
302 Initial server
303 Thumbnail
304 Storage server
311 Controller
312 Storage
313 Container encoder
314 Network interface
315 Bus