In Versatile Video Coding (VVC), the merge mode lets a decoder derive a motion vector (MV) autonomously from the motion information of a selected, previously inter-coded block. For this purpose, a merge candidate list is maintained identically by the encoder and decoder and filled with various candidates. If the merge candidate list is not sufficiently full, the zero motion vector is added repeatedly as required; these redundant zero MVs limit the diversity of the list and result in low coding efficiency. In this paper we propose a new method of adding merge candidates based on the pairwise average of existing merge candidates. The additional MVs replacing the redundant zero MVs are obtained by linearly combining the MVs of the first and second candidates in the merge candidate list. The proposed method attains coding gains of 0.01%, 0.04%, and 0.05% in the Y, Cb, and Cr channels, respectively, against VTM-22.0. The results also show that the proposed method diversifies the candidate list, which consequently increases coding efficiency.
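The idea above can be sketched in a few lines: when the merge list runs short, derive an extra candidate by averaging the first two entries before falling back to zero MVs. This is an illustrative sketch, not the VTM implementation; the rounding and pruning details are assumptions.

```python
def fill_merge_list(cands, max_size):
    """Fill a merge candidate list up to max_size.

    Instead of padding only with zero motion vectors, derive an
    extra candidate by pairwise-averaging the first and second
    entries (names and rounding are illustrative).
    """
    out = list(cands)
    if len(out) >= 2 and len(out) < max_size:
        (x0, y0), (x1, y1) = out[0], out[1]
        # Pairwise average of the first and second candidates,
        # with rounding toward +infinity as a simple convention.
        avg = ((x0 + x1 + 1) // 2, (y0 + y1 + 1) // 2)
        if avg not in out:
            out.append(avg)
    # Pad any remaining slots with the zero MV, as in VVC.
    while len(out) < max_size:
        out.append((0, 0))
    return out
```

With two candidates (4, 2) and (2, 6), the derived candidate is (3, 4), and only the final slot is filled with the zero MV.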
In this paper, we propose a computationally efficient Inverse Tone Mapping Operator (iTMO) based on reinforcement learning. It is designed to convert existing legacy Low Dynamic Range (LDR) content to High Dynamic Range (HDR), making it more suitable for HDR display devices. Our proposed iTMO leverages a reinforcement learning agent, enabling an end-to-end transform of a single LDR image into an HDR one without requiring any domain knowledge or heuristic parameter adjustment from the user.
In VVC, the intra prediction mode of a block is encoded using the most probable mode (MPM). For this purpose, possible candidate modes for the MPM are maintained in the MPM candidate list. The list is constructed differently depending on the number of neighboring angular intra prediction modes. While it is important to maintain a proper MPM candidate list, how to encode the selected list index is also important. In this paper, we investigate context coding of the MPM candidate list index instead of the current bypass coding in VVC. The proposed method achieves BDBR gains of 0.08% in the Y channel and 0.10% and 0.06% in the Cb and Cr channels, respectively.
In this paper, we investigate enhancing the coding performance of DIMD (Decoder-Side Intra Mode Derivation), one of the intra prediction tools under consideration for the next-generation video coding standard beyond VVC, for screen content video, which is often referred to as computer-generated video. We note that the combination of the intra predictors of the planar mode and of the directional modes derived from the histogram of gradients (HoG) in current DIMD can result in a blurry predictor, which is especially a problem for screen content video that typically has many sharp edges. In this context, we devise a novel way of generating a sharper intra predictor by emphasizing the dominant directional modes and their neighboring modes in DIMD. Compared to the existing DIMD in the test model ECM 10.0 under the all-intra configuration, we observe coding performance changes of -0.02%, 0.05%, and -0.02% in the Y, Cb, and Cr channels, respectively, for class F, and -0.01%, 0.01%, and -0.09% for class TGM.
Although VVC standardization is complete, new video coding tools are continuously explored under the name of beyond-VVC capability. In this effort, various new techniques, both cross-component and non-cross-component schemes, have been developed so far; among them, the direct block vector (DBV) mode, one of the non-cross-component prediction schemes, copies the reference block by means of block vector information estimated from the co-located luma area. However, in a dual-tree structure, there can be multiple luma blocks in the co-located luma area corresponding to a chroma block, and a single block vector estimated from that area may not be optimal for the DBV mode. To overcome this, in this paper, a cross-component-based compensation method is proposed that applies a convolution function for chroma block prediction. An experiment under the AI configuration using the first 50 frames of sequences from the JVET common test conditions shows a 0.02% BDBR loss in the Y channel but BDBR gains of 0.17% and 0.07% in the Cb and Cr channels, respectively, with a negligible encoding complexity increase against ECM-9.0.
Rate-distortion optimized quantization for transform skip (RDOQ-TS) is shown to provide a positive coding gain in VVC. This paper reduces the encoding time complexity of the RDOQ-TS scheme in VVC by adaptively skipping its level estimation (LE) process without sacrificing much coding efficiency. First, the characteristics of each level quantized by RDOQ-TS are analyzed statistically. Subsequently, based on the statistical analysis, the LE process in RDOQ-TS is simplified by removing the cases where the scalar quantization (SQ) value and the finally determined value are highly likely to be the same. For 4:2:0 and 4:4:4 screen content videos, our proposed simplified rate-distortion optimization for transform skip reduces the total encoding time by 0.77% and 1.83%, respectively, with no BDBR change under the all-intra configuration. In the case of the random access configuration, it achieves BDBR changes of -0.01% and -0.03% with the total encoding time reduced by 0.94% and 0.52%, respectively.
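As a rough illustration of the simplification, the level search can be bypassed whenever the SQ level is expected to match the final decision. The predicate and the candidate set below are assumptions for the sketch, not the paper's exact statistical rule.

```python
def candidate_levels_ts(sq_level, skip_le):
    """Sketch of the simplified RDOQ for transform skip.

    When the scalar-quantized (SQ) level is expected to equal the
    finally chosen level (skip_le=True), the level-estimation search
    is bypassed and the SQ level is kept as-is; otherwise the
    neighboring lower level is still tested. Both the skip condition
    and the candidate set are illustrative.
    """
    if skip_le:
        return [sq_level]                      # LE skipped: keep SQ value
    return [l for l in (sq_level - 1, sq_level) if l >= 0]
```

Skipping LE for the statistically "safe" levels removes rate-distortion evaluations of candidates that would almost never be chosen, which is where the encoding-time saving comes from.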
Multiple reference line (MRL) intra prediction allows reference lines other than the immediately adjacent row or column to be used in generating the intra predictor. Compared to the MRL already in the versatile video coding (VVC) standard, which can use a total of three reference lines including the adjacent one, the MRL now under discussion in the enhanced compression beyond VVC is extended to use a total of six reference lines. However, its MRL candidates are listed in a fixed order for all blocks, without considering the non-identical effectiveness of each reference line in intra prediction. In this paper, we propose a scheme for reordering the candidate lines in the MRL list considering the similarities between predictors generated by the available reference lines. According to our experimental results, the proposed method achieves an overall coding gain of -0.02% in the luma channel over the enhanced compression model (ECM) 5.0 under the all-intra (AI) configuration of the common test condition (CTC).
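A minimal sketch of such a reordering, assuming one plausible similarity criterion (SAD of each line's predictor against the adjacent line's predictor); the paper's exact rule may differ:

```python
import numpy as np

def reorder_mrl_lines(predictors):
    """Reorder MRL reference lines by predictor similarity.

    `predictors` maps a reference-line index to the intra predictor
    (2-D numpy array) generated from that line. Line 0 (the adjacent
    line) is kept first; the remaining lines are sorted so that the
    predictor most similar to line 0's comes earlier. The SAD
    criterion here is an illustrative assumption.
    """
    base = predictors[0].astype(np.int64)
    others = [i for i in predictors if i != 0]
    # Smaller SAD = more similar to the adjacent-line predictor.
    others.sort(key=lambda i: int(
        np.abs(predictors[i].astype(np.int64) - base).sum()))
    return [0] + others
```

Placing the more promising lines earlier shortens the signaled list index for the candidates that are chosen most often, which is where the gain of a reordering scheme comes from.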
In this paper, we investigate a simple movable projection mapping system that reduces distortion and fluctuation of a projected image by assisting a gimbal device with geometric image transformation. Since a conventional gimbal device can compensate only for the movement of the mounted projector, we design a new gimbal device that can also compensate for the relative geometric relationship between the projector and the screen. The proposed method stabilizes the projected image using the newly designed gimbal, which tracks the screen via a visual marker placed at each vertex and transforms the shape of the image. Through an experiment, we evaluate the performance of the proposed projected-image stabilization technique.
Versatile Video Coding (VVC) achieved an impressive coding gain with the help of various new coding tools. In intra chroma coding, the chroma separate tree (CST) and the cross-component linear model (CCLM) show remarkable coding gains. However, there are only 8 prediction mode candidates in intra chroma coding, far fewer than the 67 prediction mode candidates and up to 32 matrix-weighted intra prediction (MIP) mode candidates in intra luma coding. Excluding the three CCLM modes, the number of intra prediction mode candidates for chroma is just five, less than one tenth of the 67 used for the luma channel even without counting the MIP modes. In this paper, we investigate adding one more intra chroma prediction mode whose prediction direction is derived from the neighboring reconstructed area. The proposed method achieves BDBR gains of 0.36% and 0.26% in the Cb and Cr channels without loss in the Y channel under the all-intra configuration.
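One common way to derive a direction from a reconstructed neighborhood is a gradient-orientation histogram, as in DIMD-style derivation. The sketch below follows that idea under stated assumptions (33 angle bins, magnitude-weighted voting); the paper's actual derivation may differ in detail.

```python
import numpy as np

def derive_chroma_direction(recon):
    """Derive a dominant intra direction from the neighboring
    reconstructed area (illustrative sketch).

    Gradient orientations over the reconstructed template are
    accumulated into a magnitude-weighted histogram, and the index
    of the dominant angle bin is returned. The bin count of 33 is
    an assumption for the sketch.
    """
    gy, gx = np.gradient(recon.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0
    bins = np.floor(ang / (180.0 / 33)).astype(int)   # 33 angle bins
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=33)
    return int(hist.argmax())
```

For a template containing a vertical edge, the horizontal gradients dominate and the derived bin corresponds to the 0-degree orientation.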
Over the years, many methods have emerged to solve the super-resolution problem of light field images, and among them, deep-learning-based methods have recently attracted much attention. Although features extracted from the epipolar domain are actively investigated for light field super-resolution due to their potential for capturing the relationship between the spatial and angular domains, we note that spatial features are still the most important foundation of feature extraction. In this paper, we design a network, named LFSelectSR, that employs multiple convolutional kernels to fully extract spatial features, and we introduce a dynamic selection mechanism that can extract the most valuable spatial features. By training and testing the network on well-known datasets, we demonstrate its excellent performance, reaching the level of the state of the art under certain conditions.
To alleviate the inherent trade-off between the spatial and angular resolutions of light field (LF) images, much research has been carried out to increase the angular resolution of LFs by synthesizing intermediate views. Since the height of each epipolar plane image (EPI) is equal to the angular resolution of the LF, we tackle the view synthesis problem as doubling the height of each EPI in the LF. To efficiently stretch the EPI without consuming too much computing time, we propose to first segment the EPI into superpixels and then adaptively interpolate each superpixel separately. Test results on synthetic and real-scene LF datasets show that our scheme achieves average peak signal-to-noise ratio (PSNR) / structural similarity index measure (SSIM) of around 30.58 dB / 0.9131 and 32.28 dB / 0.9510, taking computing times of 5.80 minutes and 1.83 minutes for the HCI and EPFL datasets, respectively.
Light field (LF) images can improve the perspective effect and immersive experience. They also provide new capabilities such as depth estimation, post-capture refocusing, and 3D modelling, among which refocusing is potentially very useful in many applications such as smartphones. This paper addresses a novel depth-guided enhancement of light field images to improve depth contrast. The proposed method is formulated as an optimization problem that considers not only the desired inter-depth contrast constraints of the depth map, but also intra-depth luminance intensity contrast and color saturation contrast constraints in each depth layer. Experimental results show that it outperforms state-of-the-art methods in depth contrast and detail preservation in each depth layer. It is also observed that the proposed method can enlarge the luminance range by 7~15% and improve color contrast by 3~8%.
The performance of coded-exposure-photography-based image deblurring depends highly on the coded pattern used. Conventionally, the length of the coded pattern has been optimized under the assumption that it equals the length of the motion blur. However, coded patterns of lengths different from that of the motion blur may have better invertibility than the conventional patterns. In this paper, we investigate a method to optimize the coded pattern within an extended range of length candidates. We demonstrate the effectiveness of the proposed method using a real dataset.
In this paper, we propose a new coding method, called combined CCLM and intra prediction (CCIP), to improve the prediction efficiency of the cross-component linear model (CCLM) for chroma in the versatile video coding (VVC) standard. While the CCLM technique in VVC can use the linear correlation between the current chroma block and its co-located reconstructed luma region, it cannot take into account the correlation between the current chroma block and its adjacent chroma blocks in prediction. The proposed CCIP overcomes this shortcoming by combining two predictors, generated respectively by the conventional CCLM models and by intra prediction using the spatial chroma reference samples. Our experimental results show BDBR changes of 0.06% in the Y component, -0.60% in Cb, and -0.54% in Cr in the 4:2:0 color format, and -0.09% in Y, -0.33% in Cb, and -0.42% in Cr in the 4:4:4 color format.
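The combination step can be sketched as a weighted blend of the two predictors. Equal weights and the rounding convention below are assumptions for illustration; the actual CCIP weighting may differ.

```python
import numpy as np

def ccip_predict(cclm_pred, intra_pred, w_cclm=1, w_intra=1):
    """Sketch of CCIP: blend a CCLM predictor with a spatial intra
    predictor for the chroma block.

    Both inputs are 2-D numpy arrays of the same shape. Equal
    integer weights with round-half-up division are an illustrative
    choice, not the standardized weighting.
    """
    total = w_cclm + w_intra
    return (w_cclm * cclm_pred.astype(np.int32)
            + w_intra * intra_pred.astype(np.int32)
            + total // 2) // total
```

Blending lets the chroma predictor use both the co-located luma correlation (via CCLM) and the spatial neighbor correlation (via intra prediction) that CCLM alone ignores.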
Versatile video coding (VVC) has the intra block copy (IBC) coding tool for intra prediction. It uses a block vector (BV) of either 1-pel or 4-pel accuracy to indicate a reference block in the current picture. However, a block vector not expressed in sub-pel accuracy may be limited in accurately locating a reference block. In this context, we investigate representing the BV in sub-pel resolutions of half-pel and quarter-pel accuracy. According to our study, in the case of camera-captured video content, 18.96% and 28.63% of BVs prefer to be estimated in half-pel and quarter-pel resolutions, respectively, if sub-pel accuracy is allowed for the IBC block vector. Regarding the block vector difference (BVD), 8.72% and 9.98% of BVs choose to be signaled using a nonzero BVD in half-pel and quarter-pel, respectively. This is comparable to the usage ratio of the existing 4-pel resolution of IBC. Also, allowing half-pel and quarter-pel BV resolutions brings coding gain in some natural content sequences. Therefore, sub-pel BV resolution can be effective in further improving the coding efficiency of IBC, especially for natural video content.
In order to alleviate the inherent trade-off between the spatial and angular resolutions of light field (LF) images, much work has been carried out to enhance the angular resolution of LFs by creating novel views. In this paper, we investigate a method to enhance the angular resolution of an LF image by first grouping the pixels within and across the multiple views into LF superpixels using an existing LF segmentation method, and then generating novel views by shifting and overlaying the layers containing the LF superpixels that have similar per-view disparity values. Experimental results with synthetic and real-scene datasets show that our method achieves good reconstruction quality.
Motion blur is easily observable in an image containing moving objects, and coded exposure photography is one of the well-known solutions for its removal. In this paper, we propose a novel coded exposure photography technique to capture images of moving objects using coded flashes. Noting that the camera acquires a multi-channel image at once by using a color filter array, we modulate each channel image differently by using multi-band coded flashes. By modulating the point spread functions of the acquired channel images to be jointly invertible, the deconvolution performance is significantly improved compared to existing single-image coded exposure photography. We demonstrate the effectiveness of the proposed method through experiments using synthetic and real data.
The CST (Chroma Separate Tree) coding tool in VVC (Versatile Video Coding) allows different partition structures in the luma and chroma channels, and it provides a significant coding gain for the chroma channel. Under CST, the DM mode can be less efficient when the current chroma block and its co-located luma block have different shapes and sizes and are thus less correlated. To solve this potential problem, we investigate a slightly modified DM mode, the so-called MPDM (Most Probable Direct Mode), to increase the coding efficiency of chroma intra coding. The proposed method is shown to achieve BDBR gains of -0.03% in the Y channel and -0.25% and -0.24% in the Cb and Cr channels, respectively.
Light field (LF) images are captured by plenoptic cameras, which suffer from a trade-off between spatial and angular resolutions. Numerous methods have been proposed to enhance the spatial resolution of images captured by LF cameras. Among the state-of-the-art methods, there is an approach that super-resolves LF images using graph-based regularization. However, it takes too much execution time. In this paper, we propose a method to simplify the graph computation. The experimental results show that our proposed method can reduce time complexity by up to 18% compared to the original approach while maintaining the image quality of the LF images.
Since MPEG and VCEG jointly standardized the H.265/HEVC (High-Efficiency Video Coding) international standard in 2013, they have completed H.266/VVC (Versatile Video Coding) [1] as the Final Draft International Standard (FDIS) in July 2020 through the Joint Video Experts Team (JVET) of ISO/IEC JTC1/SC29/WG11 MPEG and ITU-T SG16/Q6 VCEG. VVC supports up to 87 intra coding modes, including 65 general directional modes and 20 wide angular modes, more than twice as many as HEVC. VVC accommodates not only more detailed intra prediction directions but also the so-called wide angular modes, which are especially meaningful for non-square coding blocks. The more detailed intra predictions naturally demand more computation to determine the most effective intra prediction mode. In this context, we investigate how to pre-prune the prediction candidate lists for a fast ISP intra mode decision, and propose searching only a small number of prediction modes based on the shape of a block in the rate-distortion optimization (RDO)-based intra mode decision. The proposed method is verified to decrease the intra prediction processing time with only a small increase in bit rate and a negligible reduction in PSNR.
Light Field (LF) image/video data provides both spatial and angular information of a scene, but at the cost of a tremendous data volume for storage and transmission. At the moment, MPEG Multi-view Video Coding (MVC) is one of the promising compression solutions for LF video data, so it deserves investigation into better prediction structures to effectively reduce the redundancy in LF video data. Several prediction structures have been investigated, but only with limited experimental evaluation due to a lack of datasets and non-identical testing configurations. This practical problem can now be mitigated by the availability of new datasets and the common test condition recently proposed by MPEG. As the first step toward designing a good compression method for LF video data, in this paper, we evaluate the performance of existing prediction structures for MVC-based LF video coding methods following the MPEG common test condition and its dataset.
High-quality depth estimation from light field (LF) images is an important and challenging task for which many algorithms have been developed. While compression is inevitably required in practice for LF data due to its huge volume, most depth estimation methods have not yet paid sufficient attention to the effect of compression. In this paper, we investigate various LF depth estimation methods in order to design an LF compression method in the context of good depth estimation. Noting that building the data cost is the very first step in most depth estimation algorithms and that the data cost computation has a great impact on the eventual quality of the depth image, we present an in-depth analysis of data cost computation in the LF depth estimation problem in the context of compression. Our results show that building the data cost on the Epipolar Plane Image (EPI) outperforms the other tested methods and is more robust to compression.
Versatile Video Coding (VVC) is a new state-of-the-art video compression technology under standardization. It targets about twice the coding efficiency of the existing HEVC, supporting HD/UHD/8K video and high dynamic range (HDR) video. It also targets versatile functionalities such as screen content coding, adaptive resolution change, and independent sub-pictures. To develop an effective coding method for the chroma intra prediction mode, in this paper, we investigate its binarization process in CABAC (context adaptive binary arithmetic coding) and test a method that assigns shorter bin strings to more frequent chroma intra modes and longer ones to the less frequent modes based on the chroma mode statistics.
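The frequency-aware assignment can be sketched with a truncated-unary binarization of the mode's frequency rank: the most frequent mode gets the shortest bin string. The mode names and the exact code are illustrative, not the VVC binarization table.

```python
def binarize_chroma_mode(mode, modes_by_freq):
    """Sketch of frequency-aware binarization for chroma intra modes.

    `modes_by_freq` lists modes from most to least frequent. The
    mode's rank is coded with a truncated-unary bin string, so more
    frequent modes receive shorter codes (illustrative only).
    """
    rank = modes_by_freq.index(mode)
    n = len(modes_by_freq)
    if rank == n - 1:
        return '1' * rank          # last rank needs no terminating 0
    return '1' * rank + '0'
```

A truncated-unary code is prefix-free over the listed modes, so the decoder can parse bins one by one until it sees a 0 or reaches the maximum rank.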
In order to make a movable display that projects onto high-rise walls or arbitrary objects in physical space as digital signage, a system called a drone-projector, a beam projector mounted on a drone, has been investigated. The vibration during hovering, caused by the propeller motors of the drone, distorts the projected image. In this paper, we extend an existing sensor-based stabilization method by compensating for the varying scale of the projected image due to the changing distance between the drone-projector and the projected surface. Our experimental results show that the distortion of the projected image is substantially attenuated by the proposed stabilization method.
Motion blur inevitably occurs in an image photographing fast-moving objects, and its removal, known as motion deblurring, is one of the most well-known ill-posed problems. In this paper, we investigate the deblurring of motion blur by using modulated external light. Noting that motion blur depends on both the ambient light and the modulated external light, we investigate how to design a motion deblurring method that considers not only the external light but also the ambient light. The deblurring performance of the proposed method is compared to that of the conventional method, which considers only the modulated external light.
In video-based light field coding, sub-aperture images (SAIs) are ordered to form a pseudo video sequence, and the sequence is encoded by a video compression algorithm, for example, HEVC. When the size of the SAI is not divisible by the minimum size of the coding tree unit, a proper boundary handling method is required. This paper investigates several boundary handling methods. To maintain high quality in the central SAIs, we combine rotation and u scans into a new hybrid scan order. The random access configuration is used instead of the low-delay one for better coding efficiency. The proposed methods are evaluated with the latest coding tools.
Limited computing resources in portable multimedia devices are an obstacle to the real-time decoding of high-resolution and/or high-quality video content. Ordinary H.264/AVC video decoders cannot decode video content that exceeds the limits set by their processing resources. However, in many real applications, especially on portable devices, a simplified decoding with some acceptable degradation may be preferable to simply refusing to decode such content. For this purpose, a complexity-scalable H.264/AVC video decoding scheme is investigated in this paper. First, several simplified versions of decoding tools with different characteristics are investigated to reduce decoding complexity and the consequent degradation of the reconstructed video. Then a complexity-scalable H.264/AVC decoding scheme is designed by selectively combining effective simplified methods to achieve the minimum degradation. Experimental results with H.264/AVC main profile bitstreams show that decoding complexity can be scalably controlled and reduced by up to 44% without subjective quality loss.
The quarter-pel motion vector accuracy supported by H.264/advanced video coding (AVC) in motion estimation (ME) and compensation (MC) provides high compression efficiency. However, it also increases the computational complexity. While various well-known fast integer-pel ME methods are already available, lack of a good, fast subpel ME method results in problems associated with relatively high computational complexity. This paper presents one way of solving the complexity problem of subpel ME by making adaptive motion vector (MV) accuracy decisions in inter-mode selection. The proposed MV accuracy decision is made using inter-mode selection of a macroblock with two decision criteria. Pixels are classified as stationary (and/or homogeneous) or nonstationary (and/or nonhomogeneous). In order to avoid unnecessary interpolation and processing, a proper subpel ME level is chosen among four different combinations, each of which has a different MV accuracy and number of subpel ME iterations based on the classification. Simulation results using an open source x264 software encoder show that without any noticeable degradation (by −0.07 dB on average), the proposed method reduces total encoding time and subpel ME time, respectively, by 51.78% and by 76.49% on average, as compared to the conventional full-pel pixel search.
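The decision logic described above can be caricatured in a few lines: classify the macroblock, then pick one of several sub-pel ME levels. The features and thresholds below are illustrative stand-ins for the paper's two decision criteria, not the actual ones.

```python
def subpel_me_level(sad_per_pixel, pred_mv_is_zero, t=2.0):
    """Sketch of an adaptive MV-accuracy decision.

    Blocks classified as stationary/homogeneous skip sub-pel
    refinement entirely (no interpolation needed), while others get
    progressively finer accuracy. The classification features and
    thresholds here are assumptions for illustration.
    """
    if pred_mv_is_zero and sad_per_pixel < t:
        return 'full-pel'      # stationary: skip sub-pel ME
    if sad_per_pixel < 2 * t:
        return 'half-pel'
    return 'quarter-pel'
```

The saving comes from avoiding both the half/quarter-pel interpolation and the extra search iterations for blocks where finer accuracy cannot help.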
This paper proposes a motion vector coding scheme that uses the optimal predictive motion vector from the surrounding causal motion vectors in the minimum rate-distortion sense. The signaling overhead for the selected predictive motion vector is reduced by a contradiction test that operates under a predefined criterion at both the encoder and decoder to prune the candidate predictive motion vectors.
The H.264/AVC deblocking filter pays little attention to intracoded blocks. We enhance this filter by extending it to use intraprediction mode information in its adaptive application to the intracoded block. Experiments show its higher coding efficiency, with blocking artifacts sufficiently minimized in intracoded blocks.
A new motion vector coding method with optimal predictive motion vector selection is proposed. To improve compression performance, the proposed encoder selects an optimal predictive motion vector that produces minimum bits for motion vector coding. The proposed decoder estimates the optimal predictive motion vector without additional information for indicating which predictor is to be used at the encoder side. Experimental results show that compared to the H.264/AVC standard, the proposed scheme improves coding efficiency for various video sequences.
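The encoder-side selection can be sketched as picking the candidate predictor whose resulting motion vector difference costs the fewest bits. The bit model below (2·bit_length+1 per component) is a stand-in for a real entropy coder, and the candidate set is illustrative.

```python
def select_pmv(mv, candidates):
    """Sketch of rate-optimal predictive-MV selection.

    Among the candidate predictors, choose the one whose motion
    vector difference (MVD) against `mv` costs the fewest bits under
    a simple Exp-Golomb-like length model (an assumption here).
    """
    def cost(pred):
        return sum(2 * abs(a - b).bit_length() + 1
                   for a, b in zip(mv, pred))
    return min(candidates, key=cost)
```

In the proposed scheme the decoder re-runs an equivalent estimation on causal data so that no index needs to be signaled; the sketch above only shows the encoder's selection criterion.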
Motion vectors correlate very well with other neighboring motion vectors. Thus, many macroblocks have zero residual motion vectors within their blocks after differential pulse coded modulation using their individually predicted motion vectors. Motivated by this observation, we develop a new joint encoding scheme for motion vectors by defining a new macroblock coding mode, called pooled zero motion vector difference coding, to code such cases more efficiently. Experimental results with several well-known video test sequences verify that the proposed method improves coding efficiency by up to 6.2% compared to H.264/advanced video coding (AVC).
In order to achieve high computational performance and low power consumption, modern microprocessors are usually equipped with special multimedia instructions, multi-threading, and/or multi-core processing capabilities. Therefore, parallelizing the H.264/AVC algorithm is crucial in implementing a real-time encoder on a multi-thread (or multi-core) processor. There is also a significant need for complexity reduction algorithms such as fast inter mode selection. A multi-core system makes it possible to uniformly distribute the workload of H.264/AVC over a number of simpler processor cores instead of a single high-performance processor. Therefore, in this paper, we propose a new adaptive slice size selection technique for efficient slice-level parallelism of the H.264/AVC encoder on a multi-core (or multi-thread) processor, using fast inter mode selection as a pre-processing step. The simulation results show that the proposed adaptive slice-level parallelism has good parallel performance compared to fixed-slice-size parallelism. The experimental methods and results can be applied to many multi-processor systems for real-time H.264 video encoding.
The flickering effect is a serious problem in intra-only coding and is caused by frame-to-frame differences in the accuracy loss of transform coefficients introduced by the quantization process. Nevertheless, it has not been studied sufficiently. In this paper, we analyze why the flickering effect happens and illustrate our observations using the intra-only coding scheme of the H.264/AVC standard. Based on our analysis, we propose a flickering effect reduction scheme, a pre-processing method using the Kalman filtering algorithm. Simulation results show that the proposed scheme increases subjective visual quality by removing the flickering effect.
In this paper, we propose a macroblock-level adaptive dynamic resolution conversion (DRC) technique that the encoder can use to reduce the resolution of the input image on a block-by-block basis for better compression efficiency. By reducing the spatial resolution of a block, the proposed scheme provides additional compression. As the proper resolution of a block is selected adaptively in a rate-distortion optimized way, more flexible coding is supported to adapt to the features of the image. Simulations based on the state-of-the-art H.264 standard demonstrate that the proposed scheme outperforms H.264 in terms of rate-distortion.
Compared to conventional video standards, the main features of the H.264 standard are its high coding efficiency and network friendliness. In spite of these outstanding merits, it is not easy to implement an H.264 codec as a real-time system due to its large memory bandwidth and intensive computation requirements. Although variable-block-size motion compensation using multiple reference frames is one of the key coding tools behind its main performance gain, its optimal use demands substantial computation for the rate-distortion calculation of all possible combinations of coding modes and for the estimation of the best motion vector. Many existing fast motion estimation algorithms are not suitable for H.264, which employs variable motion block sizes. We propose an adaptive motion search scheme utilizing the hierarchical block structure, based on the deviation of subblock motion vectors. The proposed fast scheme adjusts the search center and search pattern according to the subblock motion-vector distribution.
In this paper, we propose a multiple description coder for motion vectors (MV-MDC) based on the data-partitioned bitstream of the H.264/AVC standard. The proposed multiple description (MD) encoder separates the motion vector (MV) into two parts of equal priority and transmits each part in an independent packet. The proposed MD decoding scheme utilizes two matching criteria to find an accurate MV estimate when one of the MV descriptions is lost. Simulation results show that, compared to simply duplicated bitstream transmission, the proposed MV-MDC scheme reduces a large amount of data without serious visual quality loss in the reconstructed picture.
Most fast block motion estimation algorithms reported in the literature aim to reduce computation in terms of the number of search points, and thus do not fit well with multimedia processors because of their irregular data flow. For multimedia processors, proper reuse of data is more important than reducing the number of absolute-difference operations, because the execution-cycle performance strongly depends on the number of off-chip memory accesses. Therefore, in this paper we propose a sub-sampling predictive line search (SS-PLS) algorithm that uses a line search pattern to increase data reuse from the on-chip local buffer, and checks sub-sampled points in the line search pattern to avoid unnecessary SAD operations. Our experimental results show that the prediction error (MAE) performance of the proposed SS-PLS is similar to that of the full search block matching algorithm (FSBMA) and better than that of the hexagonal-based search (HEXBS). The proposed SS-PLS also requires far fewer off-chip memory accesses than conventional fast motion estimation algorithms such as HEXBS and the predictive line search (PLS). As a result, the proposed SS-PLS algorithm requires fewer execution cycles on a multimedia processor.
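The sub-sampling idea can be shown with a minimal SAD sketch: evaluating only every `step`-th pixel cuts the per-candidate cost. The uniform grid below is an assumption; it is not the exact SS-PLS sampling pattern:

```python
def subsampled_sad(cur, ref, step=2):
    """Sum of absolute differences over a sub-sampled pixel grid.
    step=1 gives the full SAD; larger steps trade accuracy for
    fewer operations (uniform grid assumed for illustration)."""
    h, w = len(cur), len(cur[0])
    return sum(abs(cur[y][x] - ref[y][x])
               for y in range(0, h, step)
               for x in range(0, w, step))
```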
KEYWORDS: Scalable video coding, Binary data, Statistical modeling, Video, Signal to noise ratio, Video coding, Computer programming, Process modeling, Spatial resolution, Linear filtering
The standardization of the scalable extension of H.264 has called for additional functionality on top of the H.264 standard to support combined spatio-temporal and SNR scalability. For the entropy coding of the H.264 scalable extension, the Context-based Adaptive Binary Arithmetic Coding (CABAC) scheme has been considered so far. In this paper, we present a new context modeling scheme that uses the inter-layer correlation between syntax elements, which improves the coding efficiency of entropy coding in the H.264 scalable extension. Simulation results of applying the proposed scheme to the encoding of the syntax element mb_type show that the improvement in coding efficiency reaches up to 16% in terms of bit saving, owing to the estimation of a more adequate probability model.
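The principle behind inter-layer context modeling can be sketched as follows: the co-located base-layer syntax element selects which adaptive probability model codes the enhancement-layer bin. The specific selection rule below (SKIP vs. non-SKIP) and the frequency-counting model are illustrative assumptions, not the paper's actual context derivation:

```python
def select_context(base_mb_type, contexts):
    """Pick a probability model for the enhancement-layer bin from
    the co-located base-layer mb_type (assumed example rule)."""
    return contexts[0] if base_mb_type == "SKIP" else contexts[1]

class BinModel:
    """Tiny adaptive binary model tracking P(bin = 1); a stand-in
    for CABAC's state-machine probability estimation."""
    def __init__(self):
        self.ones = 1
        self.total = 2  # Laplace-style initialization
    def prob_one(self):
        return self.ones / self.total
    def update(self, bin_val):
        self.ones += bin_val
        self.total += 1
```

A better-matched context starts the model closer to the true bin statistics, which is where the bit saving comes from.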
The adaptive coding schemes of the H.264 standard provide significant coding efficiency along with additional features such as error resilience and network friendliness. Variable-block-size motion compensation using multiple reference frames is one of the key H.264 coding elements behind this notable performance gain. However, it is also the main culprit behind the increased overall computational complexity. For this reason, this paper proposes a fast algorithm for variable-block-size motion estimation in H.264. In addition, we propose a fast mode decision scheme that classifies modes based on their rate-distortion cost. The experimental results show that the combined proposed methods provide a significant improvement in processing speed without noticeable coding loss.
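A common shape for such a fast mode decision is to test modes in a likely order and stop once one falls below a cost threshold for its class. The sketch below assumes a single threshold for simplicity; the paper's per-class classification is more elaborate:

```python
def fast_mode_decision(modes, rd_cost, threshold):
    """Evaluate candidate modes in order and terminate early once a
    mode's RD cost drops below the class threshold (illustrative
    single-threshold version of RD-cost-based classification)."""
    best, best_cost = None, float("inf")
    for m in modes:
        c = rd_cost(m)
        if c < best_cost:
            best, best_cost = m, c
        if c < threshold:  # good enough: skip the remaining modes
            break
    return best, best_cost
```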
Recent advances in video coding technology have resulted in the rapid growth of applications in mobile communication. With this explosive growth, reliable transmission and error resilience techniques become increasingly necessary to offer high-quality multimedia services. This paper discusses the error resilience performance of the MPEG-4 simple profile over H.324/M and of the H.264 baseline over IP packet networks. The MPEG-4 simple profile provides error resilience tools such as resynchronization marker insertion, data partitioning, and reversible VLC, while the H.264 baseline provides the flexible macroblock ordering scheme, among others. The objective and subjective quality of the decoded video is measured under various random-bit and burst error conditions.
In this paper, we propose an RST-robust watermarking algorithm that exploits the orientation feature of a host image using 2D Gabor kernels. From the viewpoint of watermark detection, host images are usually regarded as noise. However, since geometric manipulations affect the watermark and the host image simultaneously, evaluating the host image can help measure the nature of the distortion. To make the most of this property, we first find the orientation of the host image hierarchically with 2D Gabor kernels and insert a modified reference pattern, aligned to the estimated orientation, in a selected transform domain. Since the pattern is generated in a repetitive manner along the orientation, the detection step can simply project the signal in the direction of the image orientation and average the projected values to obtain a 1-D average pattern. Finally, correlating the 1-D projected average pattern with the watermark reveals periodic peaks. Experimental results against geometric attacks, including aspect-ratio changes and rotation, are analyzed.
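The projection-and-correlation step of the detector can be sketched for the axis-aligned case (projecting along columns stands in for projecting along the estimated Gabor orientation):

```python
def project_average(img):
    """Average each column into a 1-D pattern; a stand-in for
    projecting along the estimated image orientation."""
    h, w = len(img), len(img[0])
    return [sum(img[y][x] for y in range(h)) / h for x in range(w)]

def correlate(signal, pattern):
    """Circular correlation of the projected signal with the
    reference watermark; for a repetitive embedded pattern the
    result shows periodic peaks."""
    n = len(signal)
    return [sum(signal[(i + k) % n] * pattern[i] for i in range(n))
            for k in range(n)]
```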
KEYWORDS: Digital watermarking, Error analysis, Video, Data hiding, Sensors, Computer programming, Visualization, Binary data, Video compression, Video coding
This paper proposes a method of error detection and recovery that hides specific information in the video bitstream using fragile watermarking and checks it later. The proposed method requires no additional bits in the compressed bitstream since it embeds a user-specific data pattern in the least significant bits of the LEVELs in VLC codewords. The decoder can extract this information to check whether there is an error in the received bitstream. We also propose using this method to embed essential data, such as motion vectors, that can be used for error recovery. The proposed method can detect corrupted MBs that usually escape conventional syntax-based error detection schemes. The proposed method is quite simple and of low complexity.
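The LSB embedding idea can be sketched on a run of LEVEL values. The handling of the zero-magnitude corner case is an assumption added so no nonzero level collapses to zero; the paper's exact embedding rule may differ:

```python
def embed_pattern(levels, bits):
    """Force the LSB of each nonzero LEVEL magnitude to the next
    pattern bit, preserving sign (illustrative embedding rule)."""
    out, i = [], 0
    for lv in levels:
        if lv != 0 and i < len(bits):
            mag = (abs(lv) & ~1) | bits[i]
            if mag == 0:
                mag = 2  # assumed fix-up: keep the level nonzero, LSB still 0
            out.append(mag if lv > 0 else -mag)
            i += 1
        else:
            out.append(lv)
    return out

def extract_pattern(levels, n):
    """Read back the first n embedded bits from nonzero LEVELs;
    a mismatch with the expected pattern flags a corrupted MB."""
    return [abs(lv) & 1 for lv in levels if lv != 0][:n]
```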
Under the typical video communication configuration, in which a camera is placed on top of or at the side of a monitor, face-to-face video communication suffers from the inherent difficulty of poor eye contact, since the users stare at the monitor screen rather than looking directly into the camera lens. In this paper, we propose an image warping technique for gaze correction that performs 3D warping of the face object in the given image by a certain correction angle. The correction angle, which is the angle between the direction of the eye gaze and the direction to the camera, is estimated in an unsupervised way using an eye tracking technique. Experimental results with real image data show the much-enhanced naturalness that face-to-face video communication should offer.
A camera used in video communication over the Internet is usually placed on top of a monitor, so it is hard for a user to make natural eye contact with the peer communicator, since the user gazes at the monitor, not the camera lens. In this paper, we propose a single 3D mesh warping technique for gaze correction. It performs a 3D rotation of the face image by a certain correction angle to obtain a gaze-corrected image. The correction angle is estimated in an unsupervised way using invariant face features, and a very simple face section model is used in the 3D rotation instead of precise, but in most cases not easily attainable, 3D face models. The method is computationally simple enough to implement in real-time casual video communication applications.
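The core geometric step, rotating mesh points by the correction angle about the horizontal axis, can be sketched in a few lines (the axis choice matches a camera mounted above the screen; the full method also reprojects the warped mesh back to the image):

```python
import math

def rotate_about_x(points, angle_deg):
    """Rotate 3D face-mesh points (x, y, z) about the horizontal
    x-axis by the gaze-correction angle; a minimal stand-in for
    the paper's single-mesh 3D warping."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(x, y * c - z * s, y * s + z * c) for x, y, z in points]
```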
Several new approaches are being investigated in conjunction with low-bit-rate coding, such as MPEG-4, to overcome the limitations imposed by block-based image compression. One solution is to use 'warping' prediction (or spatial transformation) based on a set of control points, where one of the most important issues is how to place the control points adequately without destroying salient features such as edges and corners. In this paper, we propose a new image representation scheme based on an irregular triangular mesh structure in which, considering the salient features, a considerably reduced number of control points is adaptively selected out of initially uniformly distributed control points. A new criterion based on the local representation error is defined and used in successive control point removal that exploits global image features, thus providing better image representation. Computer simulations have shown that the proposed scheme gives significantly improved image representation performance compared with the conventional scheme based on regular meshes, in both objective and subjective quality.
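The successive removal step can be sketched as a greedy loop: repeatedly drop the control point whose removal costs the least local representation error until a target count remains. The error callback is a hypothetical hook for the paper's criterion:

```python
def prune_control_points(points, local_error, budget):
    """Greedily remove the control point with the smallest local
    representation error until `budget` points remain (sketch of
    successive removal; `local_error` is a hypothetical hook)."""
    pts = list(points)
    while len(pts) > budget:
        cheapest = min(pts, key=local_error)
        pts.remove(cheapest)
    return pts
```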
This paper proposes a novel blocking artifact reduction method based on the notion that blocking artifacts are present in images due to heavy accuracy loss of transform coefficients in the quantization process. We define the block boundary discontinuity measure as the sum of the squared differences of pixel values along the block boundary. The proposed method corrects selected transform coefficients so that the resulting image has minimum block boundary discontinuity. It does not specify a transform domain where the correction should take place, so an appropriate transform domain can be selected at the user's discretion. In the experiments, the scheme is applied to DCT-based compressed images to show its performance.
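The discontinuity measure as defined above is straightforward to compute; the sketch below sums squared differences across every vertical and horizontal block boundary of a grayscale image, which is the quantity the coefficient correction tries to minimize:

```python
def boundary_discontinuity(img, bs=8):
    """Block boundary discontinuity: sum of squared pixel
    differences across all block boundaries (block size bs)."""
    h, w = len(img), len(img[0])
    d = 0
    for y in range(h):                   # vertical boundaries
        for x in range(bs, w, bs):
            d += (img[y][x] - img[y][x - 1]) ** 2
    for y in range(bs, h, bs):           # horizontal boundaries
        for x in range(w):
            d += (img[y][x] - img[y - 1][x]) ** 2
    return d
```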
In many image sequence compression applications, Huffman coding is used to reduce the statistical redundancy in quantized transform coefficients. The Huffman codeword table is often pre-defined to reduce coding delay and table transmission overhead. Local symbol statistics, however, may be much different from the global statistics manifested in the pre-defined table. In this paper, we propose a dynamic Huffman coding method that can adaptively modify the given codeword-symbol association according to the local statistics. Over a certain set of blocks, the local symbol statistics are observed and used to re-associate the symbols with the codewords in such a way that shorter codewords are assigned to more frequent symbols. This modified code table is then used to code the next set of blocks. A parameter controls the relative sensitivity to the local statistics versus the global statistics. By performing the same modification to the code table using the decoded symbols, the receiving side can keep up with the code table changes, so the code table modification information need not be transmitted. Therefore, there is no extra transmission overhead in employing this method.
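The re-association step can be sketched directly: sort the fixed codeword set by length and the observed symbols by frequency, then pair them off. Because the decoder observes the same decoded symbols, it can rebuild the identical mapping without side information (the sensitivity parameter is omitted in this minimal sketch):

```python
from collections import Counter

def reassociate(codewords, counts):
    """Re-map symbols onto the fixed codeword set so that more
    frequent symbols receive shorter codewords; both sides derive
    the same mapping from the same observed symbol counts."""
    by_len = sorted(codewords, key=len)              # shortest first
    by_freq = [s for s, _ in counts.most_common()]   # most frequent first
    return dict(zip(by_freq, by_len))
```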