KEYWORDS: Performance modeling, Device simulation, Multimedia, Modeling, Systems modeling, Video, Chemical mechanical planarization, System on a chip, Resistors, Power supplies
We present an instruction-level power dissipation model of the Intel XScale microprocessor. The XScale implements the ARM ISA, but uses an aggressive microarchitecture and a SIMD Wireless MMX co-processor
to speed up execution of multimedia workloads in the embedded domain. Instruction-level power modeling was first proposed by Tiwari et al. in 1994. Adaptations of this model have been found to be applicable to simple ARM processors. Research also shows that instructions can be clustered into groups with similar energy characteristics. We adapt these methodologies to the significantly more complex XScale processor. We characterize the processor in terms of the energy costs of opcode execution, operand values, pipeline stalls, and other effects, through accurate measurements on hardware. This instruction-based (rather than microarchitectural) approach allows us to build a high-speed power-accurate simulator that runs at MIPS-range speeds, while
achieving accuracy better than 5%. The processor core accounts for only a portion of overall power consumption, so we move beyond the core to explore the issues involved in building a SystemC simulation framework that models the power dissipation of complete systems quickly, flexibly, and accurately.
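The characterization above follows the classic instruction-level approach: total energy is the sum of per-opcode base costs, inter-instruction (circuit-state) overheads, and stall penalties. A minimal sketch of such a model, assuming hypothetical energy values rather than the measured XScale numbers:

```python
# Sketch of a Tiwari-style instruction-level energy model. All cost
# values below are illustrative placeholders, not measured XScale data.

BASE_COST_NJ = {   # energy per instruction, in nanojoules (hypothetical)
    "add": 0.60,
    "mul": 0.95,
    "ldr": 1.10,
    "str": 1.05,
}

# Extra energy when one opcode follows another (circuit-state overhead).
OVERHEAD_NJ = {
    ("ldr", "add"): 0.08,
    ("add", "mul"): 0.12,
}

STALL_COST_NJ = 0.40  # assumed cost of one pipeline stall cycle

def trace_energy(trace, stalls=0):
    """Estimate energy for an instruction trace (a list of opcodes)."""
    energy = sum(BASE_COST_NJ[op] for op in trace)
    # Add the inter-instruction overhead for each adjacent opcode pair.
    for prev, cur in zip(trace, trace[1:]):
        energy += OVERHEAD_NJ.get((prev, cur), 0.0)
    return energy + stalls * STALL_COST_NJ
```

In a real characterization the tables would be filled from hardware measurements, and operand-value effects would add further terms.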
KEYWORDS: Computer programming, Video, Motion estimation, Video processing, Image processing, Chemical mechanical planarization, Video coding, Digital video discs, Image segmentation, Computer architecture
In media applications there is a high level of available thread-level parallelism (TLP). In this paper we study the TLP available within a video encoder. We show that a well-distributed, highly optimized encoder running on a symmetric multiprocessor (SMP) system can run 3.2 times faster on a 4-way SMP machine than on a single processor. The multithreaded encoder running on an SMP system is then used to understand the requirements of a chip multiprocessor (CMP) architecture, one possible architectural direction for better exploiting TLP. In the framework of this study, we use a software approach to evaluate the dataflow between processors for the video encoder running on an SMP system. The dataflow is estimated from L2 cache miss event counters using the Intel® VTune™ performance analyzer. The experimental measurements are compared to theoretical results.
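The dataflow estimation above amounts to multiplying L2 miss counts by the cache line size, and a measured 3.2x speedup on 4 processors implies, under Amdahl's law, a serial fraction of roughly 1/12. A small sketch with made-up miss counts (real ones would come from a profiler such as VTune):

```python
# Hedged sketch: inferring data traffic from L2 miss counts, and the
# Amdahl's-law bound on SMP speedup. The line size and miss counts are
# assumptions for illustration, not measurements from the paper.

L2_LINE_BYTES = 64  # assumed cache line size

def dataflow_mb_per_s(l2_misses_per_frame, frames_per_second):
    """Approximate data traffic (MB/s) implied by L2 miss counts:
    each miss moves one cache line."""
    bytes_per_sec = l2_misses_per_frame * L2_LINE_BYTES * frames_per_second
    return bytes_per_sec / 1e6

def smp_speedup(serial_fraction, n_cpus):
    """Amdahl's-law upper bound on speedup for n_cpus processors."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cpus)
```

For example, a serial fraction of 1/12 yields exactly the 3.2x speedup on 4 processors reported above.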
The aim of this paper is to analyze the computational requirements of video watermarking algorithms running on PC-based systems and to study their implications for the design of general-purpose processors and systems. Selected watermarking algorithms are analyzed from a computational point of view. Application examples are executed on a current general-purpose processor architecture to understand the computational requirements and to detect potential bottlenecks. In addition to this workload analysis, the potential exploitation of data-level parallelism through the SIMD instructions available on current architectures is evaluated. Thread-level parallelism schemes for current watermarking algorithms are also studied in order to understand the potential benefit of simultaneous multithreading processors and symmetric multiprocessor systems for such applications. Even if the study of the different watermarking algorithms is crucial to understanding the requirements of a system, it is not sufficient: watermarking schemes are very often only one kernel in a complete application, and the interaction between the watermarking kernel and the rest of the application can strongly influence the computational and memory bandwidth requirements of the system. Therefore, the example of watermark detection in a video decoder is used to understand the additional system implications of merging video decoding and watermarking algorithms.
KEYWORDS: Image processing, Multimedia, Computer programming, Digital signal processing, Signal processing, Video processing, Video, Video coding, Motion estimation, Data processing
This paper proposes a classification of the parallelism in general-purpose processor based systems into three main categories. The first is intra-processor parallelism, which includes multimedia instructions as well as superscalar and VLIW architectures; the former take advantage of data parallelism, while the latter benefit from instruction-level parallelism. The second is inter-processor parallelism: we consider the parallelism between processors inside shared-memory symmetric multiprocessor systems and in distributed-memory clusters of workstations. In the last category, the main features of system-level parallelism are studied, including input/output operations, the memory hierarchy, and the exploitation of external processing. The potential gain is studied for each type of parallelism available in general-purpose processor based systems, from a theoretical point of view as well as for existing image and video applications. The results in this paper show that exploiting the different levels of parallelism available in PC workstations can lead to considerable gains in speed when optimizing a multimedia application. Finally, the results of this work can be used to influence the design of new multimedia systems and media processors.
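As a rough illustration of how gains at these levels compose, intra-processor gains (SIMD, ILP) multiply with each other, while the inter-processor gain is bounded by Amdahl's law. The figures used below are hypothetical, not results from the paper:

```python
# Idealized composition of parallelism gains across the three categories
# described in the text. All gain factors are illustrative assumptions.

def combined_speedup(simd_gain, ilp_gain, n_procs, parallel_fraction):
    """Intra-processor gains (SIMD x ILP) multiply; the inter-processor
    contribution is capped by Amdahl's law for the given parallel
    fraction of the workload."""
    amdahl = 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_procs)
    return simd_gain * ilp_gain * amdahl
```

For instance, a hypothetical 2x SIMD gain and 1.5x ILP gain on a single processor would combine to a 3x overall speedup; adding more processors helps only to the extent the workload is parallelizable.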
Digital data representation provides an efficient and fast way to access information and to exchange it. In many situations, though, ownership or copyright protection mechanisms are desired. For still images and video, one possible way to achieve this is through watermarking, which embeds imperceptible information within a given medium. Parallel-processing watermark embedding schemes have been shown to be efficient, from a computational and memory-usage point of view, for very large images. These schemes consist of dividing the image into tiles and watermarking each tile independently, which allows the use of a parallel computation scheme. The watermarking method used in this work is a parallel variant of an approach known as the self-referenced spread-spectrum signature pattern. Since the watermarking scheme has been modified through tiling, the extra references due to signature replication can be used in the retrieval. This work describes the above-mentioned approach to watermarking images and provides an analysis of its performance.
KEYWORDS: Digital watermarking, Image processing, Parallel processing, Signal processing, Signal detection, Information security, Data communications, Algorithm development, Visualization, Image resolution
Large, high-resolution images usually have a high commercial value, making them very good candidates for watermarking. If many images have to be signed in a client-server setup, memory and computational requirements could become unrealistic for current and near-future solutions. In this paper, we propose to tile the image into sub-images; the watermarking scheme is then applied to each sub-image in both the embedding and retrieval processes. With this solution, a first optimization consists of creating different threads to read and write the image tile by tile, reducing the time spent in input/output operations, which can be a bottleneck for large images. In addition, we show that the memory consumption of the application is also greatly reduced for large images. Finally, the application can be multithreaded so that different tiles are watermarked in parallel, allowing the scheme to take advantage of the processing power of the multiple processors available in current servers. We show that the correct tile size and the right number of threads have to be chosen to distribute the workload efficiently. Lastly, security, robustness, and invisibility issues are addressed in light of the signal redundancy.
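The tile-parallel embedding described above can be sketched with a thread pool over fixed-size blocks. The additive pseudo-random signature below is a toy stand-in for the actual self-referenced spread-spectrum scheme, and the tile size and embedding strength are illustrative assumptions:

```python
# Hedged sketch: tiling an "image" (a plain 2-D list) and watermarking the
# tiles in parallel. The signature and parameters are toy values.
from concurrent.futures import ThreadPoolExecutor
import random

TILE = 4       # tile side in pixels (real tiles would be far larger)
STRENGTH = 2   # assumed embedding strength

def signature(h, w, key):
    """Pseudo-random +/-1 pattern, reproducible from a secret key."""
    rng = random.Random(key)
    return [[rng.choice((-1, 1)) for _ in range(w)] for _ in range(h)]

def embed_tile(tile, key):
    """Additively embed the signature into one tile."""
    sig = signature(len(tile), len(tile[0]), key)
    return [[p + STRENGTH * s for p, s in zip(row, srow)]
            for row, srow in zip(tile, sig)]

def embed_image(img, key, workers=4):
    """Split img into TILE x TILE blocks and watermark them in parallel."""
    h, w = len(img), len(img[0])
    coords = [(y, x) for y in range(0, h, TILE) for x in range(0, w, TILE)]

    def work(c):
        y, x = c
        tile = [row[x:x + TILE] for row in img[y:y + TILE]]
        return c, embed_tile(tile, key)

    out = [row[:] for row in img]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for (y, x), marked in pool.map(work, coords):
            for dy, row in enumerate(marked):
                out[y + dy][x:x + len(row)] = row
    return out
```

Because the tiles are independent, the same structure lets a reader or writer thread stream tiles through I/O while worker threads embed, which is the I/O and memory optimization described above.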
Motion estimation represents the main computational burden of every hybrid video encoder. Various solutions have been proposed to reduce the number of operations needed for this task while preserving the quality of the estimation and of the resulting encoded video. In this paper we propose an algorithm that exploits the statistical properties of the motion field to search a number of points dynamically related to the evolution of the sequence. A subsampling pattern for the macroblock is also proposed to reduce the overall impact of motion estimation in an MPEG encoder.
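The macroblock-subsampling idea can be sketched as follows: computing the sum of absolute differences (SAD) over only a subset of block pixels cuts the per-candidate cost. A fixed search radius stands in here for the paper's dynamic search-point count, and all block data is synthetic:

```python
# Sketch of block matching with a subsampled macroblock pattern. The
# regular every-other-pixel pattern and the fixed search radius are
# illustrative choices, not the algorithm proposed in the paper.

def sad_subsampled(block_a, block_b, step=2):
    """Sum of absolute differences over a subsampled pixel pattern:
    step=2 evaluates one pixel in four, quartering the SAD cost."""
    return sum(abs(block_a[y][x] - block_b[y][x])
               for y in range(0, len(block_a), step)
               for x in range(0, len(block_a[0]), step))

def best_vector(ref, cur, bx, by, bs=8, radius=2):
    """Search candidate displacements around (0,0) in the reference
    frame, keeping the one with the lowest subsampled SAD."""
    cur_blk = [row[bx:bx + bs] for row in cur[by:by + bs]]
    best, best_cost = (0, 0), float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = by + dy, bx + dx
            if 0 <= y <= len(ref) - bs and 0 <= x <= len(ref[0]) - bs:
                cand = [row[x:x + bs] for row in ref[y:y + bs]]
                cost = sad_subsampled(cur_blk, cand)
                if cost < best_cost:
                    best_cost, best = cost, (dx, dy)
    return best
```

In the adaptive scheme discussed above, the set of candidate displacements would grow or shrink with the observed motion-field statistics rather than being a fixed window.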