SwePub
Search the SwePub database


Result list for the search "WFRF:(Sjöström Mårten 1967 ) "


  • Results 1-50 of 80
1.
  • Ahmad, Waqas, et al. (author)
  • Compression scheme for sparsely sampled light field data based on pseudo multi-view sequences
  • 2018
  • In: Optics, Photonics, and Digital Technologies for Imaging Applications V, Proceedings of SPIE - The International Society for Optical Engineering. - : SPIE - International Society for Optical Engineering.
  • Conference paper (peer-reviewed) abstract
    • With the advent of light field acquisition technologies, the captured information of a scene is enriched with both angular and spatial information. The captured information enables additional capabilities in the post-processing stage, e.g. refocusing, 3D scene reconstruction, and synthetic aperture. Light field capturing devices fall into two categories: in the first, a single plenoptic camera captures a densely sampled light field, and in the second, multiple traditional cameras capture a sparsely sampled light field. In both cases, the size of the captured data increases with the additional angular information. The recent call for proposals on light field compression by JPEG, also called "JPEG Pleno", reflects the need for a new and efficient light field compression solution. In this paper, we propose a compression solution for sparsely sampled light field data. In a multi-camera system, each view depicts the scene from a single perspective. We propose to interpret each single view as a frame of a pseudo video sequence. In this way, the complete MxN views of the multi-camera system are treated as M pseudo video sequences, where each pseudo video sequence contains N frames. The central pseudo video sequence is taken as the base view, and the first frame in all pseudo video sequences is taken as the base picture order count (POC). The frame contained in the base view and base POC is labeled the base frame. The remaining frames are divided into three predictor levels. Frames placed in each successive level can take prediction from previously encoded frames; however, frames assigned to the last prediction level are not used for prediction of other frames. Moreover, the rate allocation for each frame is performed by taking into account its predictor level, its frame distance, and its view-wise decoding distance relative to the base frame.
The multi-view extension of high efficiency video coding (MV-HEVC) is used to compress the pseudo multi-view sequences. MV-HEVC enables frames to take prediction in both directions (horizontal and vertical), and MV-HEVC parameters are used to implement the proposed 2D prediction and rate allocation scheme. A subset of four light field images from the Stanford dataset is compressed using the proposed scheme at four bitrates in order to cover low- to high-bitrate scenarios. The comparison is made with the state-of-the-art reference encoder HEVC and its real-time implementation x265. The 17x17 grid is converted into a single pseudo sequence of 289 frames, following the order explained in the JPEG Pleno call for proposals, and given as input to both reference schemes. The rate-distortion analysis shows that the proposed compression scheme outperforms both reference schemes in all tested bitrate scenarios for all test images. The average BD-PSNR gain is 1.36 dB over HEVC and 2.15 dB over x265.
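The 2D hierarchy described in the abstract can be illustrated in a few lines of Python. This is a simplified sketch under assumed conventions (central row as base view, first column as base POC); the paper's exact level-assignment rule and rate allocation differ, and the function name `organize_views` is hypothetical:

```python
def organize_views(M, N):
    """Assign each view (one frame of a pseudo video sequence) a
    predictor level. Level 0 is the base frame (base view = central
    sequence, base POC = first frame); higher levels are encoded later
    and may predict from lower ones. The level rule used here is
    illustrative, not the paper's exact scheme."""
    base_view, base_poc = M // 2, 0
    levels = {}
    for view in range(M):
        for poc in range(N):
            if (view, poc) == (base_view, base_poc):
                levels[(view, poc)] = 0
            else:
                # farther from the base frame -> later predictor level
                dist = abs(view - base_view) + (poc - base_poc)
                levels[(view, poc)] = min(3, 1 + (dist - 1) // 2)
    return levels

levels = organize_views(5, 5)   # 5x5 view grid -> 25 frames in 4 levels
```

A rate-allocation step could then, for instance, assign coarser quantization to frames at higher levels, since they are never used as references by other frames.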
2.
  • Ahmad, Waqas, et al. (author)
  • Computationally Efficient Light Field Image Compression Using a Multiview HEVC Framework
  • 2019
  • In: IEEE Access. - 2169-3536. ; 7, pp. 143002-143014
  • Journal article (peer-reviewed) abstract
    • The acquisition of the spatial and angular information of a scene using light field (LF) technologies supports a wide range of post-processing applications, such as scene reconstruction, refocusing, virtual view synthesis, and so forth. The additional angular information possessed by LF data increases the size of the overall data captured while offering the same spatial resolution. The main contributor to the size of the captured data (i.e., the angular information) contains a high correlation that is exploited by state-of-the-art video encoders by treating the LF as a pseudo video sequence (PVS). The interpretation of the LF as a single PVS restricts the encoding scheme to utilizing only the single-dimensional angular correlation present in the LF data. In this paper, we present an LF compression framework that efficiently exploits the spatial and angular correlation using a multiview extension of high-efficiency video coding (MV-HEVC). The input LF views are converted into multiple PVSs and are organized hierarchically. The rate-allocation scheme takes into account the assigned organization of frames and distributes quality/bits among them accordingly. Subsequently, the reference picture selection scheme prioritizes the reference frames based on the assigned quality. The proposed compression scheme is evaluated by following the common test conditions set by JPEG Pleno. The proposed scheme performs 0.75 dB better compared to state-of-the-art compression schemes and 2.5 dB better compared to the x265-based JPEG Pleno anchor scheme. Moreover, an optimized motion-search scheme is proposed in the framework that reduces the computational complexity (in terms of the sum of absolute difference [SAD] computations) of motion estimation by up to 87% with a negligible loss in visual quality (approximately 0.05 dB).
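The abstract's complexity figure counts SAD computations during motion estimation. As a point of reference, a baseline exhaustive block search looks roughly like the following sketch (illustrative only; the paper's optimized scheme prunes candidates rather than searching exhaustively):

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equal-sized blocks."""
    return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

def best_match(cur, ref, bx, by, bs=8, search=4):
    """Exhaustive full search: evaluate SAD for every candidate offset
    within +/-search pixels and keep the cheapest. An optimized scheme
    would skip most of these SAD computations."""
    cur_blk = cur[by:by + bs, bx:bx + bs]
    best_vec, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bs > ref.shape[0] or x + bs > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur_blk, ref[y:y + bs, x:x + bs])
            if best_cost is None or cost < best_cost:
                best_vec, best_cost = (dx, dy), cost
    return best_vec, best_cost
```

Counting `sad` calls in such a loop is one simple way to quantify the kind of computational saving the paper reports.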
3.
  • Ahmad, Waqas (author)
  • High Efficiency Light Field Image Compression : Hierarchical Bit Allocation and Shearlet-based View Interpolation
  • 2021
  • Doctoral thesis (other academic/artistic) abstract
    • Over the years, the pursuit of capturing the precise visual information of a scene has resulted in various enhancements in digital camera technology, such as high dynamic range, extended depth of field, and high resolution. However, traditional digital cameras only capture the spatial information of the scene and cannot provide an immersive presentation of it. Light field (LF) capturing is a new-generation imaging technology that records the spatial and angular information of the scene. In recent years, LF imaging has become increasingly popular among industry and the research community, mainly for two reasons: (1) the advancements made in optical and computational technology have facilitated the process of capturing and processing LF information, and (2) LF data have the potential to offer various post-processing applications, such as refocusing at different depth planes, synthetic aperture, 3D scene reconstruction, and novel view generation. Generally, LF-capturing devices acquire large amounts of data, which poses a challenge for storage and transmission resources. Off-the-shelf image and video compression schemes, built on assumptions drawn from natural images and video, tend to exploit spatial and temporal correlations. However, 4D LF data inherit different properties, and hence there is a need to advance current compression methods to efficiently address the correlation present in LF data. In this thesis, compression of LF data captured using a plenoptic camera and a multi-camera system (MCS) is considered. Perspective views of a scene captured from different positions are interpreted as frames of multiple pseudo video sequences and given as input to a multi-view extension of high-efficiency video coding (MV-HEVC). A 2D prediction and hierarchical coding scheme is proposed in MV-HEVC to improve the compression efficiency of LF data.
To further increase the compression efficiency of views captured using an MCS, an LF reconstruction scheme based on the shearlet transform is introduced into LF compression. A sparse set of views is coded using MV-HEVC and later used to predict the remaining views by applying the shearlet transform. The prediction error is also coded to further increase the compression efficiency. Publicly available LF datasets are used to benchmark the proposed compression schemes. The anchor scheme specified in the JPEG Pleno common test conditions is used to evaluate the performance of the proposed scheme. Objective evaluations show that the proposed scheme outperforms state-of-the-art schemes in the compression of LF data captured using a plenoptic camera and an MCS. Moreover, the introduction of the shearlet transform into LF compression further improves the compression efficiency at low bitrates, at which the human vision system is sensitive to the perceived quality. The work presented in this thesis has been published in four peer-reviewed conference proceedings and two scientific journals. The proposed compression solutions outlined in this thesis significantly improve the rate-distortion efficiency for LF content, which reduces the transmission and storage resources. The MV-HEVC-based LF coding scheme is made publicly available, which can help researchers to test novel compression tools, and it can serve as an anchor scheme for future research studies. The shearlet-transform-based LF compression scheme presents a comprehensive framework for testing LF reconstruction methods in the context of LF compression.
4.
5.
  • Ahmad, Waqas, et al. (author)
  • Interpreting Plenoptic Images as Multi-View Sequences for Improved Compression
  • 2017
  • In: ICIP 2017. - : IEEE. - 9781509021758 ; pp. 4557-4561
  • Conference paper (peer-reviewed) abstract
    • Over the last decade, advancements in optical devices have made it possible for novel image acquisition technologies to appear. Angular information for each spatial point is acquired in addition to the spatial information of the scene, which enables 3D scene reconstruction and various post-processing effects. The current generation of plenoptic cameras spatially multiplexes the angular information, which implies an increase in image resolution to retain the level of spatial information gathered by conventional cameras. In this work, the resulting plenoptic image is interpreted as a multi-view sequence that is efficiently compressed using the multi-view extension of high efficiency video coding (MV-HEVC). A novel two-dimensional weighted prediction and rate allocation scheme is proposed to adapt the HEVC compression structure to the plenoptic image properties. The proposed coding approach is a response to the ICIP 2017 Grand Challenge: Light Field Image Coding. The proposed scheme outperforms all ICME contestants, and improves on the JPEG anchor of ICME with an average PSNR gain of 7.5 dB and on the HEVC anchor of the ICIP 2017 Grand Challenge with an average PSNR gain of 2.4 dB.
6.
  • Ahmad, Waqas, et al. (author)
  • Matching Light Field Datasets From Plenoptic Cameras 1.0 And 2.0
  • 2018
  • In: Proceedings of the 2018 3DTV Conference. - 9781538661253
  • Conference paper (peer-reviewed) abstract
    • Capturing the angular and spatial information of a scene using a single camera is made possible by an emerging technology referred to as the plenoptic camera. Together, the angular and spatial information enable various post-processing applications, e.g. refocusing, synthetic aperture, super-resolution, and 3D scene reconstruction. In the past, multiple traditional cameras were used to capture the angular and spatial information of the scene. Recently, however, with the advancement of optical technology, plenoptic cameras have been introduced to capture the scene information. In a plenoptic camera, a lenslet array is placed between the main lens and the image sensor, which allows multiplexing of the spatial and angular information onto a single image, also referred to as a plenoptic image. The placement of the lenslet array relative to the main lens and the image sensor results in two different optical designs of a plenoptic camera, referred to as plenoptic 1.0 and plenoptic 2.0. In this work, we present a novel dataset captured with plenoptic 1.0 (Lytro Illum) and plenoptic 2.0 (Raytrix R29) cameras for the same scenes under the same conditions. The dataset provides benchmark content for various research and development activities on plenoptic images.
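For an idealized plenoptic-1.0 sensor, the spatial/angular multiplexing mentioned above can be undone by picking the same pixel under every lenslet to form one sub-aperture view. A toy sketch (real Lytro or Raytrix data additionally require demosaicing, calibration, and resampling; `subaperture` is a hypothetical helper, not part of the dataset tooling):

```python
import numpy as np

def subaperture(plenoptic, au, av, i, j):
    """Extract sub-aperture view (i, j) from an idealized plenoptic-1.0
    image in which every microlens covers an av x au block of pixels:
    pixel (i, j) under each lenslet belongs to the same viewing angle,
    so strided slicing collects one view."""
    return plenoptic[i::av, j::au]

img = np.arange(36).reshape(6, 6)    # toy lenslet image: 3x3 lenslets, 2x2 angular samples
view = subaperture(img, 2, 2, 0, 0)  # top-left angular sample of every lenslet
```

Each (i, j) choice yields one perspective view; iterating over all of them produces the full grid of sub-aperture images.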
7.
  • Ahmad, Waqas, et al. (author)
  • Shearlet Transform-Based Light Field Compression under Low Bitrates
  • 2020
  • In: IEEE Transactions on Image Processing. - : IEEE. - 1057-7149 .- 1941-0042. ; 29, pp. 4269-4280
  • Journal article (peer-reviewed) abstract
    • Light field (LF) acquisition devices capture the spatial and angular information of a scene. In contrast with traditional cameras, the additional angular information enables novel post-processing applications, such as 3D scene reconstruction, the ability to refocus at different depth planes, and synthetic aperture. In this paper, we present a novel compression scheme for LF data captured using multiple traditional cameras. The input LF views were divided into two groups: key views and decimated views. The key views were compressed using the multi-view extension of high-efficiency video coding (MV-HEVC), and the decimated views were predicted using a shearlet-transform-based prediction (STBP) scheme. Additionally, the residual information of the predicted views was encoded and sent along with the coded stream of key views. The proposed scheme was evaluated over benchmark multi-camera-based LF datasets, demonstrating that incorporating the residual information into the compression scheme increased the overall peak signal-to-noise ratio (PSNR) by 2 dB. The proposed compression scheme performed significantly better at low bitrates compared to the anchor schemes, which have better compression efficiency in high-bitrate scenarios. The sensitivity of the human vision system to compression artifacts, specifically at low bitrates, favors the proposed compression scheme over the anchor schemes.
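The key/decimated split described above can be sketched as a simple grid partition (illustrative; the actual key-view pattern and decimation factor used in the paper may differ, and `split_views` is a hypothetical name):

```python
def split_views(S, T, d):
    """Partition an S x T light field view grid into key views (every
    d-th view along both s and t) and decimated views that the decoder
    predicts from the key views."""
    keys = {(s, t) for s in range(0, S, d) for t in range(0, T, d)}
    decimated = {(s, t) for s in range(S) for t in range(T)} - keys
    return keys, decimated

keys, decimated = split_views(5, 5, 2)   # 9 key views, 16 views to predict
```

Only the key views (plus residuals) need to be transmitted at full fidelity; the decimated set is reconstructed at the decoder.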
8.
  • Ahmad, Waqas, et al. (author)
  • Shearlet Transform Based Prediction Scheme for Light Field Compression
  • 2018
  • Conference paper (peer-reviewed) abstract
    • Light field acquisition technologies capture the angular and spatial information of a scene. The spatial and angular information enables various post-processing applications, e.g. 3D scene reconstruction, refocusing, and synthetic aperture, at the expense of an increased data size. In this paper, we present a novel prediction tool for compression of light field data acquired with a multi-camera system. The captured light field (LF) can be described using a two-plane parametrization L(u, v, s, t), where (u, v) represents the image plane coordinates of each view and (s, t) represents the coordinates of the capturing plane. In the proposed scheme, the captured LF is uniformly decimated by a factor d in both directions (in the s and t coordinates), resulting in a sparse set of views, also referred to as key views. The key views are converted into a pseudo video sequence and compressed using high efficiency video coding (HEVC). The shearlet-transform-based reconstruction approach, presented in [1], is used at the decoder side to predict the decimated views with the help of the key views. Four LF images (Truck and Bunny from the Stanford dataset, Set2 and Set9 from the High Density Camera Array dataset) are used in the experiments. The input LF views are converted into a pseudo video sequence and compressed with HEVC to serve as an anchor. Rate-distortion analysis shows an average PSNR gain of 0.98 dB over the anchor scheme. Moreover, at low bitrates the compression efficiency of the proposed scheme is higher than that of the anchor, whereas the anchor performs better at high bitrates. The different compression responses of the proposed and anchor schemes are a consequence of how they utilize the input information. In the high-bitrate scenario, high-quality residual information enables the anchor to achieve efficient compression. On the contrary, the shearlet transform relies on the key views to predict the decimated views without incorporating residual information; hence, it has an inherent reconstruction error. In the low-bitrate scenario, the bit budget of the proposed compression scheme allows the encoder to achieve high quality for the key views. The HEVC anchor scheme distributes the same bit budget among all input LF views, which results in degradation of the overall visual quality. The sensitivity of the human vision system toward compression artifacts in low-bitrate cases favours the proposed compression scheme over the anchor scheme.
9.
10.
  • Ahmad, Waqas, et al. (author)
  • The Plenoptic Dataset
  • 2018
  • Other publication abstract
    • The dataset is captured using two different plenoptic cameras, namely Illum from Lytro (based on plenoptic 1.0 model) and R29 from Raytrix (based on plenoptic 2.0 model). The scenes selected for the dataset were captured under controlled conditions. The cameras were mounted onto a multi-camera rig that was mechanically controlled to move the cameras with millimeter precision. In this way, both cameras captured the scene from the same viewpoint.
11.
  • Ahmad, Waqas, et al. (author)
  • Towards a generic compression solution for densely and sparsely sampled light field data
  • 2018
  • In: Proceedings of the 25th IEEE International Conference on Image Processing. - 9781479970612 ; pp. 654-658
  • Conference paper (peer-reviewed) abstract
    • Light field (LF) acquisition technologies capture the spatial and angular information present in scenes. The angular information paves the way for various post-processing applications such as scene reconstruction, refocusing, and synthetic aperture. The light field is usually captured by a single plenoptic camera or by multiple traditional cameras. The former captures a dense LF, while the latter captures a sparse LF. This paper presents a generic compression scheme that efficiently compresses both densely and sparsely sampled LFs. A plenoptic image is converted into sub-aperture images, and each sub-aperture image is interpreted as a frame of a multi-view sequence. Correspondingly, each view of the multi-camera system is treated as a frame of a multi-view sequence. The multi-view extension of high efficiency video coding (MV-HEVC) is used to encode the pseudo multi-view sequence. This paper proposes an adaptive prediction and rate allocation scheme that efficiently compresses LF data irrespective of the acquisition technology used.
12.
13.
  • Boström, Lena, 1960-, et al. (author)
  • Digital visualisering i skolan : Mittuniversitetets slutrapport från förstudien [Digital visualization in schools: Mid Sweden University's final report from the pilot study]
  • 2018
  • Report (other academic/artistic) abstract
    • The purpose of this study was twofold: to test alternative teaching methods via a digital learning resource in mathematics in a quasi-experimental study, and to apply user-experience evaluation methods to interactive visualizations, thereby increasing knowledge of how perceived quality depends on the technology used. The pilot study also highlights several pressing areas in school development, both regionally and nationally, as well as important aspects of the link between technology, pedagogy, and evaluation methods on the technical side. The former concerns declining mathematics results in schools, practice-based school research, strengthened digital competence, visualization and learning, and research on visualization and evaluation. The latter addresses which technical solutions have previously been used and for what purpose they were created, and how visualizations have been evaluated according to textbooks and the research literature. Regarding student results, one of the main research questions of the study, we found no significant differences between traditional teaching and teaching with the visualization resource (3D). Concerning students' attitudes toward the mathematics unit, attitudes improved significantly in the control group in grade 6, but not in grade 8. Regarding the results and attitudes of girls and boys, the girls in both classes had better prior knowledge than the boys, and in grade 6 the girls in the control group were more positive toward the mathematics unit than the boys. Beyond that, we found no significant differences. Other important findings were that the test design was not optimal and that the time of day at which the test was administered mattered considerably. The results of the qualitative analysis point to positive attitudes and behaviours among the students when working with the visual learning resource. Student collaboration and communication improved during the lessons.
Furthermore, the teachers noted that the 3D resource offered greater opportunities to stimulate several senses during the learning process. A clear conclusion is that the 3D resource is an important complement in teaching, but it cannot be used entirely on its own. We can neither join the researchers who consider 3D visualization superior as a learning resource for student results, nor those who warn of its effects on students' cognitive overload. Our results are more in line with the conclusions drawn by the Swedish Institute for Educational Research (Skolforskningsinstitutet, 2017), namely that teaching with digital learning resources in mathematics can have positive effects, but that equally effective teaching could possibly be designed in other ways. However, the results of our study point to several disturbances that may have affected the outcomes, and to the need for reliable technology and well-developed software. In the study, we analysed the results using two overarching frameworks for integrating technology support in learning, SAMR and TPACK. The former contributed a taxonomy for discussing how well the affordances of the technology were exploited by the learning resource and in the learning activities; the latter supported a discussion of the didactic questions with a focus on the role of technology. Both aspects are highly topical given the increasing digitalization of schools. Based on previous research and this pilot study, we understand that it is important to design the research methods carefully. Randomization of groups would be desirable. Performance measures can also be difficult to choose. Tests in which participants evaluate usability and user experience (UX), based on both qualitative and quantitative methods, are important for the use of the technology itself, but further evaluations are needed to link the technology and the visualization to the quality of learning and teaching. Several methods are thus needed, and collaboration between different subjects and disciplines becomes important.
14.
  • Boström, Lena, 1960-, et al. (author)
  • MethodViz : designing and evaluating an interactive learning tool for scientific methods – visual learning support and visualization of research process structure
  • 2022
  • In: Education and Information Technologies. - : Springer Science and Business Media LLC. - 1360-2357 .- 1573-7608. ; 27:9, pp. 12793-12810
  • Journal article (peer-reviewed) abstract
    • In this study, we focussed on designing and evaluating a learning tool for the research process in higher education. Mastering the research process seems to be a bottleneck within the academy. Therefore, there is a great need to offer students other ways to learn this skill in addition to books and lectures. The MethodViz tool supports the general aspects of the research process that higher education students follow in their scientific work. Moreover, the tool facilitates and structures the process interactively. In this paper, we describe the creation process of the artefact and examine the characteristics and scope of MethodViz alongside the traits and ideas of design science research. The evaluation's results are encouraging and show that MethodViz has the potential to improve students' learning achievements.
15.
  • Brunnström, Kjell, et al. (author)
  • 2D no-reference video quality model development and 3D video transmission quality
  • 2012
  • In: Proceedings of the Sixth International Workshop on Video Processing and Quality Metrics for Consumer Electronics VPQM-2012.
  • Conference paper (other academic/artistic) abstract
    • This presentation targets two different topics in video quality assessment. First, we discuss 2D no-reference video quality model development. Second, we discuss how to find suitable quality for 3D video transmission. No-reference metrics are the only practical option for monitoring 2D video quality in live networks. In order to decrease the development time, it might be possible to use full-reference metrics for this purpose. In this work, we have evaluated six full-reference objective metrics in three different databases. We show statistically that VQM performs best. Further, we use these results to develop a lightweight no-reference model. We have also investigated users' experience of stereoscopic 3D video quality by rating two subjective assessment datasets, targeting efficient transmission in the transmission-error-free case in one dataset and error concealment in the other. Among other results, it was shown that, for the same level of quality of experience, spatial down-sampling may lead to better bitrate efficiency while temporal down-sampling will be worse. When network impairments occur, traditional 2D error concealment methods need to be re-investigated, as they were outperformed by switching to 2D presentation.
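As a minimal illustration of the full-reference idea discussed above (both the pristine reference and the degraded signal are available to the metric), here is the basic PSNR computation; the study itself evaluates more advanced full-reference metrics such as VQM, so this is only an illustrative baseline:

```python
import numpy as np

def psnr(ref, deg, peak=255.0):
    """Peak signal-to-noise ratio, the simplest full-reference quality
    metric. It requires the undistorted reference frame, which is
    exactly what a no-reference model must do without."""
    mse = np.mean((ref.astype(np.float64) - deg.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```

A no-reference model, by contrast, must estimate quality from the degraded frames alone, e.g. from blockiness or blur measures.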
16.
  • Brunnström, Kjell, 1960-, et al. (author)
  • Latency impact on Quality of Experience in a virtual reality simulator for remote control of machines
  • 2020
  • In: Signal processing. Image communication. - : Elsevier. - 0923-5965 .- 1879-2677. ; 89
  • Journal article (peer-reviewed) abstract
    • In this article, we have investigated a VR simulator of a forestry crane used for loading logs onto a truck. We have mainly studied the Quality of Experience (QoE) aspects that may be relevant for task completion, and whether any discomfort-related symptoms are experienced during task execution. QoE experiments were designed to capture the general subjective experience of using the simulator and to study task performance. The focus was to study the effects of latency on the subjective experience, with regard to delays in the crane control interface. Subjective studies were performed with controlled delays added to the display update and the hand controller (joystick) signals. The added delays ranged from 0 to 30 ms for the display update, and from 0 to 800 ms for the hand controller. We found a strong effect of latency in the display update, and a significant negative effect of the 800 ms added delay in the hand controller (in total approximately 880 ms latency including the system delay). The Simulator Sickness Questionnaire (SSQ) gave significantly higher scores after the experiment than before it, but a majority of the participants reported experiencing only minor symptoms. Some test subjects ceased the test before finishing due to their symptoms, particularly due to the added latency in the display update.
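Controlled delays like those described above are typically injected by buffering the sampled signal. A minimal sketch of a fixed n-sample delay line (an assumption for illustration, not the authors' actual test bench):

```python
from collections import deque

class DelayLine:
    """Delay a sampled signal by n samples: at sample period T the
    added latency is n*T (e.g. 80 samples at 100 Hz ~ 800 ms)."""
    def __init__(self, n, fill=0.0):
        self.buf = deque([fill] * n, maxlen=n + 1)

    def step(self, x):
        self.buf.append(x)          # newest sample in
        return self.buf.popleft()   # sample from n steps ago out

d = DelayLine(2)
out = [d.step(x) for x in [1, 2, 3, 4]]   # two-sample delay: [0.0, 0.0, 1, 2]
```

Running one such buffer on the joystick stream and another on the display-update stream would allow the two latencies to be varied independently, as in the experiment design.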
17.
  • Brunnström, Kjell, 1960-, et al. (author)
  • Quality of Experience for a Virtual Reality simulator
  • 2018
  • In: Human Vision and Electronic Imaging 2018. - : The Society for Imaging Science and Technology.
  • Conference paper (peer-reviewed) abstract
    • In this study, we investigate a VR simulator of a forestry crane used for loading logs onto a truck, mainly looking at Quality of Experience (QoE) aspects that may be relevant for task completion, but also at whether any discomfort-related symptoms are experienced during task execution. The QoE test has been designed to capture both the general subjective experience of using the simulator and the task completion rate. Moreover, a specific focus has been to study the effects of latency on the subjective experience, both with regard to delays in the crane control interface and to lag in the visual scene rendering in the head-mounted display (HMD). Two larger formal subjective studies have been performed: one with the VR system as it is, and one where we added controlled delay to the display update and to the joystick signals. The baseline study shows that most people are more or less happy with the VR system and that it does not have strong effects on any of the symptoms listed in the Simulator Sickness Questionnaire (SSQ). In the delay study, we found significant effects on Comfort Quality and Immersion Quality for the higher display delay (30 ms), but very small impact of joystick delay. Furthermore, the display delay had a strong influence on the symptoms in the SSQ, causing some test subjects to decide not to continue with the complete experiment. We found that this was especially connected to the longer added display delays (≥ 20 ms).
18.
  • Brunnström, Kjell, 1960-, et al. (author)
  • Quality of experience of hand controller latency in a virtual reality simulator
  • 2019
  • In: IS and T International Symposium on Electronic Imaging Science and Technology. - Springfield, VA, United States : Society for Imaging Science and Technology.
  • Conference paper (peer-reviewed) abstract
    • In this study, we investigate a VR simulator of a forestry crane used for loading logs onto a truck, mainly looking at Quality of Experience (QoE) aspects that may be relevant for task completion, but also at whether any discomfort-related symptoms are experienced during task execution. A QoE test has been designed to capture both the general subjective experience of using the simulator and task performance. Moreover, a specific focus has been to study the effects of latency on the subjective experience, with regard to delays in the crane control interface. A formal subjective study has been performed in which we added controlled delays to the hand controller (joystick) signals. The added delays ranged from 0 ms to 800 ms. We found no significant effects of delays up to 200 ms on task performance on any scale. A significant negative effect was found for 800 ms of added delay. The symptoms reported in the Simulator Sickness Questionnaire (SSQ) were significantly higher for all symptom groups, but a majority of the participants reported only slight symptoms. Two out of thirty test persons stopped the test before finishing due to their symptoms.
19.
  • Conti, Caroline, et al. (author)
  • Light Field Image Compression
  • 2018
  • In: 3D Visual Content Creation, Coding and Delivery. - Cham : Springer. - 9783319778426 ; pp. 143-176
  • Book chapter (peer-reviewed)
20.
  • Damghanian, Mitra, et al. (author)
  • Depth and Angular Resolution in Plenoptic Cameras
  • 2015
  • In: 2015 IEEE International Conference on Image Processing (ICIP), September 2015. - : IEEE. ; pp. 3044-3048
  • Conference paper (peer-reviewed) abstract
    • We present a model-based approach for extracting the depth and angular resolution of a plenoptic camera. The obtained results for the depth and angular resolution are validated against Zemax ray-tracing results. The model-based approach gives the location and number of the resolvable depth planes in a plenoptic camera, as well as the angular resolution with regard to disparity in pixels. The approach is straightforward compared to practical measurements and, in contrast with the principal-ray-model approach, can account for plenoptic camera parameters such as the microlens f-number. Easy and accurate quantification of the different resolution terms forms the basis for designing the capturing setup and choosing a reasonable system configuration for plenoptic cameras. Results from this work will accelerate the customization of plenoptic cameras for particular applications without the need for expensive measurements.
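As a generic illustration of the idea of depth planes indexed by integer pixel disparities, the standard two-view relation z = f·b/(d·p) can be used; note that this is a simpler stand-in chosen for illustration, not the model-based (SPC-derived) approach of the paper:

```python
def plane_depth(f_mm, b_mm, disparity_px, pitch_mm):
    """Depth (in mm, like f and b) of the plane imaged at a given pixel
    disparity d, via the standard two-view relation z = f*b / (d*p),
    where p is the pixel pitch. Each integer disparity corresponds to
    one resolvable depth plane."""
    return f_mm * b_mm / (disparity_px * pitch_mm)

# hypothetical numbers: f = 50 mm, baseline 10 mm, pixel pitch 0.01 mm
planes = [plane_depth(50.0, 10.0, d, 0.01) for d in range(1, 5)]
```

The planes crowd together at near depths and spread apart at far depths, which is why the location as well as the number of resolvable planes matters when designing a capturing setup.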
  •  
21.
  • Damghanian, Mitra, 1978-, et al. (författare)
  • Investigating the lateral resolution in a plenoptic capturing system using the SPC model
  • 2013
  • Ingår i: Proceedings of SPIE - The International Society for Optical Engineering. - : SPIE - International Society for Optical Engineering. - 9780819494337 ; , s. 86600T-
  • Konferensbidrag (refereegranskat)abstract
    • Complex multidimensional capturing setups such as plenoptic cameras (PC) introduce a trade-off between various system properties. Consequently, established capturing properties, like image resolution, need to be described thoroughly for these systems. Therefore models and metrics that assist exploring and formulating this trade-off are highly beneficial for studying as well as designing of complex capturing systems. This work demonstrates the capability of our previously proposed sampling pattern cube (SPC) model to extract the lateral resolution for plenoptic capturing systems. The SPC carries both ray information as well as focal properties of the capturing system it models. The proposed operator extracts the lateral resolution from the SPC model throughout an arbitrary number of depth planes giving a depth-resolution profile. This operator utilizes focal properties of the capturing system as well as the geometrical distribution of the light containers which are the elements in the SPC model. We have validated the lateral resolution operator for different capturing setups by comparing the results with those from Monte Carlo numerical simulations based on the wave optics model. The lateral resolution predicted by the SPC model agrees with the results from the more complex wave optics model better than both the ray based model and our previously proposed lateral resolution operator. This agreement strengthens the conclusion that the SPC fills the gap between ray-based models and the real system performance, by including the focal information of the system as a model parameter. The SPC is proven a simple yet efficient model for extracting the lateral resolution as a high-level property of complex plenoptic capturing systems.
  •  
22.
  • Damghanian, Mitra, 1978-, et al. (författare)
  • Performance analysis in Lytro camera: Empirical and model based approaches to assess refocusing quality
  • 2014
  • Ingår i: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. - : IEEE conference proceedings. ; , s. 559-563
  • Konferensbidrag (refereegranskat)abstract
    • In this paper, we investigate the performance of the Lytro camera in terms of its refocusing quality. The refocusing quality of the camera is related to the spatial resolution and the depth of field as the contributing parameters. We quantify the spatial resolution profile as a function of depth using empirical and model based approaches. The depth of field is then determined by thresholding the spatial resolution profile. In the model based approach, the previously proposed sampling pattern cube (SPC) model for representation and evaluation of the plenoptic capturing systems is utilized. For the experimental resolution measurements, camera evaluation results are extracted from images rendered by the Lytro full reconstruction rendering method. Results from both the empirical and model based approaches assess the refocusing quality of the Lytro camera consistently, highlighting the usability of the model based approaches for performance analysis of complex capturing systems.
  •  
23.
  • Dima, Elijs, 1990- (författare)
  • Augmented Telepresence based on Multi-Camera Systems : Capture, Transmission, Rendering, and User Experience
  • 2021
  • Doktorsavhandling (övrigt vetenskapligt/konstnärligt)abstract
    •  Observation and understanding of the world through digital sensors is an ever-increasing part of modern life. Systems of multiple sensors acting together have far-reaching applications in automation, entertainment, surveillance, remote machine control, and robotic self-navigation. Recent developments in digital camera, range sensor and immersive display technologies enable the combination of augmented reality and telepresence into Augmented Telepresence, which promises to enable more effective and immersive forms of interaction with remote environments. The purpose of this work is to gain a more comprehensive understanding of how multi-sensor systems lead to Augmented Telepresence, and how Augmented Telepresence can be utilized for industry-related applications. On the one hand, the conducted research is focused on the technological aspects of multi-camera capture, rendering, and end-to-end systems that enable Augmented Telepresence. On the other hand, the research also considers the user experience aspects of Augmented Telepresence, to obtain a more comprehensive perspective on the application and design of Augmented Telepresence solutions. This work addresses multi-sensor system design for Augmented Telepresence regarding four specific aspects ranging from sensor setup for effective capture to the rendering of outputs for Augmented Telepresence. More specifically, the following problems are investigated: 1) whether multi-camera calibration methods can reliably estimate the true camera parameters; 2) what the consequences are of synchronization errors in a multi-camera system; 3) how to design a scalable multi-camera system for low-latency, real-time applications; and 4) how to enable Augmented Telepresence from multi-sensor systems for mining, without prior data capture or conditioning. The first problem was solved by conducting a comparative assessment of widely available multi-camera calibration methods. 
A special dataset was recorded, enforcing known constraints on camera ground-truth parameters to use as a reference for calibration estimates. The second problem was addressed by introducing a depth uncertainty model that links the pinhole camera model and synchronization error to the geometric error in the 3D projections of recorded data. The third problem was addressed empirically, by constructing a multi-camera system based on off-the-shelf hardware and a modular software framework. The fourth problem was addressed by proposing a processing pipeline of an augmented remote operation system for augmented and novel view rendering. The calibration assessment revealed that target-based and certain target-less calibration methods are relatively similar in their estimations of the true camera parameters, with one specific exception. For high-accuracy scenarios, even commonly used target-based calibration approaches are not sufficiently accurate with respect to the ground truth. The proposed depth uncertainty model was used to show that converged multi-camera arrays are less sensitive to synchronization errors. The mean depth uncertainty of a camera system correlates to the rendered result in depth-based reprojection as long as the camera calibration matrices are accurate. The presented multi-camera system demonstrates a flexible, de-centralized framework where data processing is possible in the camera, in the cloud, and on the data consumer's side. The multi-camera system is able to act as a capture testbed and as a component in end-to-end communication systems, because of the general-purpose computing and network connectivity support coupled with a segmented software framework. 
This system forms the foundation for the augmented remote operation system, which demonstrates the feasibility of real-time view generation by employing on-the-fly lidar de-noising and sparse depth upscaling for novel and augmented view synthesis. In addition to the aforementioned technical investigations, this work also addresses the user experience impacts of Augmented Telepresence. The following two questions were investigated: 1) What is the impact of camera-based viewing position in Augmented Telepresence? 2) What is the impact of depth-aiding augmentations in Augmented Telepresence? Both are addressed through a quality of experience study with non-expert participants, using a custom Augmented Telepresence test system for a task-based experiment. The experiment design combines in-view augmentation, camera view selection, and stereoscopic augmented scene presentation via a head-mounted display to investigate both the independent factors and their joint interaction. The results indicate that between the two factors, view position has a stronger influence on user experience. Task performance and quality of experience were significantly decreased by viewing positions that force users to rely on stereoscopic depth perception. However, position-assisting view augmentations can mitigate the negative effect of sub-optimal viewing positions; the extent of such mitigation is subject to the augmentation design and appearance. In aggregate, the works presented in this dissertation cover a broad view of Augmented Telepresence. The individual solutions contribute general insights into Augmented Telepresence system design, complement gaps in the current discourse of specific areas, and provide tools for solving challenges found in enabling the capture, processing, and rendering in real-time-oriented end-to-end systems.
  •  
24.
  • Dima, Elijs, 1990-, et al. (författare)
  • Camera and Lidar-based View Generation for Augmented Remote Operation in Mining Applications
  • 2021
  • Ingår i: IEEE Access. - : IEEE. - 2169-3536. ; 9, s. 82199-82212
  • Tidskriftsartikel (refereegranskat)abstract
    • Remote operation of diggers, scalers, and other tunnel-boring machines has significant benefits for worker safety in underground mining. Real-time augmentation of the presented remote views can further improve the operator effectiveness through a more complete presentation of relevant sections of the remote location. In safety-critical applications, such augmentation cannot depend on preconditioned data, nor generate plausible-looking yet inaccurate sections of the view. In this paper, we present a capture and rendering pipeline for real-time view augmentation and novel view synthesis that depends only on the inbound data from lidar and camera sensors. We suggest on-the-fly lidar filtering that reduces point oscillation at no performance cost, and a full rendering process based on lidar depth upscaling and in-view occluder removal from the presented scene. Performance assessments show that the proposed solution is feasible for real-time applications, where per-frame processing fits within the constraints set by the inbound sensor data and within framerate tolerances for enabling effective remote operation.
  •  
25.
  • Dima, Elijs, et al. (författare)
  • Estimation and Post-Capture Compensation of Synchronization Error in Unsynchronized Multi-Camera Systems
  • 2021
  • Rapport (övrigt vetenskapligt/konstnärligt)abstract
    • Multi-camera systems are used in entertainment production, computer vision, industry and surveillance. The benefit of using multi-camera systems is the ability to recover the 3D structure, or depth, of the recorded scene. However, various types of cameras, including depth cameras, cannot be reliably synchronized during recording, which leads to errors in depth estimation and scene rendering. The aim of this work is to propose a method for compensating synchronization errors in already recorded sequences, without changing the format of the recorded sequences. We describe a depth uncertainty model for parametrizing the impact of synchronization errors in a multi-camera system, and propose a method for synchronization error estimation and compensation. The proposed method is based on interpolating an image at a desired time instant based on adjacent non-synchronized images in a single camera's sequence, using an array of per-pixel distortion vectors. This array is generated by using the difference between adjacent images to locate and segment the recorded moving objects, and does not require any object texture or distinguishing features beyond the observed difference in adjacent images. The proposed compensation method is compared with optical-flow based interpolation and sparse correspondence based morphing, and the proposed synchronization error estimation is compared with a state-of-the-art video alignment method. The proposed method shows better synchronization error estimation accuracy and compensation ability, especially in cases of low-texture, low-feature images. The effect of using data with synchronization errors is also demonstrated, as is the improvement gained by using compensated data. The compensation of synchronization errors is useful in scenarios where the recorded data is expected to be used by other processes that expect a sub-frame synchronization accuracy, such as depth-image-based rendering.
  •  
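The core compensation idea in the entry above, synthesizing an image at an intermediate time by scaling per-pixel displacements, can be sketched for the simplified case where the displacement field between two adjacent frames is already known. The paper derives this field from the difference of adjacent images; here it is assumed given, and nearest-neighbour backward warping stands in for the full method (all names and numbers are illustrative):

```python
import numpy as np

def interpolate_frame(frame0, flow, alpha):
    """Approximate the frame at time t0 + alpha*(t1 - t0) by
    backward-warping frame0 with the scaled displacement field.

    flow[y, x] = (dy, dx) is the per-pixel motion from frame0 to
    frame1; alpha in [0, 1] is the sub-frame offset to compensate.
    """
    h, w = frame0.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.rint(ys - alpha * flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs - alpha * flow[..., 1]).astype(int), 0, w - 1)
    return frame0[src_y, src_x]

# A bright pixel moving 2 px/frame to the right appears shifted by
# 1 px when compensating half a frame of synchronization error.
frame0 = np.zeros((8, 8))
frame0[2, 2] = 1.0
flow = np.zeros((8, 8, 2))
flow[..., 1] = 2.0
mid = interpolate_frame(frame0, flow, 0.5)
```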
26.
  • Dima, Elijs, 1990-, et al. (författare)
  • Joint effects of depth-aiding augmentations and viewing positions on the quality of experience in augmented telepresence
  • 2020
  • Ingår i: Quality and User Experience. - Switzerland : Springer Nature. - 2366-0139 .- 2366-0147. ; 5
  • Tidskriftsartikel (refereegranskat)abstract
    • Virtual and augmented reality is increasingly prevalent in industrial applications, such as remote control of industrial machinery, due to recent advances in head-mounted display technologies and low-latency communications via 5G. However, the influence of augmentations and camera placement-based viewing positions on operator performance in telepresence systems remains unknown. In this paper, we investigate the joint effects of depth-aiding augmentations and viewing positions on the quality of experience for operators in augmented telepresence systems. A study was conducted with 27 non-expert participants using a real-time augmented telepresence system to perform a remote-controlled navigation and positioning task, with varied depth-aiding augmentations and viewing positions. The resulting quality of experience was analyzed via Likert opinion scales, task performance measurements, and simulator sickness evaluation. Results suggest that reducing the reliance on stereoscopic depth perception via camera placement has a significant benefit to operator performance and quality of experience. Conversely, the depth-aiding augmentations can partly mitigate the negative effects of inferior viewing positions. However, the viewing-position based monoscopic and stereoscopic depth cues tend to dominate over cues based on augmentations. There is also a discrepancy between the participants' subjective opinions on augmentation helpfulness and its observed effects on positioning task performance.
  •  
27.
  • Dima, Elijs, et al. (författare)
  • LIFE: A Flexible Testbed For Light Field Evaluation
  • 2018
  • Konferensbidrag (refereegranskat)abstract
    • Recording and imaging the 3D world has led to the use of light fields. Capturing, distributing and presenting light field data is challenging, and requires an evaluation platform. We define a framework for real-time processing, and present the design and implementation of a light field evaluation system. In order to serve as a testbed, the system is designed to be flexible, scalable, and able to model various end-to-end light field systems. This flexibility is achieved by encapsulating processes and devices in discrete framework systems. The modular capture system supports multiple camera types, general-purpose data processing, and streaming to network interfaces. The cloud system allows for parallel transcoding and distribution of streams. The presentation system encapsulates rendering and display specifics. The real-time ability was tested in a latency measurement; the capture and presentation systems process and stream frames within a 40 ms limit.
  •  
28.
  • Dima, Elijs, et al. (författare)
  • Modeling Depth Uncertainty of Desynchronized Multi-Camera Systems
  • 2017
  • Ingår i: 2017 International Conference on 3D Immersion (IC3D). - : IEEE. - 9781538646557
  • Konferensbidrag (refereegranskat)abstract
    • Accurately recording motion from multiple perspectives is relevant for recording and processing immersive multi-media and virtual reality content. However, synchronization errors between multiple cameras limit the precision of scene depth reconstruction and rendering. In order to quantify this limit, a relation between camera de-synchronization, camera parameters, and scene element motion has to be identified. In this paper, a parametric ray model describing depth uncertainty is derived and adapted for the pinhole camera model. A two-camera scenario is simulated to investigate the model behavior and how camera synchronization delay, scene element speed, and camera positions affect the system's depth uncertainty. Results reveal a linear relation between synchronization error, element speed, and depth uncertainty. View convergence is shown to affect mean depth uncertainty up to a factor of 10. Results also show that depth uncertainty must be assessed on the full set of camera rays instead of a central subset.
  •  
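The linear relation reported in the entry above can be illustrated with standard two-view error propagation: a point at depth z moving laterally at speed v during a synchronization delay Δt shifts by v·Δt, which perturbs the disparity and hence the triangulated depth. A minimal sketch under a pinhole, parallel-stereo assumption (the paper's full parametric ray model is not reproduced here, and all numbers are illustrative):

```python
import numpy as np

def depth_uncertainty(z, f_px, baseline, speed, sync_err):
    """Approximate depth uncertainty for a two-camera pinhole rig.

    A point at depth z (m) moving laterally at `speed` (m/s) during a
    synchronization error `sync_err` (s) shifts by speed*sync_err metres,
    i.e. by f_px*speed*sync_err/z pixels of disparity error. Standard
    triangulation error propagation then gives
        dz ~= z**2 / (f_px * baseline) * d_disparity.
    """
    d_disp = f_px * speed * sync_err / z        # disparity error [px]
    return z**2 / (f_px * baseline) * d_disp    # depth error [m]

# Depth uncertainty grows linearly with both sync error and speed:
u1 = depth_uncertainty(z=5.0, f_px=1000, baseline=0.2, speed=1.0, sync_err=0.010)
u2 = depth_uncertainty(z=5.0, f_px=1000, baseline=0.2, speed=1.0, sync_err=0.020)
```

Doubling either the synchronization error or the element speed doubles the depth error, matching the linear relation the simulation reveals; convergence effects require the full ray model.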
29.
  • Dima, Elijs (författare)
  • Multi-Camera Light Field Capture : Synchronization, Calibration, Depth Uncertainty, and System Design
  • 2018
  • Licentiatavhandling (övrigt vetenskapligt/konstnärligt)abstract
    • The digital camera is the technological counterpart to the human eye, enabling the observation and recording of events in the natural world. Since modern life increasingly depends on digital systems, cameras and especially multiple-camera systems are being widely used in applications that affect our society, ranging from multimedia production and surveillance to self-driving robot localization. The rising interest in multi-camera systems is mirrored by the rising activity in Light Field research, where multi-camera systems are used to capture Light Fields - the angular and spatial information about light rays within a 3D space. The purpose of this work is to gain a more comprehensive understanding of how cameras collaborate and produce consistent data as a multi-camera system, and to build a multi-camera Light Field evaluation system. This work addresses three problems related to the process of multi-camera capture: first, whether multi-camera calibration methods can reliably estimate the true camera parameters; second, what are the consequences of synchronization errors in a multi-camera system; and third, how to ensure data consistency in a multi-camera system that records data with synchronization errors. Furthermore, this work addresses the problem of designing a flexible multi-camera system that can serve as a Light Field capture testbed. The first problem is solved by conducting a comparative assessment of widely available multi-camera calibration methods. A special dataset is recorded, giving known constraints on camera ground-truth parameters to use as reference for calibration estimates. The second problem is addressed by introducing a depth uncertainty model that links the pinhole camera model and synchronization error to the geometric error in the 3D projections of recorded data. 
The third problem is solved for the color-and-depth multi-camera scenario, by using a proposed estimation of the depth camera synchronization error and correction of the recorded depth maps via tensor-based interpolation. The problem of designing a Light Field capture testbed is addressed empirically, by constructing and presenting a multi-camera system based on off-the-shelf hardware and a modular software framework. The calibration assessment reveals that target-based and certain target-less calibration methods are relatively similar at estimating the true camera parameters. The results imply that for general-purpose multi-camera systems, target-less calibration is an acceptable choice. For high-accuracy scenarios, even commonly used target-based calibration approaches are insufficiently accurate. The proposed depth uncertainty model is used to show that converged multi-camera arrays are less sensitive to synchronization errors. The mean depth uncertainty of a camera system correlates to the rendered result in depth-based reprojection, as long as the camera calibration matrices are accurate. The proposed depthmap synchronization method is used to produce a consistent, synchronized color-and-depth dataset for unsynchronized recordings without altering the depthmap properties. Therefore, the method serves as a compatibility layer between unsynchronized multi-camera systems and applications that require synchronized color-and-depth data. Finally, the presented multi-camera system demonstrates a flexible, de-centralized framework where data processing is possible in the camera, in the cloud, and on the data consumer's side. The multi-camera system is able to act as a Light Field capture testbed and as a component in Light Field communication systems, because of the general-purpose computing and network connectivity support for each sensor, small sensor size, flexible mounts, hardware and software synchronization, and a segmented software framework. 
  •  
30.
  • Dima, Elijs, et al. (författare)
  • View position impact on QoE in an immersive telepresence system for remote operation
  • 2019
  • Ingår i: 2019 11th International Conference on Quality of Multimedia Experience, QoMEX 2019. - Berlin, Germany : Institute of Electrical and Electronics Engineers Inc.. - 9781538682128
  • Konferensbidrag (refereegranskat)abstract
    • In this paper, we investigate how different viewing positions affect a user's Quality of Experience (QoE) and performance in an immersive telepresence system. A QoE experiment has been conducted with 27 participants to assess the general subjective experience and the performance of remotely operating a toy excavator. Two view positions have been tested, an overhead and a ground-level view, respectively, which encourage reliance on stereoscopic depth cues to different extents for accurate operation. Results demonstrate a significant difference between ground and overhead views: the ground view increased the perceived difficulty of the task, whereas the overhead view increased the perceived accomplishment as well as the objective performance of the task. The perceived helpfulness of the overhead view was also significant according to the participants. 
  •  
32.
  • Edlund, Joakim, et al. (författare)
  • Analysis of Top-Down Connections in Multi-Layered Convolutional Sparse Coding
  • 2021
  • Ingår i: 2021 IEEE 23rd International Workshop on Multimedia Signal Processing (MMSP). - : IEEE. - 9781665432887
  • Konferensbidrag (refereegranskat)abstract
    • Convolutional Neural Networks (CNNs) have been instrumental in the recent advances in machine learning, with applications in media processing. Multi-Layered Convolutional Sparse Coding (ML-CSC), based on a cascade of convolutional layers in which each layer can be approximately explained by the following layer, can be seen as a biologically inspired framework. However, both CNNs and ML-CSC networks lack the top-down information flows that are studied in neuroscience for understanding the mechanisms of the mammalian cortex. A successful implementation of such top-down connections could lead to another leap in machine learning and media applications. This study analyses the effects of a feedback connection on an ML-CSC network, considering the trade-off between sparsity and reconstruction error, the support recovery rate, and the mutual coherence of trained dictionaries. We find that using the feedback connection during training impacts the mutual coherence of the dictionary in a way that the equivalence between the $l_0$- and $l_1$-norm is verified for a smaller range of sparsity values. Experimental results show that the use of feedback during training does not favour inference with feedback, in terms of sparse support recovery rates. However, when the sparsity constraints are given a lower weight, the use of feedback at inference time is beneficial in terms of support recovery rates. 
  •  
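Mutual coherence, the dictionary property examined in the entry above, is simple to compute, and the classical Donoho-Elad bound states that the $l_0$- and $l_1$-solutions coincide for representations with fewer than (1 + 1/μ)/2 nonzeros. A sketch (the random dictionary is purely illustrative, not a trained ML-CSC dictionary):

```python
import numpy as np

def mutual_coherence(D):
    """Largest absolute inner product between distinct normalized columns."""
    Dn = D / np.linalg.norm(D, axis=0, keepdims=True)
    G = np.abs(Dn.T @ Dn)
    np.fill_diagonal(G, 0.0)
    return G.max()

def l0_l1_equivalence_bound(mu):
    """Sparsity level below which l0 and l1 solutions provably coincide."""
    return 0.5 * (1.0 + 1.0 / mu)

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))      # illustrative overcomplete dictionary
mu = mutual_coherence(D)
bound = l0_l1_equivalence_bound(mu)     # higher coherence -> smaller bound
```

The inverse relation between μ and the bound is the mechanism behind the paper's observation: if training with feedback raises the coherence, the provable equivalence holds over a smaller range of sparsity values.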
33.
  • Gao, Shan, et al. (författare)
  • A TV regularisation sparse light field reconstruction model based on guided-filtering
  • 2022
  • Ingår i: Signal processing. Image communication. - : Elsevier BV. - 0923-5965 .- 1879-2677. ; 109
  • Tidskriftsartikel (refereegranskat)abstract
    • Obtaining and representing the 4D light field is important for a number of computer vision applications. Due to the high dimensionality, acquiring the light field directly is costly. One way to overcome this deficiency is to reconstruct the light field from a limited number of measurements. Existing approaches involve either a depth estimation process or require a large number of measurements to obtain high-quality reconstructed results. In this paper, we propose a total variation (TV) regularisation sparse model with the alternating direction method of multipliers (ADMM) based on guided filtering, which addresses this depth-dependence problem with only a few measurements. As one of the sparse optimisation methods, TV regularisation based on ADMM is well suited to solve ill-posed problems such as this. Moreover, guided filtering has good edge-preserving smoothing properties, which can be incorporated into the light field reconstruction process. Therefore, high precision light field reconstruction is established with our model. Specifically, the updated image in the iteration step contains the guidance image, and an initialiser for the least squares method using a QR factorisation (LSQR) algorithm is involved in one of the subproblems. The model outperforms other methods in both visual assessments and objective metrics – in simulation experiments from synthetic data and photographic data using produced focal stacks from light field contents – and it works well in experiments using captured focal stacks. We also show a further application for arbitrary refocusing by using the reconstructed light field.
  •  
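The guided filter used as the edge-preserving step in the entry above can be summarized compactly. A minimal grayscale implementation of the generic filter of He et al. (the box-filter radius and ε are illustrative, and the paper's full ADMM/TV pipeline is not reproduced here):

```python
import numpy as np

def box(img, r):
    """Mean filter over (2r+1)^2 windows via cumulative sums, edge-padded."""
    pad = np.pad(img, r, mode='edge')
    c = pad.cumsum(0).cumsum(1)
    c = np.pad(c, ((1, 0), (1, 0)))          # zero row/col for window sums
    n = 2 * r + 1
    s = c[n:, n:] - c[:-n, n:] - c[n:, :-n] + c[:-n, :-n]
    return s / n**2

def guided_filter(I, p, r=4, eps=1e-3):
    """Edge-preserving smoothing of p, guided by image I (He et al. 2010).

    Locally fits q = a*I + b; edges in the guidance I are preserved in
    the output because a follows the local covariance between I and p.
    """
    mI, mp = box(I, r), box(p, r)
    varI = box(I * I, r) - mI * mI
    covIp = box(I * p, r) - mI * mp
    a = covIp / (varI + eps)
    b = mp - a * mI
    return box(a, r) * I + box(b, r)
```

In the paper's reconstruction model, a step of this kind is interleaved with the TV-regularised ADMM updates so that the guidance image steers each iterate toward edge-consistent solutions; only the filter itself is sketched here.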
34.
  • Gond, Manu, 1997-, et al. (författare)
  • LFSphereNet : Real Time Spherical Light Field Reconstruction from a Single Omnidirectional Image
  • 2023
  • Ingår i: Proceedings of the 20th ACM SIGGRAPH European Conference on Visual Media Production. - New York, NY, United States : Association for Computing Machinery (ACM). - 9798400704260 ; , s. 1-10
  • Konferensbidrag (refereegranskat)abstract
    • Recent developments in immersive imaging technologies have enabled improved telepresence applications. Omnidirectional (360-degree) content is now fully mature in the commercial sense and provides full vision around the camera with three degrees of freedom (3DoF). Considering the applications in real-time immersive telepresence, this paper investigates how a single omnidirectional image (ODI) can be used to extend 3DoF to 6DoF. To achieve this, we propose a fully learning-based method for spherical light field reconstruction from a single omnidirectional image. The proposed LFSphereNet utilizes two different networks: The first network learns to reconstruct the light field in cubemap projection (CMP) format given the six cube faces of an omnidirectional image and the corresponding cube face positions as input. The cubemap format implies a linear re-projection, which is more appropriate for a neural network. The second network refines the reconstructed cubemaps in equirectangular projection (ERP) format by removing cubemap border artifacts. The network learns the geometric features implicitly for both translation and zooming when an appropriate cost function is employed. Furthermore, it runs with very low inference time, which enables real-time applications. We demonstrate that LFSphereNet outperforms state-of-the-art approaches in terms of quality and speed when tested on different synthetic and real-world scenes. The proposed method represents a significant step towards achieving real-time immersive remote telepresence experiences.
  •  
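The ERP/CMP re-projection at the heart of the two-network split in the entry above follows from the standard mapping between a cube-face pixel, its 3D ray, and equirectangular coordinates. A sketch of the front-face direction lookup (the face layout and axis conventions are illustrative assumptions, not necessarily those of LFSphereNet):

```python
import numpy as np

def front_face_to_erp(u, v, erp_w, erp_h):
    """Map front cube-face coords u, v in [-1, 1] to ERP pixel coords.

    The front face is taken as the plane x = 1; the ray through
    (1, u, v) is converted to longitude/latitude and then to
    equirectangular pixel coordinates.
    """
    x, y, z = 1.0, u, v
    lon = np.arctan2(y, x)                   # [-pi, pi]
    lat = np.arctan2(z, np.hypot(x, y))      # [-pi/2, pi/2]
    px = (lon / (2 * np.pi) + 0.5) * erp_w
    py = (0.5 - lat / np.pi) * erp_h
    return px, py

# The centre of the front face maps to the centre of the ERP image.
cx, cy = front_face_to_erp(0.0, 0.0, 2048, 1024)
```

Sampling the ERP at these coordinates for every face pixel yields the cubemap; the inverse mapping reassembles the refined cube faces into ERP format.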
35.
  • Hassan, Ali, et al. (författare)
  • Light-Weight EPINET Architecture for Fast Light Field Disparity Estimation
  • 2022
  • Ingår i: Light-Weight EPINET Architecture for Fast Light Field Disparity Estimation. - Shanghai, China : IEEE Signal Processing Society. - 9781665471893 ; , s. 1-5
  • Konferensbidrag (refereegranskat)abstract
    • Recent deep learning-based light field disparity estimation algorithms require millions of parameters, which demand high computational cost and limit the model deployment. In this paper, an investigation is carried out to analyze the effect of depthwise separable convolution and ghost modules on state-of-the-art EPINET architecture for disparity estimation. Based on this investigation, four convolutional blocks are proposed to make the EPINET architecture a fast and light-weight network for disparity estimation. The experimental results exhibit that the proposed convolutional blocks have significantly reduced the computational cost of EPINET architecture by up to a factor of 3.89, while achieving comparable disparity maps on HCI Benchmark dataset.
  •  
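The parameter saving behind the entry above is easy to quantify: a standard K×K convolution needs K·K·C_in·C_out weights, while the depthwise + pointwise factorization needs K·K·C_in + C_in·C_out. A sketch (the layer sizes are illustrative, not EPINET's actual configuration):

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k conv followed by a 1x1 pointwise conv."""
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 70, 70
std = conv_params(k, c_in, c_out)                  # 44100 weights
dws = depthwise_separable_params(k, c_in, c_out)   # 5530 weights
ratio = std / dws                                  # ~8x fewer per layer
```

Per-layer savings of this magnitude are how the proposed blocks reach the reported up-to-3.89x reduction for the network as a whole, since not every layer in the architecture is replaced.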
36.
  • Jiang, Meng, 1990-, et al. (författare)
  • A Coherent Wideband Acoustic Source Localization Using a Uniform Circular Array
  • 2023
  • Ingår i: Sensors. - : MDPI. - 1424-8220. ; 23:11
  • Tidskriftsartikel (refereegranskat)abstract
    • In modern applications such as robotics, autonomous vehicles, and speaker localization, the computational power available for sound source localization can be limited as other functionalities become more complex. In such application fields, there is a need to maintain high localization accuracy for several sound sources while reducing computational complexity. The array manifold interpolation (AMI) method applied with the Multiple Signal Classification (MUSIC) algorithm enables sound source localization of multiple sources with high accuracy. However, the computational complexity has so far been relatively high. This paper presents a modified AMI for the uniform circular array (UCA) that offers reduced computational complexity compared to the original AMI. The complexity reduction is based on the proposed UCA-specific focusing matrix, which eliminates the calculation of the Bessel function. The simulation comparison is done with the existing methods of iMUSIC, the Weighted Squared Test of Orthogonality of Projected Subspaces (WS-TOPS), and the original AMI. The experiment result under different scenarios shows that the proposed algorithm outperforms the original AMI method in terms of estimation accuracy, with up to a 30% reduction in computation time. An advantage offered by this proposed method is the ability to implement wideband array processing on low-end microprocessors.
  •  
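The narrowband MUSIC step underlying the method in the entry above can be sketched for a UCA. This illustrates only the standard subspace search, not the proposed focusing matrix; the array geometry and signal parameters are illustrative assumptions:

```python
import numpy as np

def uca_steering(theta, m_mics, radius, wavelength):
    """Steering vector of a uniform circular array for azimuth theta (rad)."""
    phi = 2 * np.pi * np.arange(m_mics) / m_mics        # microphone angles
    return np.exp(2j * np.pi * radius / wavelength * np.cos(theta - phi))

def music_spectrum(R, n_src, m_mics, radius, wavelength, grid):
    """MUSIC pseudospectrum from covariance R over an azimuth grid."""
    w, V = np.linalg.eigh(R)                  # eigenvalues in ascending order
    En = V[:, :m_mics - n_src]                # noise subspace
    P = []
    for th in grid:
        a = uca_steering(th, m_mics, radius, wavelength)
        P.append(1.0 / np.real(a.conj() @ En @ En.conj().T @ a))
    return np.array(P)

# Simulate one narrowband source at 60 degrees azimuth.
rng = np.random.default_rng(1)
M, r, lam = 8, 0.05, 0.1
a = uca_steering(np.deg2rad(60), M, r, lam)
s = rng.standard_normal(200) + 1j * rng.standard_normal(200)
noise = 0.01 * (rng.standard_normal((M, 200)) + 1j * rng.standard_normal((M, 200)))
X = np.outer(a, s) + noise
R = X @ X.conj().T / 200
grid = np.deg2rad(np.arange(0, 180, 0.5))
est = np.rad2deg(grid[np.argmax(music_spectrum(R, 1, M, r, lam, grid))])
```

For wideband signals, a focusing matrix maps each frequency bin's covariance onto a common subspace before this search; the paper's contribution is a UCA-specific focusing matrix that avoids Bessel-function evaluation.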
37.
  • Jiang, Meng, et al. (författare)
  • Performance Comparison of Omni and Cardioid Directional Microphones for Indoor Angle of Arrival Sound Source Localization
  • 2022
  • Ingår i: Conference Record - IEEE Instrumentation and Measurement Technology Conference. - : IEEE. - 9781665483605
  • Konferensbidrag (refereegranskat)abstract
    • Sound source localization technology makes it possible to map sound source positions. In this paper, angle-of-arrival (AOA) has been chosen as the method for achieving sound source localization in an indoor enclosed environment. The dynamic environment and reverberations bring a challenge for AOA-based systems in such applications. Taking microphone directionality into account, cardioid-directional microphone systems have been compared with omni-directional microphone systems to investigate which microphone type is superior for AOA-based indoor sound source localization. To reduce the hardware complexity, the number of microphones used during the experiment has been limited to 4. A localization improvement based on a weighting factor is proposed. The comparison has been done for both types of microphones with 3 different array manifolds under the same system setup. The comparison shows that the cardioid-directional microphone system has an overall higher accuracy. 
  •  
38.
  • Karbalaie, Abdolamir, 1970-, et al. (författare)
  • Event detection in surveillance videos : a review
  • 2022
  • Ingår i: Multimedia tools and applications. - : Springer Nature. - 1380-7501 .- 1573-7721. ; 81:24, s. 35463-35501
  • Tidskriftsartikel (refereegranskat)abstract
    • Since 2008, a variety of systems have been designed to detect events in security cameras, and more than a hundred journal articles and conference papers have been published in this field. However, no survey has focused on recognizing events in surveillance systems, which motivated us to provide a comprehensive review of the different event detection systems that have been developed. We start our discussion with the pioneering methods that used the TRECVid-SED dataset and then cover methods developed using the VIRAT dataset in the TRECVid evaluation. To better understand the designed systems, we describe the components of each method and the modifications of existing methods separately. We outline the significant challenges related to action detection in untrimmed security video, and present suitable metrics for assessing the performance of the proposed models. Our study indicates that, for the TRECVid-SED dataset, the majority of researchers classified events into two groups on the basis of the number of participants and the duration of the event, and used one or more models to identify all the events depending on the group. For the VIRAT dataset, object detection models were used throughout to localize the first-stage activities, and all but one study used a 3D convolutional neural network (3D-CNN) to extract spatio-temporal features or classify the different activities. From the review that has been carried out, we conclude that developing an automatic surveillance event detection system requires accurate and fast object detection in the first stage to localize the activities, and a classification model to draw conclusions from the input values.
  •  
39.
  •  
40.
  • Li, Yongwei, et al. (författare)
  • An analysis of demosaicing for plenoptic capture based on ray optics
  • 2018
  • Ingår i: Proceedings of 3DTV Conference 2018. - 9781538661253
  • Konferensbidrag (refereegranskat)abstract
    • The plenoptic camera is gaining more and more attention as it captures the 4D light field of a scene with a single shot and enables a wide range of post-processing applications. However, the pre-processing steps for captured raw data, such as demosaicing, have been overlooked. Most existing decoding pipelines for plenoptic cameras still apply demosaicing schemes which were developed for conventional cameras. In this paper, we analyze the sampling pattern of microlens-based plenoptic cameras by ray-tracing techniques and ray phase space analysis. The goal of this work is to demonstrate guidelines and principles for demosaicing plenoptic captures by taking the unique microlens array design into account. We show that the sampling of the plenoptic camera behaves differently from that of a conventional camera and that the desired demosaicing scheme is depth-dependent.
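For reference, the kind of conventional-camera demosaicing scheme that the paper argues is ill-suited to plenoptic raw data can be sketched as bilinear interpolation of an RGGB Bayer mosaic. The pattern layout is standard; the flat test image is a made-up sanity check:

```python
import numpy as np

def conv3(a, k):
    # 'same'-mode 3x3 filtering via zero padding (kernel is symmetric here,
    # so correlation and convolution coincide)
    p = np.pad(a, 1)
    H, W = a.shape
    out = np.zeros_like(a)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * p[i:i + H, j:j + W]
    return out

def bilinear_demosaic(raw):
    """raw: 2D array sampled with an RGGB Bayer pattern. Returns HxWx3 RGB."""
    H, W = raw.shape
    out = np.zeros((H, W, 3))
    masks = np.zeros((H, W, 3), dtype=bool)
    masks[0::2, 0::2, 0] = True            # R samples
    masks[0::2, 1::2, 1] = True            # G on R rows
    masks[1::2, 0::2, 1] = True            # G on B rows
    masks[1::2, 1::2, 2] = True            # B samples
    kernel = np.array([[0.25, 0.5, 0.25],
                       [0.5,  1.0, 0.5],
                       [0.25, 0.5, 0.25]])
    for ch in range(3):
        plane = np.where(masks[:, :, ch], raw, 0.0)
        weight = masks[:, :, ch].astype(float)
        # Normalized convolution: missing samples are averaged from neighbors
        num = conv3(plane, kernel)
        den = conv3(weight, kernel)
        out[:, :, ch] = num / np.maximum(den, 1e-12)
    return out

raw = np.full((8, 8), 0.5)     # flat gray mosaic: interpolation should be exact
rgb = bilinear_demosaic(raw)
```

Such a scheme interpolates purely in pixel space; the paper's point is that a plenoptic sensor's sampling is depth-dependent, so pixel-space neighbors are not necessarily scene-space neighbors.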
  •  
41.
  • Li, Yongwei, et al. (författare)
  • Area-Based Depth Estimation for Monochromatic Feature-Sparse Orthographic Capture
  • 2018
  • Ingår i: 2018 26th European Signal Processing Conference (EUSIPCO). - : IEEE conference proceedings. ; , s. 206-210
  • Konferensbidrag (refereegranskat)abstract
    • With the rapid development of light field technology, depth estimation has been highlighted as one of the critical problems in the field, and a number of approaches have been proposed to extract the depth of the scene. However, depth estimation by stereo matching becomes difficult and unreliable when the captured images lack both color and feature information. In this paper, we propose a scheme that extracts robust depth from monochromatic, feature-sparse scenes recorded in orthographic sub-aperture images. Unlike approaches which rely on the rich color and texture information across the sub-aperture views, our approach is based on depth from focus techniques. First, we superimpose shifted sub-aperture images on top of an arbitrarily chosen central image. To focus on different depths, the shift amount is varied based on the micro-lens array properties. Next, an area-based depth estimation approach is applied to find the best match among the focal stack and generate the dense depth map. This process is repeated for each sub-aperture image. Finally, occlusions are handled by merging depth maps generated from different central images followed by a voting process. Results show that the proposed scheme is more suitable than conventional depth estimation approaches in the context of orthographic captures that have insufficient color and feature information, such as microscopic fluorescence imaging.
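The shift-and-match idea can be condensed into a toy example: sub-aperture views of a fronto-parallel surface are shifted copies of one another, and the candidate shift that best aligns the stack indicates depth. The 1D texture and disparity below are invented for illustration:

```python
import numpy as np

# Toy depth-from-focus: align sub-aperture views by candidate disparity and
# pick the disparity with the least across-view variance.
rng = np.random.default_rng(2)
base = rng.standard_normal(64)            # 1D texture for simplicity
true_disp = 3                             # integer disparity between views
offsets = range(-2, 3)                    # 5 sub-aperture positions
views = [np.roll(base, v * true_disp) for v in offsets]

def alignment_cost(disp):
    # Shift each view back by its candidate disparity; if the guess is right
    # the stack agrees, so the variance across views collapses.
    aligned = [np.roll(img, -v * disp) for v, img in zip(offsets, views)]
    return np.var(np.stack(aligned), axis=0).mean()

costs = {d: alignment_cost(d) for d in range(0, 7)}
disp_hat = min(costs, key=costs.get)
```

The paper's scheme does this area-wise on 2D orthographic captures, repeats it per central view, and merges the resulting depth maps with voting to handle occlusion; none of that is modeled here.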
  •  
42.
  • Li, Yun, et al. (författare)
  • Coding of focused plenoptic contents by displacement intra prediction
  • 2016
  • Ingår i: IEEE transactions on circuits and systems for video technology (Print). - 1051-8215 .- 1558-2205. ; 26:7, s. 1308-1319
  • Tidskriftsartikel (refereegranskat)abstract
    • A light field is commonly described by a two-plane representation with four dimensions. Refocused three-dimensional content can be rendered from light field images. One method for capturing these images is to use cameras with microlens arrays. A dense sampling of the light field results in large amounts of redundant data; an efficient compression is therefore vital for practical use of these data. In this paper, we propose a displacement intra prediction scheme with a maximum of two hypotheses for the compression of plenoptic content from focused plenoptic cameras. The proposed scheme is implemented in HEVC. The work aims at coding plenoptic captured content efficiently without knowledge of the underlying camera geometry. In addition, a theoretical analysis of displacement intra prediction for plenoptic images is given, and the relationship between the compressed captured images and their rendered quality is analyzed. Evaluation results show that plenoptic content can be efficiently compressed by the proposed scheme: bit rate reductions of up to 60 percent over HEVC are obtained for plenoptic images, and more than 30 percent for the tested video sequences.
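A single-hypothesis version of displacement intra prediction can be sketched as a causal block search, in the spirit of intra block copy. The paper's scheme (two hypotheses, integrated into HEVC) is considerably more elaborate, and the periodic test image below is an invented stand-in for the repetitive microlens structure:

```python
import numpy as np

# Predict a block from the best-matching block in the already-coded region.
rng = np.random.default_rng(3)
tile = rng.standard_normal((8, 8))
img = np.tile(tile, (4, 4))               # 32x32 periodic "microlens-like" image

B = 8
by, bx = 16, 16                           # top-left of the block to predict
target = img[by:by + B, bx:bx + B]

best_cost, best_vec = np.inf, None
# Search displacements pointing strictly into the causal area above the block
# (real codecs restrict candidates to fully reconstructed samples).
for dy in range(-16, -B + 1):
    for dx in range(-16, 17):
        y, x = by + dy, bx + dx
        if y < 0 or x < 0 or x + B > img.shape[1]:
            continue
        cand = img[y:y + B, x:x + B]
        cost = np.abs(cand - target).sum()   # SAD matching cost
        if cost < best_cost:
            best_cost, best_vec = cost, (dy, dx)
```

Because the microlens images repeat, a displaced block in the coded region predicts the target almost perfectly, which is the redundancy the proposed scheme exploits.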
  •  
43.
  • Li, Yun, et al. (författare)
  • Coding of plenoptic images by using a sparse set and disparities
  • 2015
  • Ingår i: Proceedings - IEEE International Conference on Multimedia and Expo. - : IEEE conference proceedings. - 9781479970827 ; , s. -Art. no. 7177510
  • Konferensbidrag (refereegranskat)abstract
    • A focused plenoptic camera captures not only the spatial information of a scene but also the angular information. The capture results in a plenoptic image of large resolution, consisting of multiple microlens images that are similar to their neighbors. An efficient compression method that utilizes this pattern of similarity can therefore reduce the coding bit rate and further facilitate the usage of the images. In this paper, we propose an approach for coding focused plenoptic images by using a representation that consists of a sparse plenoptic image set and disparities. Based on this representation, a reconstruction method using interpolation and inpainting is devised to reconstruct the original plenoptic image. Consequently, instead of coding the original image directly, we encode the sparse image set plus the disparity maps and use the reconstructed image as a prediction reference to encode the original image. The results show that the proposed scheme performs better than HEVC intra, with more than 5 dB PSNR gain or over 60 percent bit rate reduction.
  •  
44.
  • Li, Yun, et al. (författare)
  • Compression of Unfocused Plenoptic Images using a Displacement Intra prediction
  • 2016
  • Ingår i: 2016 IEEE International Conference on Multimedia and Expo Workshop, ICMEW 2016. - : IEEE Signal Processing Society. - 9781509015528
  • Konferensbidrag (refereegranskat)abstract
    • Plenoptic images are one type of light field content produced by using a combination of a conventional camera and an additional optical component in the form of a microlens array, positioned in front of the image sensor surface. This camera setup captures a sub-sampling of the light field with high spatial fidelity over a small range, and with a more coarsely sampled angular range. The earliest applications that leverage plenoptic image content are image refocusing, non-linear distribution of out-of-focus areas, SNR vs. resolution trade-offs, and 3D-image creation, all provided by post-processing methods. In this work, we evaluate a compression method that we previously proposed for a different type of plenoptic image (focused, or plenoptic camera 2.0, content) than the unfocused, or plenoptic camera 1.0, content used in this Grand Challenge. The method is an extension of the state-of-the-art video compression standard HEVC in which we have brought the capability of bi-directional inter-frame prediction into the spatial prediction. The method is evaluated according to the scheme set out by the Grand Challenge, and the results show a high compression efficiency compared with JPEG, i.e., up to 6 dB improvement for the tested images.
  •  
45.
  • Li, Yongwei, 1990- (författare)
  • Computational Light Field Photography : Depth Estimation, Demosaicing, and Super-Resolution
  • 2020
  • Doktorsavhandling (övrigt vetenskapligt/konstnärligt)abstract
    • The transition from film-based cameras to digital cameras has been witnessed in the past twenty years, along with impressive technological advances in processing massively digitized media content. Today, a new evolution has emerged: the migration from 2D content to immersive perception. This rising trend has a profound and long-term impact on our society, fostering technologies such as teleconferencing and remote surgery. The trend is also reflected in the scientific research community, where more attention has been drawn to the light field and its applications. The purpose of this dissertation is to develop a better understanding of the light field structure by analyzing its sampling behavior, and to address three problems concerning the light field processing pipeline: 1) How to address the depth estimation problem when there is limited color and texture information. 2) How to improve the rendered image quality by using the inherent depth information. 3) How to solve the interdependence conflict of demosaicing and depth estimation. The first problem is solved by a hybrid depth estimation approach that combines the advantages of correspondence matching and depth-from-focus, where occlusion is handled by involving multiple depth maps in a voting scheme. The second problem is divided into two specific tasks, demosaicing and super-resolution, where depth-assisted light field analysis is employed to surpass the competence of traditional image processing. The third problem is tackled with an inferential graph model that encodes the connections between demosaicing and depth estimation explicitly and jointly performs a global optimization for both tasks. The proposed depth estimation approach shows a noticeable improvement in point clouds and depth maps compared with reference methods.
Furthermore, the objective metrics and visual quality are compared with classical sensor-based demosaicing and multi-image super-resolution to show the effectiveness of the proposed depth-assisted light field processing methods. Finally, a multi-task graph model is proposed to challenge the performance of the sequential light field image processing pipeline. The proposed method is validated with various kinds of light fields and outperforms the state-of-the-art in both demosaicing and depth estimation tasks. The work presented in this dissertation raises a novel view of the light field data structure in general, and provides tools to solve image processing problems in particular. The impact of the outcome can be manifold: supporting scientific research with light field microscopes, stabilizing the performance of range cameras for industrial applications, and providing individuals with a high-quality immersive experience.
  •  
46.
  • Li, Yongwei, et al. (författare)
  • Depth-Assisted Demosaicing for Light Field Data in Layered Object Space
  • 2019
  • Ingår i: 2019 IEEE International Conference on Image Processing (ICIP). - : IEEE. - 9781538662496 ; , s. 3746-3750
  • Konferensbidrag (refereegranskat)abstract
    • Light field technology, which emerged as a solution to the increasing demands of visually immersive experience, has shown its extraordinary potential for scene content representation and reconstruction. Unlike conventional photography, which maps the 3D scenery onto a 2D plane by a projective transformation, light field preserves both the spatial and angular information, enabling further processing steps such as computational refocusing and image-based rendering. However, there are still gaps that have barely been studied, such as the light field demosaicing process. In this paper, we propose a depth-assisted demosaicing method for light field data. First, we exploit the sampling geometry of the light field data with respect to the scene content using the ray-tracing technique and develop a sampling model of light field capture. Then we carry out the demosaicing process in a layered object space with object-space sampling adjacencies rather than pixel placement. Finally, we compare our results with state-of-the-art approaches and discuss potential research directions for the proposed sampling model to show the significance of our approach.
  •  
47.
  • Li, Yongwei, 1990-, et al. (författare)
  • Depth-Assisted Light Field Super-Resolution in Layered Object Space
  • Annan publikation (övrigt vetenskapligt/konstnärligt)abstract
    • The captured light field may fail to reconstruct fine details of the scene due to the under-sampling of light field acquisition devices. Therefore, super-resolution is required to restore high-frequency information from the light field and to improve the quality of the rendered views. Conventional super-resolution algorithms are not ideal for light field data, as they do not utilize the full potential of the 4D light field structure, while existing light field super-resolution algorithms rely heavily on the accuracy of the estimated depth and perform complex sub-pixel disparity estimation. In this paper, we propose a new light field super-resolution algorithm which addresses depth uncertainty with a layered object space. First, a pixel-wise depth estimation is performed from the resampled views. Then we divide the depth range into finite layers and back-project pixels onto these layers in order to address the sub-pixel depth error. Finally, two super-resolution schemes, in-depth warping and cross-depth learning, are introduced to super-resolve the views from light field data redundancy. The algorithm is tested with extensive datasets, and the results show that our method attains favorable results in both visual assessment and objective metrics compared to other light field super-resolution methods.
  •  
48.
  • Li, Yun, et al. (författare)
  • Depth Map Compression with Diffusion Modes in 3D-HEVC
  • 2013
  • Ingår i: MMEDIA 2013 - 5th International Conferences on Advances in Multimedia. - : International Academy, Research and Industry Association (IARIA). - 9781612082653 ; , s. 125-129
  • Konferensbidrag (refereegranskat)abstract
    • For three-dimensional television, multiple views can be generated by using the Multi-view Video plus Depth (MVD) format. The depth maps of this format can be compressed efficiently by the 3D extension of High Efficiency Video Coding (3D-HEVC), which exploits the correlation between its two components: the texture and the associated depth map. In this paper, we introduce two diffusion-based modes for depth map coding into HEVC. The proposed modes use the framework for inter-component prediction of Depth Modeling Modes (DMM). They detect edges from the texture and then diffuse an entire block from known adjacent blocks by solving the Laplace equation constrained by the detected edges. The experimental results show that depth maps can be compressed more efficiently with the proposed diffusion modes, where the bit rate saving can reach 1.25 percent of the total depth bit rate at a constant quality of synthesized views.
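The diffusion step can be illustrated by solving the Laplace equation over a block from known boundary samples using Jacobi iteration. The texture-edge constraint that is central to the proposed modes is omitted here, and the block size and boundary values are made up:

```python
import numpy as np

# Fill a depth block by diffusion: interior values relax toward the average
# of their four neighbors until they satisfy the discrete Laplace equation.
N = 16
block = np.zeros((N, N))
block[0, :] = 1.0        # known top boundary (e.g. decoded neighbor blocks)
block[:, 0] = 0.0        # known left boundary
block[-1, :] = 1.0       # illustrative fixed bottom boundary
block[:, -1] = 1.0       # illustrative fixed right boundary

for _ in range(2000):    # Jacobi sweeps until the interior settles
    interior = 0.25 * (block[:-2, 1:-1] + block[2:, 1:-1] +
                       block[1:-1, :-2] + block[1:-1, 2:])
    block[1:-1, 1:-1] = interior
```

The harmonic solution stays within the range of the boundary values (a maximum principle), which is why diffusion yields smooth, artifact-free depth interiors; the edge constraint in the paper stops this smoothing from crossing depth discontinuities.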
  •  
49.
  • Li, Yi -Hsin, et al. (författare)
  • Segmentation-based Initialization for Steered Mixture of Experts
  • 2023
  • Ingår i: 2023 IEEE International Conference on Visual Communications and Image Processing (VCIP). - : IEEE conference proceedings. - 9798350359855
  • Konferensbidrag (refereegranskat)abstract
    • The Steered-Mixture-of-Experts (SMoE) model is an edge-aware kernel representation that has successfully been explored for the compression of images, video, and higher-dimensional data such as light fields. The present work aims to leverage the potential for enhanced compression gains through efficient kernel reduction. We propose a fast segmentation-based strategy to identify a sufficient number of kernels for representing an image and to provide an initial kernel parametrization. The strategy implies both a reduced memory footprint and reduced computational complexity for the subsequent parameter optimization, resulting in an overall faster processing time. Fewer kernels, combined with the inherent sparsity of SMoE, further enhance the overall compression performance. Empirical evaluations demonstrate a gain of 0.3-1.0 dB in PSNR for a constant number of kernels, and 23% fewer kernels and 25% less processing time at constant PSNR. The results highlight the feasibility and practicality of the approach, positioning it as a valuable solution for various image-related applications, including image compression. 
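The mixture-of-experts idea behind SMoE can be reduced to a 1D sketch with isotropic Gaussian gates and constant experts. Real SMoE kernels are steered (anisotropic) and their parameters are optimized; the centers, widths, and expert values here are hand-picked for illustration:

```python
import numpy as np

# 1D mixture-of-experts reconstruction: normalized Gaussian gates softly
# partition the domain, and each expert contributes a constant value.
x = np.linspace(0.0, 1.0, 200)
centers = np.array([0.2, 0.5, 0.8])       # kernel centers (hand-picked)
widths = np.array([0.1, 0.1, 0.1])        # kernel widths (hand-picked)
values = np.array([0.0, 1.0, 0.3])        # per-expert constant output

g = np.exp(-0.5 * ((x[:, None] - centers) / widths) ** 2)
gates = g / g.sum(axis=1, keepdims=True)  # softmax-like normalized gating
y = gates @ values                        # soft blend of expert outputs
```

Each sample of the reconstruction is a convex combination of expert outputs, so the representation is compact (a few kernel parameters) yet smooth between kernels; the paper's segmentation step decides how many such kernels an image needs and where to initialize them.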
  •  
50.
  • Li, Yun, et al. (författare)
  • Scalable coding of plenoptic images by using a sparse set and disparities
  • 2016
  • Ingår i: IEEE Transactions on Image Processing. - 1057-7149 .- 1941-0042. ; 25:1, s. 80-91
  • Tidskriftsartikel (refereegranskat)abstract
    • One of the light field capturing techniques is focused plenoptic capturing. By placing a microlens array in front of the photosensor, focused plenoptic cameras capture both spatial and angular information of a scene in each microlens image and across microlens images. The capture results in a significant amount of redundant information, and the captured image usually has a large resolution. A coding scheme that removes the redundancy before coding is therefore advantageous for efficient compression, transmission and rendering. In this paper, we propose a lossy coding scheme to efficiently represent plenoptic images. The format contains a sparse image set and its associated disparities. The reconstruction is performed by disparity-based interpolation and inpainting, and the reconstructed image is later employed as a prediction reference for the coding of the full plenoptic image. As an outcome of the representation, the proposed scheme inherits a scalable structure with three layers. The results show that plenoptic images are compressed efficiently, with over 60 percent bit rate reduction compared to HEVC intra, and over 20 percent compared to HEVC block copying mode.
  •  