SwePub

Result list for search "WFRF:(Plosila Juha)"

  • Result 1-10 of 76
1.
  • Anwar, Hassan, et al. (author)
  • Exploring Spiking Neural Network on Coarse-Grain Reconfigurable Architectures
  • 2014
  • In: ACM International Conference Proceeding Series. - New York, NY, USA : ACM. - 9781450328227 ; , pp. 64-67
  • Conference paper (peer-reviewed). Abstract:
    • Today, reconfigurable architectures are becoming increasingly popular as candidate platforms for neural networks. Existing works that map neural networks onto reconfigurable architectures address only FPGAs or Networks-on-Chip, without any reference to Coarse-Grained Reconfigurable Architectures (CGRAs). In this paper we investigate the overheads imposed by implementing spiking neural networks on a Coarse-Grained Reconfigurable Architecture (CGRA). Experimental results (using point-to-point connectivity) reveal that up to 1000 neurons can be connected, with an average response time of 4.4 ms.
2.
  • Carlsson, Jonas, 1972- (author)
  • Contributions to Asynchronous Communication Ports for GALS Systems
  • 2006
  • Doctoral thesis (other academic/artistic). Abstract:
    • Digital systems commonly use a single global clock signal to synchronize the whole system. This is not always possible, and it can be more advantageous to divide the system into separate clock domains, where each clock domain can operate with its own clock frequency. Communication between the different clock domains is not trivial and must be handled with care. Several schemes can be used depending on the relation between the clock frequencies of the communicating clock domains. This thesis focuses on the Globally Asynchronous Locally Synchronous (GALS) scheme, in which all communication between clock domains is handled using dedicated communication channels. These communication channels use asynchronous handshaking protocols to transfer information between clock domains. No global clock signal is used; the clock signal is instead local to each clock domain. An efficient design flow for GALS systems has been developed, which allows a designer to implement GALS systems without prior knowledge of asynchronous circuits. The GALS design flow starts with a high-level model of the system behavior and ends with an implementation in an FPGA or an ASIC. The design flow can also increase the design efficiency for GALS systems, since it relieves the designer of the design and placement of the asynchronous circuits. A tool that handles the asynchronous circuits in the design flow has been developed. Two types of communication ports have been developed to handle the communication between clock domains. Both of these ports can be used in systems with a static or a dynamic schedule of transactions. One of the communication ports can easily be migrated to a new CMOS process, since it only uses standard cells that are provided by most vendors of CMOS processes. A clock gating circuit has been developed to allow a clock domain to use an external stable clock signal to create an internal stoppable clock signal. A stoppable local clock is used to eliminate problems with metastability when transferring data between clock domains with arbitrary clock frequencies. In order to validate the design flow and the proposed circuitry, an integrated circuit for the 2-dimensional Discrete Cosine Transform has been implemented using the GALS scheme and one of the proposed communication ports. The circuit has been implemented using a standard-cell library in a 0.35 µm CMOS process. A few possible improvements to the implementation are also discussed in the thesis. The GALS design flow with the asynchronous wrapper generation tool has been used to implement the digital baseband processing in the physical layer of the IEEE 802.11a transmitter. The transmitter is built using multiple clock domains. The transmitter has been implemented and tested in a Stratix II FPGA.
3.
  • Carlsson, Jonas, 1972- (author)
  • Studies on asynchronous communication ports for GALS systems
  • 2005
  • Licentiate thesis (other academic/artistic). Abstract:
    • Digital systems generally use a global clock signal for the whole system. A System-on-Chip may have to communicate with the environment using several different data rates that do not fit well with the single global clock frequency. When designing a digital system, it might be beneficial to divide the system into different clock domains, where each domain can operate with its own clock frequency. In this thesis, various clocking schemes are discussed. The synchronous clocking schemes that are discussed are the mesochronous, plesiochronous, rational, oversampling and arbitrary clocking schemes. The thesis focuses on the Globally Asynchronous Locally Synchronous (GALS) scheme. This scheme transfers information between the different clock domains through dedicated communication channels. These communication channels use asynchronous handshaking protocols to transfer information without the need for a clock. A communication channel consists of a transmitting and a receiving port. Two types of communication ports are proposed in the thesis. The communication ports can be used in a system with either a static or a dynamic schedule of transactions. One of the ports can easily be implemented in different CMOS processes, since it only uses standard cells that can be found in the standard libraries of most existing CMOS processes. A 2-dimensional Discrete Cosine Transform (2-D DCT) has been implemented using the GALS scheme and one of the proposed communication ports. The 2-D DCT has been implemented using a standard-cell library supplied by AMS for a 0.35 µm CMOS process. A few improvements to the implementation are also discussed in the thesis.
4.
  • Daneshtalab, Masoud, et al. (author)
  • In-order delivery approach for 2D and 3D NoCs
  • 2015
  • In: Journal of Supercomputing. - : Springer Science and Business Media LLC. - 0920-8542 .- 1573-0484. ; 71:8, pp. 2877-2899
  • Journal article (peer-reviewed). Abstract:
    • In many applications, it is critical to guarantee the in-order delivery of requests from the master cores to the slave cores, so that the requests can be executed in the correct order without requiring buffers. Since packets in NoCs may take different paths, and traffic congestion varies across routes, the in-order delivery constraint cannot be met without support. To guarantee in-order delivery, traditional approaches either use dimension-order routing or employ reordering buffers at network interfaces. Dimension-order routing degrades the performance considerably, while the use of reordering buffers imposes a large area overhead. In this paper, we present a mechanism allowing packets to be routed through multiple paths in the network, helping to balance the traffic load while guaranteeing in-order delivery. The proposed method combines the advantages of both deterministic and adaptive routing algorithms. The simple idea is to use different deterministic algorithms for independent flows. This approach neither requires reordering buffers nor limits packets to a single path. The algorithm is simple and practical, with negligible area overhead over dimension-order routing. The concept is investigated in both 2D and 3D mesh networks.
5.
  • Daneshtalab, Masoud, et al. (author)
  • Special issue on many-core embedded systems
  • 2014
  • In: Microprocessors and microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 38:6, p. 525
  • Journal article (other academic/artistic)
6.
  • Dytckov, Sergei, et al. (author)
  • Efficient STDP Micro-Architecture for Silicon Spiking Neural Networks
  • 2014
  • In: 2014 17TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD). - 9781479957934 ; , pp. 496-503
  • Conference paper (peer-reviewed). Abstract:
    • Spiking neural networks (SNNs) are the closest approach to biological neurons in comparison with conventional artificial neural networks (ANNs). SNNs are composed of neurons and synapses which are interconnected in a complex pattern. As communication in such massively parallel computational systems is becoming critical, the network-on-chip (NoC) becomes a promising solution for providing a scalable and robust interconnection fabric. However, using NoCs for large-scale SNNs gives rise to a trade-off between scalability, throughput, neuron/router ratio (cluster size), and area overhead. In this paper, we tackle the trade-off using a clustering approach and try to optimize the synaptic resource utilization. An optimal cluster size can provide the lowest area overhead and power consumption. For learning purposes, a phenomenon known as spike-timing-dependent plasticity (STDP) is utilized. The micro-architectures of the network, clusters, and the computational neurons are also described. The presented approach suggests a promising way of integrating NoCs and STDP-based SNNs for optimal performance based on the underlying application.
7.
  • Ebrahimi, Masoumeh, et al. (author)
  • Fault-tolerant routing algorithm for 3D NoC using hamiltonian path strategy
  • 2013
  • In: Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013. ; , pp. 1601-1604
  • Conference paper (peer-reviewed). Abstract:
    • While Networks-on-Chip (NoCs) have been increasing in popularity with industry and academia, they are threatened by the decreasing reliability of aggressively scaled transistors. In this paper, we address the problem of faulty elements by means of routing algorithms. Commonly, fault-tolerant algorithms are complex because they must support different fault models while preventing deadlock. When moving from a 2D to a 3D network, the complexity increases significantly due to the possibility of creating cycles within and between layers. In this paper, we take advantage of the Hamiltonian path to tolerate faults in the network. The presented approach is not only very simple but also able to support almost all one-faulty unidirectional links in 2D and 3D NoCs.
8.
  • Ebrahimi, Masoumeh, et al. (author)
  • In-Order Delivery Approach for 3D NoCs
  • 2013
  • In: 2013 17TH CSI INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND DIGITAL SYSTEMS (CADS 2013). - : IEEE. - 9781479905621 ; , pp. 87-
  • Conference paper (peer-reviewed). Abstract:
    • Routing algorithms can be classified into deterministic and adaptive methods. In deterministic methods, a single path is selected for each pair of source and destination nodes, and thus they are unable to distribute the traffic load over the network. Using deterministic routing, packets reach a destination in the same order they are delivered from a source node. Adaptive routing algorithms can greatly improve the performance by distributing packets over different routes. However, they require a mechanism to reorder packets at destinations. As a result, a large reordering buffer and a complex control mechanism are required at each node. This motivated us to propose a method guaranteeing in-order delivery while sending packets through alternative paths. The proposed method combines the advantages of both deterministic and adaptive routing algorithms. We introduce several routing algorithms working together in the network without creating cycles. By using these algorithms, packets of different flows use different routes while packets belonging to the same flow follow a single path. In this way, traffic is distributed over the network while addressing in-order delivery. We employ this approach on three-dimensional Networks-on-Chip.
9.
  • Ebrahimi, Masoumeh, et al. (author)
  • Path-Based Partitioning Methods for 3D Networks-on-Chip with Minimal Adaptive Routing
  • 2014
  • In: IEEE Transactions on Computers. - 0018-9340 .- 1557-9956. ; 63:3, pp. 718-733
  • Journal article (peer-reviewed). Abstract:
    • Combining the benefits of 3D ICs and Networks-on-Chip (NoCs) schemes provides a significant performance gain in Chip Multiprocessors (CMPs) architectures. As multicast communication is commonly used in cache coherence protocols for CMPs and in various parallel applications, the performance of these systems can be significantly improved if multicast operations are supported at the hardware level. In this paper, we present several partitioning methods for the path-based multicast approach in 3D mesh-based NoCs, each with different levels of efficiency. In addition, we develop novel analytical models for unicast and multicast traffic to explore the efficiency of each approach. In order to distribute the unicast and multicast traffic more efficiently over the network, we propose the Minimal and Adaptive Routing (MAR) algorithm for the presented partitioning methods. The analytical and experimental results show that an advantageous method named Recursive Partitioning (RP) outperforms the other approaches. RP recursively partitions the network until all partitions contain a comparable number of switches and thus the multicast traffic is equally distributed among several subsets and the network latency is considerably decreased. The simulation results reveal that the RP method can achieve performance improvement across all workloads while performance can be further improved by utilizing the MAR algorithm. Nineteen percent average and 42 percent maximum latency reduction are obtained on SPLASH-2 and PARSEC benchmarks running on a 64-core CMP.
10.
  • Farahnakian, Fahimeh, et al. (author)
  • Adaptive Load Balancing in Learning-based Approaches for Many-core Embedded Systems
  • 2014
  • In: Journal of Supercomputing. - : Springer Science and Business Media LLC. - 0920-8542 .- 1573-0484. ; 68:3, pp. 1214-1234
  • Journal article (peer-reviewed). Abstract:
    • Adaptive routing algorithms improve network performance by distributing traffic over the whole network. However, they require congestion information to facilitate load balancing. To provide local and global congestion information, we propose a learning method based on a dual reinforcement learning approach. This information can be dynamically updated according to the changing traffic conditions in the network by propagating data and learning packets. We utilize a congestion detection method which updates the learning rate according to the congestion level. This method calculates the average number of free buffer slots in each switch at specific time intervals and compares it with maximum and minimum values. Based on the comparison result, the learning rate is set to a value between 0 and 1. If a switch gets congested, the learning rate is set to a high value, meaning that the global information is more important than the local information. In contrast, local information is emphasized more than global information in non-congested switches. Results show that the proposed approach achieves a significant performance improvement over the traditional Q-routing, DRQ-routing, DBAR and Dynamic XY algorithms.
Type of publication
conference paper (50)
journal article (20)
doctoral thesis (4)
artistic work (3)
licentiate thesis (2)
Type of content
peer-reviewed (66)
other academic/artistic (8)
pop. science, debate, etc. (2)
Author/Editor
Plosila, Juha (72)
Tenhunen, Hannu (57)
Hemani, Ahmed (20)
Daneshtalab, Masoud (20)
Liljeberg, Pasi (17)
Jafri, Syed Mohammad ... (15)
Paul, Kolin (14)
Ebrahimi, Masoumeh (12)
Rahmani, Amir-Mohamm ... (12)
Kelati, Amleset (9)
Majd, Amin (6)
Jafri, Syed M. A. H. (5)
Ellervee, Peeter (4)
Plosila, Juha, Profe ... (4)
Dytckov, Sergei (4)
Guang, Liang (4)
Farahini, Nasim (3)
Tajammul, Muhammad A ... (3)
Mubeen, Saad (2)
Anwar, Hassan (2)
Yang, Bo (2)
Bruhn, Fredrik (2)
Tsog, Nandinbaatar (2)
Carlsson, Jonas, 197 ... (2)
Farahnakian, Fahimeh (2)
Abbas, N (1)
Oelmann, Bengt (1)
Behnam, Moris (1)
Seoane, Fernando, 19 ... (1)
Behnam, Moris, 1973- (1)
Sjödin, Mikael (1)
Öberg, Johnny (1)
Zheng, Li-Rong (1)
Iqbal, J. (1)
Jantsch, Axel (1)
Westerlund, Tomi (1)
Mvungi, Nerey (1)
Sergei, Dytckov (1)
Moghaddami Khalilzad ... (1)
Troubitsyna, Elena (1)
Palesi, Maurizio (1)
Kondoro, Aron (1)
Ben Dhaou, Imed (1)
Gia, T. N. (1)
Kakakhel, Syed Ramee ... (1)
Nolin, Mikael, 1971- (1)
Mohammadi, Siamak (1)
Chang, Xin (1)
Flich, Jose (1)
Jafri, Syed (1)
University
Royal Institute of Technology (71)
Mälardalen University (4)
Linköping University (2)
Language
English (74)
Swedish (2)
Research subject (UKÄ/SCB)
Engineering and Technology (53)
Natural sciences (17)
Medical and Health Sciences (1)
