 Per Stenström

Bibliometrics: publication history
Average citations per article: 11.32
Citation Count: 1,449
Publication count: 128
Publication years: 1987-2016
Available for download: 44
Average downloads per article: 443.75
Downloads (cumulative): 19,525
Downloads (12 Months): 1,364
Downloads (6 Weeks): 185
ACM Fellow

128 results found

Result 1 – 20 of 128

1 published by ACM
Adaptive Row Addressing for Cost-Efficient Parallel Memory Protocols in Large-Capacity Memories
Dmitry Knyaginin, Vassilis Papaefstathiou, Per Stenstrom
October 2016 MEMSYS '16: Proceedings of the Second International Symposium on Memory Systems
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 10,   Downloads (12 Months): 49,   Downloads (Overall): 49

Full text available: PDF
Modern commercial workloads drive a continuous demand for larger and still low-latency main memories. JEDEC member companies indicate that parallel memory protocols will remain key to such memories, though widening the bus (increasing the pin count) to address larger capacities would cause multiple issues ultimately reducing the speed (the peak ...
Keywords: Parallel memory protocols, energy, fairness, memory-request scheduling, row-address width, address bus, performance

2
EUROSERVER: share-anything scale-out micro-server design
Manolis Marazakis, John Goodacre, Didier Fuin, Paul Carpenter, John Thomson, Emil Matus, Antimo Bruno, Per Stenstrom, Jerome Martin, Yves Durand, Isabelle Dor
March 2016 DATE '16: Proceedings of the 2016 Conference on Design, Automation & Test in Europe
Publisher: EDA Consortium
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 0,   Downloads (12 Months): 1,   Downloads (Overall): 1

Full text available: PDF
This paper provides a snapshot summary of the trends in the area of micro-server development and their application in the broader enterprise and cloud markets. Focusing on the technology aspects, we provide an understanding of these trends and specifically the differentiation and uniqueness of the approach being adopted by the ...

3 published by ACM
HyComp: a hybrid cache compression method for selection of data-type-specific compression methods
Angelos Arelakis, Fredrik Dahlgren, Per Stenstrom
December 2015 MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 19,   Downloads (12 Months): 280,   Downloads (Overall): 502

Full text available: PDF
Proposed cache compression schemes make design-time assumptions on value locality to reduce decompression latency. For example, some schemes assume that common values are spatially close whereas other schemes assume that null blocks are common. Most schemes, however, assume that value locality is best exploited by fixed-size data types (e.g., 32-bit ...
Keywords: floating-point data, hybrid compression, value locality, cache compression, huffman coding

4
Crystal: A Design-Time Resource Partitioning Method for Hybrid Main Memory
Dmitry Knyaginin, Georgi N. Gaydadjiev, Per Stenstrom
September 2014 BRACIS '14: Proceedings of the 2014 Brazilian Conference on Intelligent Systems
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 0

Non-Volatile Memory (NVM) technologies can be used to reduce system-level execution time, energy, or cost but they add a new design dimension. Finding the best amounts of DRAM and NVM in hybrid main memory systems is a nontrivial design-time issue, the best solution to which depends on many factors. Such ...

5
Characterizing and Exploiting Small-Value Memory Instructions
Mafijul Md. Islam, Per Stenstrom
July 2014 IEEE Transactions on Computers: Volume 63 Issue 7, July 2014
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 0

This paper exploits small-value locality to accelerate the execution of memory instructions. We find that small-value loads—loads with small-value operands of 8 bits or less—are common across 52 applications from the desktop, embedded, and media domains. We show that the relative occurrences of small-value loads remain fairly stable during the program ...

6
SC2: a statistical compression cache scheme
Angelos Arelakis, Per Stenstrom
June 2014 ISCA '14: Proceeding of the 41st annual international symposium on Computer architecture
Publisher: IEEE Press
Bibliometrics:
Citation Count: 12
Downloads (6 Weeks): 8,   Downloads (12 Months): 131,   Downloads (Overall): 545

Full text available: PDF
Low utilization of on-chip cache capacity limits performance and wastes energy because of the long latency, limited bandwidth, and energy consumption associated with off-chip memory accesses. Value replication is an important source of low capacity utilization. While prior cache compression techniques manage to code frequent values densely, they trade off ...
Also published in:
October 2014  ACM SIGARCH Computer Architecture News - ISCA '14: Volume 42 Issue 3, June 2014

7
ZEBRA: Data-Centric Contention Management in Hardware Transactional Memory
Ruben Titos-Gil, Anurag Negi, Manuel E. Acacio, Jose M. Garcia, Per Stenstrom
May 2014 IEEE Transactions on Parallel and Distributed Systems: Volume 25 Issue 5, May 2014
Publisher: IEEE Press
Bibliometrics:
Citation Count: 0

Transactional contention management policies show considerable variation in relative performance with changing workload characteristics. Consequently, incorporation of fixed-policy Transactional Memory (TM) in general purpose computing systems is suboptimal by design and renders such systems susceptible to pathologies. Of particular concern are Hardware TM (HTM) systems where traditional designs have hardwired ...

8
Effective resource management towards efficient computing
Per Stenström
March 2014 DATE '14: Proceedings of the conference on Design, Automation & Test in Europe
Publisher: European Design and Automation Association
Bibliometrics:
Citation Count: 0

Improving performance of computers at historical rates, as dictated by Moore's Law, is becoming increasingly more challenging especially because we are hitting the chip power-budget wall. But challenges usually direct us to focus on opportunities we have neglected in the past. I will focus on some of these overlooked opportunities ...

9
M. M. Waliullah, Per Stenstrom
February 2014 International Journal of Parallel Programming: Volume 42 Issue 1, February 2014
Publisher: Kluwer Academic Publishers
Bibliometrics:
Citation Count: 0

This paper analyzes the sources of performance losses in hardware transactional memory and investigates techniques to reduce the losses. It dissects the root causes of data conflicts in hardware transactional memory systems (HTM) into four classes of conflicts: true sharing, false sharing, silent store, and write-write conflicts. These conflicts can ...
Keywords: Contamination misses, Transactional memory, Intermediate checkpointing, Manycore

10
A Case for a Value-Aware Cache
Angelos Arelakis, Per Stenstrom
January 2014 IEEE Computer Architecture Letters: Volume 13 Issue 1, January 2014
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 1

Replication of values causes poor utilization of on-chip cache memory resources. This paper addresses the question: How much cache resources can be theoretically and practically saved if value replication is eliminated? We introduce the concept of value-aware caches and show that a sixteen times smaller value-aware cache can yield the ...

11
Eager Beats Lazy: Improving Store Management in Eager Hardware Transactional Memory
Ruben Titos-Gil, Anurag Negi, Manuel E. Acacio, Jose M. Garcia, Per Stenstrom
November 2013 IEEE Transactions on Parallel and Distributed Systems: Volume 24 Issue 11, November 2013
Publisher: IEEE Press
Bibliometrics:
Citation Count: 3

Hardware transactional memory (HTM) designs are very sensitive to the manner in which speculative updates from transactions are handled in the system. This study highlights how the lack of effective techniques for store management results in a quick degradation in the performance of eager HTM systems with increasing contention and, ...
Keywords: Buffer storage, Hardware, Concurrent computing, Parallel processing, Optimization, Coherence, Scalability, transactional memory, Parallel programming, multicore architectures

12
Towards automatic resource management in parallel architectures
Per Stenström
October 2013 PACT '13: Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Publisher: IEEE Press
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 1,   Downloads (12 Months): 4,   Downloads (Overall): 61

Full text available: PDF
As we have embarked on the multi/many-core roadmap, resource management, especially managing parallelism, is left in the hands of programmers. A major challenge moving forward is how to off-load programmers from the daunting task of managing hardware resources in future parallel architectures to meet higher demands on performance and power ...
Keywords: multi/many-core, parallel programming

13 published by ACM
Moving from petaflops to petadata
Michael J. Flynn, Oskar Mencer, Veljko Milutinovic, Goran Rakocevic, Per Stenstrom, Roman Trobec, Mateo Valero
May 2013 Communications of the ACM: Volume 56 Issue 5, May 2013
Publisher: ACM
Bibliometrics:
Citation Count: 7
Downloads (6 Weeks): 6,   Downloads (12 Months): 68,   Downloads (Overall): 1,292

Full text available: HTML, PDF, PDF (Chinese translation)
The race to build ever-faster supercomputers is on, with more contenders than ever before. However, the current goals set for this race may not lead to the fastest computation for particular applications.

14
Improving data access efficiency by using a tagless access buffer (TAB)
February 2013 CGO '13: Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 6,   Downloads (12 Months): 12,   Downloads (Overall): 22

Full text available: PDF
The need for energy efficiency continues to grow for many classes of processors, including those for which performance remains vital. Data cache is crucial for good performance, but it also represents a significant portion of the processor's energy expenditure. We describe the implementation and use of a tagless access buffer ...
Keywords: strided access, energy, memory hierarchy

15
Critical lock analysis: diagnosing critical section bottlenecks in multithreaded applications
Guancheng Chen, Per Stenstrom
November 2012 SC '12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Computer Society Press
Bibliometrics:
Citation Count: 4
Downloads (6 Weeks): 6,   Downloads (12 Months): 29,   Downloads (Overall): 345

Full text available: PDF
Critical sections are well known potential performance bottlenecks in multithreaded applications and identifying the ones that inhibit scalability are important for performance optimizations. While previous approaches use idle time as a key measure, we show such a measure is not reliable. The reason is that idleness does not necessarily mean ...
Keywords: multicore, multithreading, critical path, critical section, performance analysis

16 published by ACM
Transactional prefetching: narrowing the window of contention in hardware transactional memory
September 2012 PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 2,   Downloads (12 Months): 8,   Downloads (Overall): 160

Full text available: PDF
Memory access latency is the primary performance bottleneck in modern computer systems. Prefetching data before it is needed by a processing core allows substantial performance gains by overlapping significant portions of memory latency with useful work. Prior work has investigated this technique and measured potential benefits in a variety of ...
Keywords: multicores, prefetching, hardware transactional memory

17
π-TM: Pessimistic invalidation for scalable lazy hardware transactional memory
Anurag Negi, Ruben Titos-Gil, Manuel E. Acacio, Jose M. Garcia, Per Stenstrom
February 2012 HPCA '12: Proceedings of the 2012 IEEE 18th International Symposium on High-Performance Computer Architecture
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 4

Lazy hardware transactional memory has been shown to be more efficient at extracting available concurrency than its eager counterpart. However, it poses scalability challenges at commit time as existence of conflicts among concurrent transactions is not known prior to commit. Non-conflicting transactions may have to wait before committing, severely affecting ...

18 published by ACM
Introduction to the special issue on high-performance and embedded architectures and compilers
Per Stenström, Koen De Bosschere
January 2012 ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers: Volume 8 Issue 4, January 2012
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 0,   Downloads (12 Months): 13,   Downloads (Overall): 403

Full text available: PDF

19
Classification and Elimination of Conflicts in Hardware Transactional Memory Systems
Mridha-Mohammad Waliullah, Per Stenstrom
October 2011 SBAC-PAD '11: Proceedings of the 2011 23rd International Symposium on Computer Architecture and High Performance Computing
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 3

This paper analyzes the sources of performance losses in hardware transactional memory and investigates techniques to reduce the losses. It dissects the root causes of data conflicts in hardware transactional memory systems (HTM) into four classes of conflicts: true sharing, false sharing, silent store, and write-write conflicts. These conflicts can ...
Keywords: Transactional Memory, Contamination Misses, Intermediate Checkpointing, Manycore

20
Pi-TM: Pessimistic Invalidation for Scalable Lazy Hardware Transactional Memory
Anurag Negi, Per Stenstrom, Ruben Titos-Gil, Manuel E. Acacio, Jose M. Garcia
October 2011 PACT '11: Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 1

Lazy hardware transactional memory (HTM) allows better utilization of available concurrency in transactional workloads than eager HTM, but poses challenges at commit time due to the requirement of en-masse publication of speculative updates to global system state. Early conflict detection can be employed in lazy HTM designs to allow non-conflicting transactions ...
Keywords: multicores, concurrency, transactional memory



The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2017 ACM, Inc.