Publications
2022
- Work-stealing prefix scan: Addressing load imbalance in large-scale image registration. Marcin Copik, Tobias Grosser, Torsten Hoefler, and 2 more authors. IEEE Transactions on Parallel and Distributed Systems, 2022.
Parallelism patterns (e.g., map or reduce) have proven to be effective tools for parallelizing high-performance applications. In this paper, we study the recursive registration of a series of electron microscopy images - a time-consuming and imbalanced computation necessary for nano-scale microscopy analysis. We show that by translating the image registration into a specific instance of the prefix scan, we can convert this seemingly sequential problem into a parallel computation that scales to over a thousand cores. We analyze a variety of scan algorithms that behave similarly for common low-compute operators and propose a novel work-stealing procedure for a hierarchical prefix scan. Our evaluation shows that by identifying a suitable and well-optimized prefix scan algorithm, we reduce time-to-solution on a series of 4,096 images spanning ten seconds of microscopy acquisition from over 10 hours to less than 3 minutes (using 1024 Intel Haswell cores), enabling derivation of material properties at the nanoscale for long microscopy image series.
@article{2020prefixsum, title = {Work-stealing prefix scan: Addressing load imbalance in large-scale image registration}, author = {Copik, Marcin and Grosser, Tobias and Hoefler, Torsten and Bientinesi, Paolo and Berkels, Benjamin}, year = {2022}, journal = {IEEE Transactions on Parallel and Distributed Systems}, url = {https://ieeexplore.ieee.org/document/9477174}, volume = {33}, number = {3}, pages = {523-535}, doi = {10.1109/TPDS.2021.3095230}, }
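The abstract's key idea, recasting a seemingly sequential chain of registrations as a prefix scan over an associative operator, can be illustrated with a minimal two-phase blocked scan. This is a generic sketch under simplified assumptions, not the paper's hierarchical work-stealing implementation; `blocked_inclusive_scan` and its parameters are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor

def blocked_inclusive_scan(items, op, workers=4):
    """Inclusive prefix scan over any associative operator `op`.

    Phase 1: each block is scanned independently (parallelizable).
    Phase 2: block totals are combined, and every later block is offset
    by the prefix of all preceding blocks (also parallelizable per block).
    """
    if not items:
        return []
    block = (len(items) + workers - 1) // workers
    chunks = [items[i:i + block] for i in range(0, len(items), block)]

    def local_scan(chunk):
        out = [chunk[0]]
        for x in chunk[1:]:
            out.append(op(out[-1], x))
        return out

    with ThreadPoolExecutor(max_workers=workers) as pool:
        scanned = list(pool.map(local_scan, chunks))

    # Sequentially combine block totals, applying each prefix as an offset.
    offset = None
    for i in range(len(scanned)):
        if offset is not None:
            scanned[i] = [op(offset, x) for x in scanned[i]]
        offset = scanned[i][-1]
    return [x for chunk in scanned for x in chunk]
```

With a cheap operator such as addition the scan is bandwidth-bound; with an expensive, imbalanced operator such as image registration, per-block work dominates, which is exactly where the paper's work-stealing matters.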
- Performance-Detective: Automatic Deduction of Cheap and Accurate Performance Models. Larissa Schmid, Marcin Copik, Alexandru Calotoiu, and 5 more authors. 2022.
The many configuration options of modern applications make it difficult for users to select a performance-optimal configuration. Performance models help users in understanding system performance and choosing a fast configuration. Existing performance modeling approaches for applications and configurable systems either require a full-factorial experiment design or a sampling design based on heuristics. This results in high costs for achieving accurate models. Furthermore, they require repeated execution of experiments to account for measurement noise. We propose Performance-Detective, a novel code analysis tool that deduces insights on the interactions of program parameters. We use the insights to derive the smallest necessary experiment design and to avoid repetitions of measurements when possible, significantly lowering the cost of performance modeling. We evaluate Performance-Detective using two case studies where we reduce the number of measurements from up to 3125 to only 25, decreasing cost to only 2.9% of the previously needed core hours, while maintaining the accuracy of the resulting model at 91.5%, compared to 93.8% when using all 3125 measurements.
@inproceedings{schmid2022perfdetective, author = {Schmid, Larissa and Copik, Marcin and Calotoiu, Alexandru and Werle, Dominik and Reiter, Andreas and Selzer, Michael and Koziolek, Anne and Hoefler, Torsten}, title = {Performance-Detective: Automatic Deduction of Cheap and Accurate Performance Models}, year = {2022}, isbn = {9781450392815}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3524059.3532391}, doi = {10.1145/3524059.3532391}, booktitle = {Proceedings of the 36th ACM International Conference on Supercomputing}, articleno = {3}, numpages = {13}, keywords = {empirical performance modeling, automatic performance modeling, configurable systems, experiment design}, location = {Virtual Event}, series = {ICS '22}, }
- FaasKeeper: a Blueprint for Serverless Services. Marcin Copik, Alexandru Calotoiu, Konstantin Taranov, and 1 more author. 2022.
FaaS (Function-as-a-Service) brought a fundamental shift into cloud computing: (persistent) virtual machines have been replaced with dynamically allocated resources, trading locality and statefulness for a pay-as-you-go model more suitable for varying and infrequent workloads. However, adapting services to function within the serverless paradigm while still fulfilling requirements is challenging. In this work, we introduce a design blueprint for creating complex serverless services and contribute a set of requirements for efficient and scalable FaaS computing. To showcase our approach, we focus on ZooKeeper, a centralized coordination service that offers a safe and wait-free consensus mechanism but requires a persistent allocation of computing resources that does not offer the flexibility needed to handle variable workloads. We design FaaSKeeper, the first coordination service built on serverless functions and cloud-native services. FaaSKeeper provides the same consistency guarantees and interface as ZooKeeper with a price model proportional to the activity in the system. In addition, we define synchronization primitives to extend the capabilities of scalable cloud storage services with consensus semantics needed for strong data consistency.
@misc{copik2022faaskeeper, author = {Copik, Marcin and Calotoiu, Alexandru and Taranov, Konstantin and Hoefler, Torsten}, title = {FaasKeeper: a Blueprint for Serverless Services}, publisher = {arXiv}, year = {2022}, copyright = {arXiv.org perpetual, non-exclusive license}, }
- FMI: Fast and Cheap Message Passing for Serverless Functions. Marcin Copik, Roman Böhringer, Alexandru Calotoiu, and 1 more author. 2022.
Serverless functions provide elastic scaling with a fine-grained billing model. However, for distributed jobs that benefit from large-scale and dynamic parallelism, the lack of fast and cheap communication is a major limitation of FaaS computing. Individual functions cannot communicate directly, group operations do not exist, and users resort to manual implementations of storage-based communication. We overcome this limitation and present the FaaS Message Interface (FMI). FMI is an easy-to-use, high-performance framework for general-purpose point-to-point and collective communication. We support different communication channels and offer a model-driven channel selection according to performance and cost expectations. We model the interface after MPI and show that message passing can be integrated into serverless applications with minor changes. In our experiments, FMI can speed up communication for a distributed machine learning job by up to 162x while at the same time reducing cost by up to 941 times.
@misc{copik2022fmi, author = {Copik, Marcin and Böhringer, Roman and Calotoiu, Alexandru and Hoefler, Torsten}, title = {FMI: Fast and Cheap Message Passing for Serverless Functions}, year = {2022}, }
- Software Resource Disaggregation for HPC with Serverless Computing. Marcin Copik, Marcin Chrapek, Alexandru Calotoiu, and 1 more author. 2022.
Aggregated HPC resources with rigid allocation systems and programming models struggle to adapt to diverse and changing workloads. Thus, HPC systems fail to efficiently use the large pools of unused memory and to increase the utilization of idle computing resources. Prior work attempted to increase the throughput and efficiency of supercomputing systems through workload co-location and resource disaggregation. However, these methods fall short of providing a solution that can be applied to existing systems without major hardware modifications and performance losses. In this paper, we use the new cloud paradigm of serverless computing to improve the utilization of supercomputers. We show that the FaaS programming model satisfies the requirements of high-performance applications and how idle memory helps resolve cold startup issues. We demonstrate a software resource disaggregation approach where the co-location of functions allows idle cores and accelerators to be utilized while retaining near-native performance.
@misc{copik2022disagg, author = {Copik, Marcin and Chrapek, Marcin and Calotoiu, Alexandru and Hoefler, Torsten}, title = {Software Resource Disaggregation for HPC with Serverless Computing}, year = {2022}, }
- Software Resource Disaggregation for HPC with Serverless Computing. Marcin Copik, Alexandru Calotoiu, and Torsten Hoefler. ACM Student Research Competition at ACM/IEEE Supercomputing, 2022.
Aggregated HPC resources have rigid allocation systems and programming models and struggle to adapt to diverse and changing workloads. Thus, HPC systems fail to efficiently use the large pools of unused memory and increase the utilization of idle computing resources. Prior work attempted to increase the throughput and efficiency of supercomputing systems through workload co-location and resource disaggregation. However, these methods fall short of providing a solution that can be applied to existing systems without major hardware modifications and performance losses. In this project, we use the new cloud paradigm of serverless computing to improve the utilization of supercomputers. We show that the FaaS programming model satisfies the requirements of high-performance applications and how idle memory helps resolve cold startup issues. We demonstrate a software resource disaggregation approach where the co-location of functions allows idle cores and accelerators to be utilized while retaining near-native performance.
- MOM: Matrix Operations in MLIR. Lorenzo Chelini, Henrik Barthels, Paolo Bientinesi, and 3 more authors. 12th International Workshop on Polyhedral Compilation Techniques, 2022.
Modern research in code generators for dense linear algebra computations has shown the ability to produce optimized code whose performance compares with, and often exceeds, that of state-of-the-art implementations by domain experts. However, the underlying infrastructure is often developed in isolation, making the interconnection of logically combinable systems complicated if not impossible. In this paper, we propose to leverage MLIR as a unifying compiler infrastructure for the optimization of dense linear algebra operations. We propose a new MLIR dialect for expressing linear algebraic computations including matrix properties to enable high-level algorithmic transformations. The integration of this new dialect in MLIR enables end-to-end compilation of matrix computations via conversion to existing lower-level dialects already provided by the framework.
@article{chelini2022mom, title = {MOM: Matrix Operations in MLIR}, author = {Chelini, Lorenzo and Barthels, Henrik and Bientinesi, Paolo and Copik, Marcin and Grosser, Tobias and Spampinato, Daniele G}, year = {2022}, journal = {12th International Workshop on Polyhedral Compilation Techniques}, series = {IMPACT 2022}, }
2021
- rFaaS: RDMA-Enabled FaaS Platform for Serverless High-Performance Computing. Marcin Copik, Konstantin Taranov, Alexandru Calotoiu, and 1 more author. 2021.
The rigid MPI programming model and batch scheduling dominate high-performance computing. While clouds brought new levels of elasticity into the world of computing, supercomputers still suffer from low resource utilization rates. To enhance supercomputing clusters with the benefits of serverless computing, a modern cloud programming paradigm for pay-as-you-go execution of stateless functions, we present rFaaS, the first RDMA-aware Function-as-a-Service (FaaS) platform. With hot invocations and decentralized function placement, we overcome the major performance limitations of FaaS systems and provide low-latency remote invocations in multi-tenant environments. We evaluate the new serverless system through a series of microbenchmarks and show that remote functions execute with negligible performance overheads. We demonstrate how serverless computing can bring elastic resource management into MPI-based high-performance applications. Overall, our results show that MPI applications can benefit from modern cloud programming paradigms to guarantee high performance at lower resource costs.
@misc{2021rfaas, title = {{r}FaaS: RDMA-Enabled FaaS Platform for Serverless High-Performance Computing}, author = {Copik, Marcin and Taranov, Konstantin and Calotoiu, Alexandru and Hoefler, Torsten}, year = {2021}, eprint = {2106.13859}, archiveprefix = {arXiv}, primaryclass = {cs.DC}, }
- SeBS: A Serverless Benchmark Suite for Function-as-a-Service Computing. Marcin Copik, Grzegorz Kwasniewski, Maciej Besta, and 2 more authors. 2021.
Function-as-a-Service (FaaS) is one of the most promising directions for the future of cloud services, and serverless functions have immediately become a new middleware for building scalable and cost-efficient microservices and applications. However, the quickly moving technology hinders reproducibility, and the lack of a standardized benchmarking suite leads to ad-hoc solutions and microbenchmarks being used in serverless research, further complicating meta-analysis and comparison of research solutions. To address this challenge, we propose the Serverless Benchmark Suite: the first benchmark for FaaS computing that systematically covers a wide spectrum of cloud resources and applications. Our benchmark consists of the specification of representative workloads, the accompanying implementation and evaluation infrastructure, and the evaluation methodology that facilitates reproducibility and enables interpretability. We demonstrate that the abstract model of a FaaS execution environment ensures the applicability of our benchmark to multiple commercial providers such as AWS, Azure, and Google Cloud. Our work facilitates experimental evaluation of serverless systems, and delivers a standardized, reliable, and evolving evaluation methodology for the performance, efficiency, scalability, and reliability of middleware FaaS platforms.
@inproceedings{copik2020sebs, title = {SeBS: A Serverless Benchmark Suite for Function-as-a-Service Computing}, author = {Copik, Marcin and Kwasniewski, Grzegorz and Besta, Maciej and Podstawski, Michal and Hoefler, Torsten}, year = {2021}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3464298.3476133}, pages = {64–78}, numpages = {15}, keywords = {function-as-a-service, benchmark, serverless, FaaS}, location = {Qu\'{e}bec city, Canada}, booktitle = {Proceedings of the 22nd International Middleware Conference}, series = {Middleware '21}, doi = {10.1145/3464298.3476133}, }
- Extracting Clean Performance Models from Tainted Programs. Marcin Copik, Alexandru Calotoiu, Tobias Grosser, and 3 more authors. 2021.
Performance models are well-known instruments to understand the scaling behavior of parallel applications. They express how performance changes as key execution parameters, such as the number of processes or the size of the input problem, vary. Besides reasoning about program behavior, such models can also be automatically derived from performance data. This is called empirical performance modeling. While this sounds simple at first glance, this approach faces several serious interrelated challenges, including expensive performance measurements, inaccuracies inflicted by noisy benchmark data, and overall complex experiment design, starting with the selection of the right parameters. The more parameters one considers, the more experiments are needed and the stronger the impact of noise. In this paper, we show how taint analysis, a technique borrowed from the domain of computer security, can substantially improve the modeling process, lowering its cost, improving model quality, and helping validate performance models and experimental setups.
@inproceedings{2021perftaint, author = {Copik, Marcin and Calotoiu, Alexandru and Grosser, Tobias and Wicki, Nicolas and Wolf, Felix and Hoefler, Torsten}, title = {Extracting Clean Performance Models from Tainted Programs}, year = {2021}, isbn = {9781450382946}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3437801.3441613}, doi = {10.1145/3437801.3441613}, booktitle = {Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming}, pages = {403–417}, numpages = {15}, keywords = {performance modeling, high-performance computing, compiler techniques, LLVM, taint analysis}, location = {Virtual Event, Republic of Korea}, series = {PPoPP '21}, }
- GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra. Maciej Besta, Zur Vonarburg-Shmaria, Yannick Schaffner, and 16 more authors. Aug 2021.
We propose GraphMineSuite (GMS): the first benchmarking suite for graph mining that facilitates evaluating and constructing high-performance graph mining algorithms. First, GMS comes with a benchmark specification based on extensive literature review, prescribing representative problems, algorithms, and datasets. Second, GMS offers a carefully designed software platform for seamless testing of different fine-grained elements of graph mining algorithms, such as graph representations or algorithm subroutines. The platform includes parallel implementations of more than 40 considered baselines, and it facilitates developing complex and fast mining algorithms. High modularity is possible by harnessing set algebra operations such as set intersection and difference, which enables breaking complex graph mining algorithms into simple building blocks that can be separately experimented with. GMS is supported with a broad concurrency analysis for portability in performance insights, and a novel performance metric to assess the throughput of graph mining algorithms, enabling more insightful evaluation. As use cases, we harness GMS to rapidly redesign and accelerate state-of-the-art baselines of core graph mining problems: degeneracy reordering (by up to >2x), maximal clique listing (by up to >9x), k-clique listing (by 1.1x), and subgraph isomorphism (by up to 2.5x), also obtaining better theoretical performance bounds.
@inproceedings{besta2021graphminesuite, title = {GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra}, author = {Besta, Maciej and Vonarburg-Shmaria, Zur and Schaffner, Yannick and Schwarz, Leonardo and Kwasniewski, Grzegorz and Gianinazzi, Lukas and Beranek, Jakub and Janda, Kacper and Holenstein, Tobias and Leisinger, Sebastian and Tatkowski, Peter and Ozdemir, Esref and Balla, Adrian and Copik, Marcin and Lindenberger, Philipp and Kalvoda, Pavel and Konieczny, Marek and Mutlu, Onur and Hoefler, Torsten}, year = {2021}, eprint = {2103.03653}, archiveprefix = {arXiv}, primaryclass = {cs.DC}, month = aug, booktitle = {Proceedings of the 47th International Conference on Very Large Data Bases (VLDB'21)}, doi = {10.14778/3476249.3476252}, }
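The abstract's central claim, that set algebra operations such as intersection serve as building blocks for complex graph mining algorithms, can be illustrated with triangle counting built on a sorted-set intersection kernel. This is a hedged sketch, not code from GMS; `count_triangles` and the adjacency-dict format are illustrative.

```python
def intersect_sorted(a, b):
    """Merge-style intersection of two sorted vertex lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def count_triangles(adj):
    """Triangle counting expressed with set algebra.

    adj maps each vertex to a sorted list of neighbors (undirected graph).
    Each triangle {u, v, w} with u < v < w is counted exactly once.
    """
    total = 0
    for u, neighbors in adj.items():
        for v in neighbors:
            if v > u:
                # Common neighbors w > v of both u and v close a triangle.
                total += len(intersect_sorted(
                    [w for w in adj[u] if w > v],
                    [w for w in adj[v] if w > v]))
    return total
```

Swapping the representation of the neighbor sets (sorted arrays, bitvectors, compressed sets) changes performance without touching the algorithm, which is the modularity the suite exploits.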
- SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems. Maciej Besta, Raghavendra Kanakagiri, Grzegorz Kwasniewski, and 14 more authors. Oct 2021.
Simple graph algorithms such as PageRank have recently been the target of numerous hardware accelerators. Yet, there also exist much more complex graph mining algorithms for problems such as clustering or maximal clique listing. These algorithms are memory-bound and thus could be accelerated by hardware techniques such as Processing-in-Memory (PIM). However, they also come with non-straightforward parallelism and complicated memory access patterns. In this work, we address this with a simple yet surprisingly powerful observation: operations on sets of vertices, such as intersection or union, form a large part of many complex graph mining algorithms, and can offer rich and simple parallelism at multiple levels. This observation drives our cross-layer design, in which we (1) expose set operations using a novel programming paradigm, (2) express and execute these operations efficiently with carefully designed set-centric ISA extensions called SISA, and (3) use PIM to accelerate SISA instructions. The key design idea is to alleviate the bandwidth needs of SISA instructions by mapping set operations to two types of PIM: in-DRAM bulk bitwise computing for bitvectors representing high-degree vertices, and near-memory logic layers for integer arrays representing low-degree vertices. Set-centric SISA-enhanced algorithms are efficient and outperform hand-tuned baselines, offering more than 10x speedup over the established Bron-Kerbosch algorithm for listing maximal cliques. We deliver more than 10 SISA set-centric algorithm formulations, illustrating SISA’s wide applicability.
@inproceedings{besta2021sisa, title = {SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems}, author = {Besta, Maciej and Kanakagiri, Raghavendra and Kwasniewski, Grzegorz and Ausavarungnirun, Rachata and Beránek, Jakub and Kanellopoulos, Konstantinos and Janda, Kacper and Vonarburg-Shmaria, Zur and Gianinazzi, Lukas and Stefan, Ioana and Luna, Juan Gómez and Copik, Marcin and Kapp-Schwoerer, Lukas and Girolamo, Salvatore Di and Konieczny, Marek and Mutlu, Onur and Hoefler, Torsten}, year = {2021}, month = oct, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3466752.3480133}, doi = {10.1145/3466752.3480133}, booktitle = {MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture}, series = {MICRO '21}, }
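The two set representations that the abstract maps to different PIM types can be sketched in plain Python: dense bitvectors intersected with a single bulk bitwise AND, and sorted integer arrays intersected with a merge loop. The function names are illustrative, not SISA's actual instructions.

```python
def to_bitset(vertices):
    """Dense set representation: one bit per vertex id."""
    bits = 0
    for v in vertices:
        bits |= 1 << v
    return bits

def bitset_intersection_size(a_bits, b_bits):
    # One bulk bitwise AND plus a popcount: the pattern that maps
    # naturally onto in-DRAM bitwise computing (high-degree vertices).
    return bin(a_bits & b_bits).count("1")

def array_intersection_size(a, b):
    # Merge over sorted integer arrays: the sparse pattern that maps
    # onto near-memory logic layers (low-degree vertices).
    i = j = n = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            n += 1; i += 1; j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return n
```

The bitset variant costs O(bits) regardless of how many elements are set, so it only pays off for dense (high-degree) sets; the merge variant is O(|a| + |b|), which favors sparse ones.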
2020
- ExtraPeak: Advanced Automatic Performance Modeling for HPC Applications. Alexandru Calotoiu, Marcin Copik, Torsten Hoefler, and 3 more authors. Oct 2020.
Performance models are powerful tools allowing developers to understand the behavior of their applications, and empower them to address performance issues already during the design or prototyping phase. Unfortunately, the difficulties of creating such models manually and the effort involved render performance modeling a topic limited to a relatively small community of experts. This article summarizes the results of the two projects Catwalk, which aimed to create tools that automate key activities of the performance modeling process, and ExtraPeak, which built upon the results of Catwalk and worked toward making this powerful methodology more flexible, streamlined, and easy to use. The two projects both provide accessible tools and methods that bring performance modeling to a wider audience of HPC application developers. Since its outcome represents the final state of the two projects, we expand to a greater extent on the results of ExtraPeak.
@inproceedings{10.1007/978-3-030-47956-5_15, author = {Calotoiu, Alexandru and Copik, Marcin and Hoefler, Torsten and Ritter, Marcus and Shudler, Sergei and Wolf, Felix}, editor = {Bungartz, Hans-Joachim and Reiz, Severin and Uekermann, Benjamin and Neumann, Philipp and Nagel, Wolfgang E.}, title = {ExtraPeak: Advanced Automatic Performance Modeling for HPC Applications}, booktitle = {Software for Exascale Computing - SPPEXA 2016-2019}, year = {2020}, publisher = {Springer International Publishing}, address = {Cham}, pages = {453--482}, isbn = {978-3-030-47956-5}, }
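Automated empirical performance modeling, as summarized above, boils down to fitting an analytical model to measurements. A minimal sketch, assuming a single-parameter power-law model t(p) = c * p^a fitted by least squares in log-log space; the projects' actual model search over a performance model normal form is far more general, and `fit_power_model` is an illustrative name.

```python
import math

def fit_power_model(procs, times):
    """Least-squares fit of t(p) = c * p**a in log-log space.

    Returns (c, a). In log space the model is linear,
    log t = log c + a * log p, so ordinary linear regression applies;
    with noise-free power-law data the fit is exact up to rounding.
    """
    xs = [math.log(p) for p in procs]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope of the regression line = exponent a; intercept gives log c.
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    c = math.exp(my - a * mx)
    return c, a
```

Given measurements at p = 2, 4, 8, 16, 32 following t = 2 * p^1.5, the fit recovers c = 2 and a = 1.5.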
2019
- perf-taint: Taint Analysis for Automatic Many-Parameter Performance Modeling. Marcin Copik and Torsten Hoefler. ACM Student Research Competition at ACM/IEEE Supercomputing, Oct 2019.
Performance modeling is a well-known technique for understanding the scaling behavior of an application. Although the modeling process is today often automatic, it still relies on a domain expert selecting program parameters and deciding relevant sampling intervals. Since existing empirical methods attempt black-box modeling, the decision on which parameters influence a selected part of the program is based on measured data, making empirical modeling sensitive to human errors and instrumentation noise. We introduce a hybrid analysis to mitigate the current limitations of empirical modeling, combining the confidence of static analysis with the ability of dynamic taint analysis to capture the effects of control-flow and memory operations. We construct models of computation and communication volumes that help the modeler to remove effects of noise and improve the correctness of estimated models. Our automatic analysis prunes irrelevant program parameters and brings an understanding of parameter dependencies, which helps in designing the experiment.
2018
- The Generalized Matrix Chain Algorithm. Henrik Barthels, Marcin Copik, and Paolo Bientinesi. Oct 2018.
In this paper, we present a generalized version of the matrix chain algorithm to generate efficient code for linear algebra problems, a task for which human experts often invest days or even weeks of work. The standard matrix chain problem consists of finding the parenthesization of a matrix product M := A1 A2 ⋯ An that minimizes the number of scalar operations. In practical applications, however, one frequently encounters more complicated expressions, involving transposition, inversion, and matrix properties. Indeed, the computation of such expressions relies on a set of computational kernels that offer functionality well beyond the simple matrix product. The challenge then shifts from finding an optimal parenthesization to finding an optimal mapping of the input expression to the available kernels. Furthermore, it is often the case that a solution based on the minimization of scalar operations does not result in the optimal solution in terms of execution time. In our experiments, the generated code outperforms other libraries and languages on average by a factor of about 9. The motivation for this work comes from the fact that, despite great advances in the development of compilers, the task of mapping linear algebra problems to optimized kernels is still to be done manually. In order to relieve the user from this complex task, new techniques for the compilation of linear algebra expressions have to be developed.
@inproceedings{10.1145/3168804, author = {Barthels, Henrik and Copik, Marcin and Bientinesi, Paolo}, title = {The Generalized Matrix Chain Algorithm}, year = {2018}, isbn = {9781450356176}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3168804}, doi = {10.1145/3168804}, booktitle = {Proceedings of the 2018 International Symposium on Code Generation and Optimization}, pages = {138–148}, numpages = {11}, keywords = {matrix chain problem, linear algebra, compiler}, location = {Vienna, Austria}, series = {CGO 2018}, }
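The standard matrix chain problem that the paper generalizes is the classic O(n^3) dynamic program. A minimal sketch of that baseline (scalar-operation cost only, with none of the paper's kernels, matrix properties, or inversion/transposition handling):

```python
def matrix_chain_order(dims):
    """Classic O(n^3) dynamic program for the matrix chain problem.

    dims has length n+1; matrix A_i is dims[i-1] x dims[i].
    Returns (minimal scalar multiplications, parenthesization string).
    """
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]
    split = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):          # length of the subchain
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = float("inf")
            for k in range(i, j):           # try every split point
                c = (cost[i][k] + cost[k + 1][j]
                     + dims[i] * dims[k + 1] * dims[j + 1])
                if c < cost[i][j]:
                    cost[i][j], split[i][j] = c, k

    def paren(i, j):
        if i == j:
            return f"A{i + 1}"
        return f"({paren(i, split[i][j])} {paren(split[i][j] + 1, j)})"

    return cost[0][n - 1], paren(0, n - 1)
```

For A1 (10x30), A2 (30x5), A3 (5x60), the program picks ((A1 A2) A3) at 4500 scalar multiplications rather than (A1 (A2 A3)) at 27000; the paper's generalization replaces this scalar-operation cost with a mapping onto real computational kernels.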
2017
- Using SYCL as an Implementation Framework for HPX.Compute. Marcin Copik and Hartmut Kaiser. Oct 2017.
The recent advancements in High Performance Computing and ongoing research to reach Exascale have been heavily supported by introducing dedicated massively parallel accelerators. Programmers wishing to maximize utilization of current supercomputers are required to develop software which not only involves scaling across multiple nodes but is capable of offloading data-parallel computation to dedicated hardware such as graphic processors. The introduction of new types of hardware has been followed by developing new languages, extensions, compilers and libraries. Unfortunately, none of those solutions seem to be fully portable and independent from a specific vendor and type of hardware. HPX.Compute, a programming model developed on top of HPX, a C++ standards library for concurrency and parallelism, uses existing and proposed C++ language and library capabilities to support various types of parallelism. It aims to provide a generic interface allowing for writing code which is portable between hardware architectures. We have implemented a new backend for HPX.Compute based on SYCL, a Khronos standard for single-source programming of OpenCL devices in C++. We present how this runtime may be used to target OpenCL devices through our C++ API. We have evaluated the performance of the new implementation on graphic processors with the STREAM benchmark and compare results with the existing CUDA-based implementation.
@inproceedings{2017sycl, author = {Copik, Marcin and Kaiser, Hartmut}, title = {Using SYCL as an Implementation Framework for HPX.Compute}, year = {2017}, isbn = {9781450352147}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3078155.3078187}, doi = {10.1145/3078155.3078187}, booktitle = {Proceedings of the 5th International Workshop on OpenCL}, articleno = {30}, numpages = {7}, keywords = {GPGPU, parallel programming, SYCL, C++, heterogeneous programming, HPX}, location = {Toronto, Canada}, series = {IWOCL 2017}, }
- Parallel Prefix Algorithms for the Registration of Arbitrarily Long Electron Micrograph Series. Marcin Copik, Paolo Bientinesi, and Benjamin Berkels. ACM Student Research Competition at ACM/IEEE Supercomputing, Oct 2017.
Recent advances in the technology of transmission electron microscopy have allowed for a more precise visualization of materials and physical processes, such as metal oxidation. Nevertheless, the quality of information is limited by the damage caused by an electron beam, movement of the specimen or other environmental factors. A novel registration method has been proposed to remove those limitations by acquiring a series of low dose microscopy frames and performing a computational registration on them to understand and visualize the sample. This process can be represented as a prefix sum with a complex and computationally intensive binary operator and a parallelization is necessary to enable processing long series of microscopy images. With our parallelization scheme, the time of registration of results from ten seconds of microscopy acquisition has been decreased from almost thirteen hours to less than seven minutes on 512 Intel IvyBridge cores.
- Parallel Prefix Algorithms for the Registration of Arbitrarily Long Electron Micrograph Series. Marcin Copik. Master Thesis, Oct 2017.
Recent advances in the technology of transmission electron microscopy have allowed for a more precise visualization of materials and physical processes, such as metal oxidation. Nevertheless, the quality of information is limited by the damage caused by an electron beam, movement of the specimen or other environmental factors. A novel registration method has been proposed to remove those limitations by acquiring a series of low dose microscopy frames and performing a computational registration on them to understand and visualize the sample. This process can be represented as a prefix sum with a complex and computationally intensive binary operator and a parallelization is necessary to enable processing long series of microscopy images. With our parallelization scheme, the time of registration of results from ten seconds of microscopy acquisition has been decreased from almost thirteen hours to less than seven minutes on 512 Intel IvyBridge cores.
@article{2017masterthesis, title = {Parallel Prefix Algorithms for the Registration of Arbitrarily Long Electron Micrograph Series}, author = {Copik, Marcin}, year = {2017}, journal = {Master Thesis}, }
2016
- A GPGPU-based Simulator for Prism: Statistical Verification of Results of PMC (extended abstract). Marcin Copik, Artur Rataj, and Bozena Wozna-Szczesniak. Oct 2016.
We describe a GPGPU-based Monte Carlo simulator integrated with Prism. It supports Markov chains with discrete or continuous time and a subset of properties expressible in PCTL, CSL and their variants extended with rewards. The simulator allows an automated statistical verification of results obtained using Prism's formal methods.
@inproceedings{2016gpu, author = {Copik, Marcin and Rataj, Artur and Wozna{-}Szczesniak, Bozena}, editor = {Schlingloff, Bernd{-}Holger}, title = {A GPGPU-based Simulator for Prism: Statistical Verification of Results of PMC (extended abstract)}, booktitle = {Proceedings of the 25th International Workshop on Concurrency, Specification and Programming, Rostock, Germany, September 28-30, 2016}, series = {{CEUR} Workshop Proceedings}, volume = {1698}, pages = {199--208}, publisher = {CEUR-WS.org}, year = {2016}, }
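The statistical verification approach described above, estimating a property's probability from many sampled paths, can be sketched for a discrete-time Markov chain with a bounded reachability property. This is a toy sequential Monte Carlo estimator, not the Prism-integrated GPGPU simulator; all names are illustrative.

```python
import random

def reached_within(transitions, start, goal, horizon, rng):
    """Sample one bounded path of a DTMC; report whether `goal` is hit.

    transitions maps a state to a list of (next_state, probability) pairs
    whose probabilities sum to 1.
    """
    state = start
    for _ in range(horizon):
        if state == goal:
            return True
        r, acc = rng.random(), 0.0
        for nxt, prob in transitions[state]:
            acc += prob
            if r < acc:
                state = nxt
                break
    return state == goal

def estimate_reachability(transitions, start, goal, horizon, samples, seed=0):
    """Monte Carlo estimate of P(reach `goal` within `horizon` steps)."""
    rng = random.Random(seed)
    hits = sum(reached_within(transitions, start, goal, horizon, rng)
               for _ in range(samples))
    return hits / samples
```

Each sample is independent, which is what makes the workload embarrassingly parallel and a natural fit for GPU execution.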
2014
- GPU-accelerated stochastic simulator engine for PRISM model checker. Marcin Copik. Bachelor Thesis, Oct 2014.
This project provides a new simulator engine for the PRISM model checker, an enhancement allowing for faster approximate model checking. The simulator is designed as a substitute for the current engine, for simple integration with the GUI and CLI. The engine was implemented with OpenCL, an open standard for massively parallel computing on heterogeneous platforms. The engine generates a proper OpenCL kernel for a PRISM model, which will execute on OpenCL devices. This approach enables the generation of samples both on CPUs and GPUs. The performance and correctness tests included three case studies taken from the official PRISM benchmark. The results showed a huge gain in performance over the existing simulator; in the most extreme case, the new engine, running on a seven-year-old NVIDIA GPU, verified a test property in 20 seconds, where the existing simulator engine needed over two hours.
- Methods for abdominal respiratory motion tracking. Dominik Spinczyk, Adam Karwan, and Marcin Copik. Computer Aided Surgery, Oct 2014.
Non-invasive surface registration methods have been developed to register and track breathing motions in a patient’s abdomen and thorax. We evaluated several different registration methods, including marker tracking using a stereo camera, chessboard image projection, and abdominal point clouds. Our point cloud approach was based on a time-of-flight (ToF) sensor that tracked the abdominal surface. We tested different respiratory phases using additional markers as landmarks for the extension of the non-rigid Iterative Closest Point (ICP) algorithm to improve the matching of irregular meshes. Four variants for retrieving the correspondence data were implemented and compared. Our evaluation involved 9 healthy individuals (3 females and 6 males) with point clouds captured in opposite breathing phases (i.e., inhalation and exhalation). We measured three factors: surface distance, correspondence distance, and marker error. To evaluate different methods for computing the correspondence measurements, we defined the number of correspondences for every target point and the average correspondence assignment error of the points nearest the markers.
@article{doi:10.3109/10929088.2014.891657, author = {Spinczyk, Dominik and Karwan, Adam and Copik, Marcin}, journal = {Computer Aided Surgery}, title = {Methods for abdominal respiratory motion tracking}, year = {2014}, number = {1-3}, pages = {34-47}, volume = {19}, doi = {10.3109/10929088.2014.891657}, eprint = {https://doi.org/10.3109/10929088.2014.891657}, publisher = {Taylor & Francis}, url = {https://doi.org/10.3109/10929088.2014.891657}, }