
TechEnablement

Education, Planning, Analysis, Code


Intel Xeon Phi for CUDA Programmers

April 16, 2014 by Rob Farber

Both GPU and Xeon Phi coprocessors provide high degrees of parallelism that can deliver excellent application performance. For the most part, CUDA programmers with existing application code have already written their software so it can run well on Phi coprocessors. The key to performance lies in understanding the differences between these two architectures.

Author’s note: To ensure that I best represented the important characteristics of these two architectures, I submitted this article for review to both Intel and NVIDIA. I have to say the quality of the feedback from both companies was superb and greatly appreciated!

Originally published in Dr. Dobbs on December 17, 2012 (link)

Intel designed the 60-core Phi coprocessor (previously called “MIC” in the literature) so it can be programmed like a conventional x86 processor core, while incorporating extensions such as a bidirectional ring interconnect for massive parallelism and a wide 512-bit vector unit per core to deliver high floating-point performance. While CUDA applications can run on x86 hardware, it’s important to know how architectural differences between GPUs and Intel Xeon Phi coprocessors affect performance and application design. The good news is that CUDA applications will readily map onto the Phi coprocessor’s vector-parallel architecture and run with high performance. The challenge lies in achieving the best possible performance.

[go to the full article link]

For more information, please see my GTC 2013 presentation “S3012 – Simplifying Portable Killer Apps with OpenACC and CUDA-5 Concisely and Efficiently”

  • pdf: starting at slide 4.
  • Video of Farber 2013 presentation (architecture discussion starts around 6:00)



