More and more businesses are hoping to leverage Artificial Intelligence (AI) to increase revenue, improve efficiency, and drive product innovation. AI use cases based on Deep Learning (DL) technology are particularly noteworthy for delivering practical, actionable insights, and many of them can advance progress across industries, such as:
Image Classification
Can be used for concept tagging, such as facial emotion classification.
Object Detection
Can be used for object localization in autonomous driving technology.
Image Segmentation
Can be used to outline organ contours in a patient's magnetic resonance imaging (MRI).
Natural Language Processing
Can be used for text analysis or translation.
Recommendation Systems
Can be used to predict customer preferences in online stores or recommend higher-value products or services.
These use cases are just the beginning. As businesses integrate AI into their operations, they will discover new ways to apply artificial intelligence. However, the commercial value of every AI use case depends on the inference speed of models trained with deep neural networks. The resources required to support inference on deep learning models can be very large, often forcing businesses to upgrade hardware to achieve the performance and speed they need. Many customers, however, prefer to extend their existing infrastructure rather than purchase new hardware for a single purpose. Your IT department is already familiar with Intel® hardware architecture, whose flexible performance helps make your IT investments more efficient. Intel® Select Solutions for AI Inference provide a “one-stop,” pre-configured, optimized, and validated platform that enables low-latency, high-throughput inference on CPUs without the need for additional accelerator cards.
Intel® Select Solutions for AI Inference
Intel® Select Solutions for AI Inference can help you get started quickly, using solutions based on validated Intel® architecture to deploy efficient AI inference, thereby accelerating innovation and time to market. To speed AI inference and the launch of AI applications, Intel® Select Solutions for AI Inference combine a range of Intel® and third-party software and hardware technologies.
Software Selection
The software used in Intel® Select Solutions for AI Inference includes developer tools and management tools to assist with AI inference in production environments.
Intel® Distribution of OpenVINO™ Toolkit
The Intel® Distribution of OpenVINO™ Toolkit (Open Visual Inference and Neural Network Optimization) is a developer suite designed to accelerate the deployment of high-performance artificial intelligence and deep learning inference. The toolkit optimizes models trained in a variety of frameworks for multiple Intel® hardware targets to deliver outstanding deployment performance. The Deep Learning Workbench within the toolkit quantizes models to lower precision, converting the 32-bit floating-point values typically used for training (which consume more memory) to 8-bit integers in order to optimize memory usage and performance. Converting floating-point values to integers significantly improves AI inference speed while maintaining nearly the same accuracy. The toolkit can convert and execute models built in a variety of frameworks, including TensorFlow, MXNet, PyTorch, Kaldi, and any framework supported by the Open Neural Network Exchange (ONNX) ecosystem. Users can also access pretrained public models, speeding development on Intel® processors and enabling optimized image-processing pipelines without having to find or train models themselves.
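As a minimal sketch of how a converted model is served with the toolkit, the Python example below loads an Intermediate Representation (IR) model and runs inference on the CPU. It assumes the Inference Engine Python API shipped with the 2021.x releases listed in Table 1; the model file names and the 1 x 3 x 224 x 224 input shape are placeholders for an image-classification model such as ResNet-50.

```python
import numpy as np
from openvino.inference_engine import IECore  # Inference Engine API (OpenVINO 2021.x)

# Placeholder paths to an IR model produced by the Model Optimizer
MODEL_XML = "resnet50-int8.xml"
MODEL_BIN = "resnet50-int8.bin"

ie = IECore()
net = ie.read_network(model=MODEL_XML, weights=MODEL_BIN)

# Assume a single input and a single output blob
input_blob = next(iter(net.input_info))
output_blob = next(iter(net.outputs))

# Compile the network for execution on the CPU plugin
exec_net = ie.load_network(network=net, device_name="CPU")

# Dummy NCHW image batch; a real pipeline would decode and resize an image here
image = np.random.rand(1, 3, 224, 224).astype(np.float32)

result = exec_net.infer(inputs={input_blob: image})
print("Top class:", int(np.argmax(result[output_blob])))
```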
Deep Learning Reference Stack
Intel® Select Solutions for AI Inference come with the Deep Learning Reference Stack (DLRS), an integrated high-performance open-source software stack optimized for Intel® Xeon® Scalable processors and packaged within a convenient Docker container. DLRS is pre-validated and well-configured, containing the necessary libraries and software components, thereby reducing the complexity of integrating multiple software components for AI in production environments. The stack also includes highly-tuned containers for mainstream deep learning frameworks like TensorFlow and PyTorch, as well as the Intel® Distribution of OpenVINO™ Toolkit. This open-source community version ensures that AI developers have easy access to all the features and capabilities of Intel® platforms.
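The sketch below shows one way a DLRS container might be launched programmatically with the Docker SDK for Python; the image tag, entry-point command, and mount paths are placeholders rather than the actual published DLRS image names.

```python
import docker  # Docker SDK for Python (pip install docker)

client = docker.from_env()

# Placeholder image tag; substitute the DLRS image published for your
# framework of choice (TensorFlow, PyTorch, or the OpenVINO toolkit).
container = client.containers.run(
    "dlrs-tensorflow:latest",
    command="python3 /workspace/serve.py",  # hypothetical serving entry point
    volumes={"/opt/models": {"bind": "/workspace/models", "mode": "ro"}},
    ports={"8500/tcp": 8500},                # expose a model-serving port
    detach=True,
)
print("Started DLRS container:", container.short_id)
```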
Kubeflow and Seldon Core
As enterprises and organizations gain experience deploying inference models in production environments, an industry consensus has gradually emerged around a set of best practices known as “MLOps,” analogous to “DevOps” practices in software development. To help teams apply MLOps, Intel® Select Solutions for AI Inference use Kubeflow. With Kubeflow, teams can smoothly roll out new versions of models with zero downtime. Kubeflow deploys trained models to Kubernetes using supported model-serving backends (such as TensorFlow Serving). Model deployments can then use canary testing or shadow deployment to verify new and old versions in parallel. If issues arise, teams can use tracing along with model and data versioning to simplify root-cause analysis.
To maintain responsive services as demand increases, Intel® Select Solutions for AI Inference provide load balancing, automatically distributing inference requests across nodes to available instances that can serve them. Multi-tenancy support allows different models to be served on the same infrastructure, increasing hardware utilization. Lastly, to speed the handling of inference requests between the servers running AI inference and the endpoints that need AI insights, Intel® Select Solutions for AI Inference can use Seldon Core to help manage inference pipelines. Kubeflow integrates with Seldon Core to deploy deep learning models on Kubernetes and uses the Kubernetes API to manage the containers deployed in inference pipelines.
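To illustrate how a model plugs into such a pipeline, the sketch below follows Seldon Core's Python wrapper convention, in which a component class exposes a predict method that is called for each routed request; the class can then be packaged into a container (for example with Source to Image, listed in Table 1) and referenced from a SeldonDeployment. The class name and model-loading details here are hypothetical placeholders.

```python
import numpy as np

class ResNetClassifier:
    """Hypothetical Seldon Core model component.

    Seldon Core's Python wrapper calls predict(X, features_names) for each
    inference request routed to this component by the pipeline.
    """

    def __init__(self):
        # Placeholder: load the optimized model once per replica here,
        # e.g. with the OpenVINO Inference Engine as sketched earlier.
        self.model = None

    def predict(self, X, features_names=None):
        # X arrives as an array built from the request payload.
        batch = np.asarray(X, dtype=np.float32)
        # Placeholder inference: return uniform class probabilities so the
        # component is runnable without a real model.
        num_classes = 1000
        return np.full((batch.shape[0], num_classes), 1.0 / num_classes)
```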
Hardware Selection
Intel® Select Solutions for AI Inference combine second-generation Intel® Xeon® Scalable processors, Intel® Optane™ Solid State Drives (SSDs), Intel® 3D NAND SSDs, and Intel® Ethernet 700 Series, allowing your enterprise to rapidly deploy production-grade AI infrastructure on a performance-optimized platform, providing large memory capacities for demanding applications and workloads.
Second-generation Intel® Xeon® Scalable Processors
Intel® Select Solutions for AI Inference feature the performance and capabilities of second-generation Intel® Xeon® Scalable processors. For “baseline” configurations, the Intel® Xeon® Gold 6248 processor achieves an excellent balance between price, performance, and integrated technologies, enhancing inference performance and efficiency for AI models. The “enhanced” configuration utilizes the Intel® Xeon® Platinum 8268 processor, designed specifically to achieve faster AI inference. Additionally, higher-tier processors are also available in either configuration. The second-generation Intel® Xeon® Scalable processors include Intel® Deep Learning Boost technology, a suite of acceleration features that improve AI inference performance through specialized Vector Neural Network Instructions (VNNI). This instruction set enables deep learning calculations that previously required three separate instructions to be completed with a single instruction.
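The arithmetic that VNNI fuses can be illustrated numerically: a dot product of four 8-bit activation/weight pairs accumulated into a 32-bit result in a single step. The NumPy sketch below only mimics that arithmetic for clarity; it does not emit VNNI instructions, which the frameworks and the OpenVINO™ toolkit generate through Intel's optimized libraries.

```python
import numpy as np

def vnni_style_dot(acc, a_u8, b_s8):
    """Mimic the int8 multiply-accumulate pattern that VNNI fuses.

    acc  : 32-bit integer accumulator
    a_u8 : four unsigned 8-bit activations
    b_s8 : four signed 8-bit weights
    """
    products = a_u8.astype(np.int32) * b_s8.astype(np.int32)
    return acc + int(products.sum())

accumulator = 0
activations = np.array([12, 250, 7, 33], dtype=np.uint8)
weights = np.array([-3, 5, 127, -90], dtype=np.int8)

# One fused dot-product-and-accumulate step over four int8 pairs
print(vnni_style_dot(accumulator, activations, weights))
```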
Intel® Optane™ Technology
Intel® Optane™ Technology bridges the critical gap between the storage and memory tiers, enabling data centers to access data faster. The technology reshapes the memory and storage hierarchy, providing persistent memory, large memory pools, high-speed caching, and storage across a variety of products and solutions.
Solid State Drives
When the cache layer runs on low-latency, high-endurance solid state drives (SSDs), AI inference can take full advantage of their performance. Deploying high-performance SSDs for the cache layer, rather than mainstream Serial ATA (SATA) SSDs, offers significant benefits for demanding workloads. In Intel® Select Solutions, the cache layer uses Intel® Optane™ SSDs, which provide high input/output operations per second (IOPS) at a competitive cost, with low latency, high endurance, and up to 30 drive writes per day (DWPD), making them an ideal choice for write-intensive caching. The capacity layer uses Intel® 3D NAND SSDs, which offer outstanding read performance along with data integrity, performance consistency, and drive reliability.
25 Gb Ethernet
The 25 Gb Intel® Ethernet 700 Series network adapters enhance the performance of Intel® Select Solutions for AI Inference. Compared to using 1 Gb Ethernet (GbE) adapters and Intel® SSD DC S4500, using 25 Gb Ethernet adapters with second-generation Intel® Xeon® Platinum processors and Intel® SSD DC P4600 can provide up to 2.5 times higher performance. The Intel® Ethernet 700 Series offers validated performance, with extensive interoperability to meet high-quality thresholds for data resilience and service reliability. All Intel® Ethernet products provide global pre-sales and post-sales support, along with limited warranty coverage throughout the product lifecycle.
Performance Verified by Benchmark Testing
All Intel® Select Solutions undergo benchmark testing to verify that they meet a pre-defined level of workload-optimized performance. AI inference is becoming an integral part of a wide range of workloads in data centers, at the network edge, and in the cloud. Intel therefore uses standard deep learning benchmarking methods and also simulates real-world scenarios for measurement and benchmarking.
In the standard benchmark, throughput (the number of images processed per second) is measured on a pretrained deep residual neural network (ResNet-50 v1) using synthetic data with TensorFlow, PyTorch, and the OpenVINO™ toolkit. This network is representative of widely used deep learning use cases such as image classification, localization, and detection.
To simulate real-world scenarios, multiple clients are started to generate multiple request streams. These clients send images from external client systems to the server for inference. On the server side, inbound requests are load balanced by Istio and then routed to multiple instances of a servable that runs a pipeline of preprocessing, prediction, and post-processing steps through Seldon Core. Inference is performed by the OpenVINO™ Model Server running in the optimized DLRS container image. After passing through the pipeline, the inference results are returned to the requesting client. The throughput and latency measured during this process help confirm that the tested configuration can support inference at production scale.
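A simplified version of that measurement loop might look like the Python sketch below, which sends a series of requests to a REST inference endpoint and reports throughput and tail latency. The endpoint URL and payload layout are placeholders; the actual benchmark drives the Istio- and Seldon Core-based pipeline described above.

```python
import json
import time
import urllib.request

import numpy as np

# Placeholder endpoint for the load-balanced inference service
ENDPOINT = "http://inference.example.local/v1/models/resnet50:predict"
NUM_REQUESTS = 100

# Dummy NCHW image payload; a real client would send encoded images
payload = json.dumps({"instances": np.random.rand(1, 3, 224, 224).tolist()}).encode()

latencies = []
start = time.perf_counter()
for _ in range(NUM_REQUESTS):
    t0 = time.perf_counter()
    request = urllib.request.Request(
        ENDPOINT, data=payload, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(request).read()
    latencies.append(time.perf_counter() - t0)
elapsed = time.perf_counter() - start

print(f"throughput: {NUM_REQUESTS / elapsed:.1f} images/s")
print(f"p99 latency: {np.percentile(latencies, 99) * 1000:.1f} ms")
```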
Baseline Configuration and Enhanced Configuration
We present two reference configurations (“baseline configuration” and “enhanced configuration”) to showcase Intel® Select Solutions for AI Inference. Both configurations are validated and offer excellent performance. These configurations are specially designed and pre-tested to provide outstanding value, performance, security, and user experience. Ultimately, end customers can collaborate with system builders, system integrators, or solution and service providers to customize these configurations based on the needs and budgets of their enterprises and organizations.
The “baseline configuration” offers excellent value for money and is optimized for AI inference workloads. The “enhanced configuration” utilizes higher-tier models of Intel® Xeon® Scalable processors than the “baseline configuration” and doubles the memory. Table 1 provides detailed information on these two configurations.
Table 1. Baseline and enhanced configurations of Intel® Select Solutions for AI Inference Version 2
Configuration Item | Baseline Configuration | Enhanced Configuration
Processor | 2 x Intel® Xeon® Gold 6248 Processor, 2.5 GHz, 20 Cores, 40 Threads (or higher configuration) | 2 x Intel® Xeon® Platinum 8268 Processor, 2.90 GHz, 24 Cores, 48 Threads (or higher configuration) |
Memory | 192 GB or more (12 x 16 GB 2,666 MHz DDR4 ECC RDIMM) | 384 GB (12 x 32 GB 2,933 MHz DDR4 ECC RDIMM)
Storage (Boot Disk) | 1 x 256 GB Intel® SSD DC P4101 (M.2 80 mm PCIe 3.0 x4 NVMe) | 1 x 256 GB Intel® SSD DC P4101 (M.2 80 mm PCIe 3.0 x4 NVMe) |
Storage (Cache) | 1 x 375 GB Intel® Optane™ SSD DC P4800X, featuring Intel® Memory Drive Technology | 1 x 375 GB Intel® Optane™ SSD DC P4800X, featuring Intel® Memory Drive Technology |
Storage (Capacity) | 1 x 2.0 TB Intel® SSD DC P4510 (2.5-inch PCIe 3D NAND SSD) | 1 x 2.0 TB Intel® SSD DC P4510 (2.5-inch PCIe 3D NAND SSD)
Data Network | 1 x Dual-port 25/10/1 GbE Intel® Ethernet Network Adapter XXV710-DA2 (or higher model) | 1 x Dual-port 25/10/1 GbE Intel® Ethernet CNA XXV710-DA2 SFP28 (or higher model) |
Software
CentOS | 7.6.1810 | 7.6.1810 |
Kernel | 3.10.0-957.el7.x86_64 | 3.10.0-957.el7.x86_64
Intel® Distribution of OpenVINO™ Toolkit | 2021.2 | 2021.2 |
OpenVINO™ Model Server | 2019.3 | 2019.3 |
TensorFlow | 2.4.0 | 2.4.0 |
PyTorch | 1.8.0 | 1.8.0 |
MXNet | 1.3.1 | 1.3.1 |
Intel® Distribution for Python | 2019 Update 1 | 2019 Update 1
Intel® Math Kernel Library for Deep Neural Networks (MKL-DNN) | 2019.3 (bundled with the OpenVINO™ toolkit) | 2019.3 (bundled with the OpenVINO™ toolkit)
Deep Learning Reference Stack (DLRS) | v0.5.1 | v0.5.1 |
Source to Image | 1.2.0 | 1.2.0 |
Docker | 18.09 | 18.09 |
Kubernetes | v1.15.1 | v1.15.1 |
Kubeflow | 1.0.1 | 1.0.1 |
Helm | 3.2 | 3.2 |
Seldon Core | 1.0.1 | 1.0.1 |
Ceph | v14.2.7 | v14.2.7 |
MinIO (Rook v1.0) | 1.2.7 | 1.2.7
Rook | 1.2.7 | 1.2.7 |
The technical choices of Intel® Select Solutions for AI inference include not only a robust Intel® hardware foundation but also other Intel® technologies that further enhance performance and reliability:
- Intel® Advanced Vector Extensions 512 (Intel® AVX-512): A 512-bit instruction set that boosts performance for demanding workloads and use cases such as AI inference.
- Intel® Deep Learning Boost (Intel® DL Boost): A suite of acceleration features introduced with the second-generation Intel® Xeon® Scalable processors that significantly improves performance for inference applications built with advanced deep learning frameworks such as PyTorch, TensorFlow, MXNet, PaddlePaddle, and Caffe. The foundation of Intel® DL Boost is VNNI, a specialized instruction set that performs deep learning calculations with a single instruction, replacing the three separate instructions previously required.
- Intel® Distribution of OpenVINO™ Toolkit: A free software suite that helps developers and data scientists accelerate AI workloads and simplify deep learning inference and deployment from the network edge to the cloud.
- Intel® Math Kernel Library (Intel® MKL): This library contains optimized implementations of mainstream mathematical operations for Intel® hardware, enabling applications to fully utilize the Intel® AVX-512 instruction set. It is widely compatible with various compilers, languages, operating systems, linking, and threading models.
- Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN): An open-source performance-enhanced library used to accelerate deep learning frameworks on Intel® hardware.
- Intel® Distribution for Python: Accelerates AI-related Python libraries (such as NumPy, SciPy, and scikit-learn) through integrated Intel® performance libraries (such as Intel® MKL), improving AI inference speed; a quick way to verify an MKL-backed environment is sketched after this list.
- Framework optimization: Intel collaborates with Google, Apache, and Baidu for TensorFlow, MXNet, and PaddlePaddle platforms, respectively, and actively develops technologies related to Caffe and PyTorch. Software optimizations tailored for Intel® Xeon® Scalable processors within data centers are employed to enhance deep learning performance, with ongoing efforts to incorporate frameworks from other industry leaders.
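As a quick check that a Python environment is backed by Intel® MKL (as with the Intel® Distribution for Python described above), NumPy can report the BLAS/LAPACK libraries it was built against; an MKL-backed build lists mkl_rt. The matrix sizes below are arbitrary and only exercise the MKL-accelerated GEMM path.

```python
import time

import numpy as np

# Print the BLAS/LAPACK configuration NumPy was built against;
# an MKL-backed build (e.g. Intel Distribution for Python) references mkl_rt.
np.show_config()

# A large single-precision matrix multiply exercises the accelerated GEMM path.
a = np.random.rand(2048, 2048).astype(np.float32)
b = np.random.rand(2048, 2048).astype(np.float32)

t0 = time.perf_counter()
c = a @ b
print(f"2048 x 2048 SGEMM took {time.perf_counter() - t0:.3f} s")
```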
Intel® Xeon® Scalable Processor
The second-generation Intel® Xeon® Scalable Processor:
- Provides scalability in an economical, efficient, and flexible manner, spanning from multi-cloud environments to intelligent edge.
- Establishes a seamless performance foundation, helping accelerate the transformative impact of data.
- Supports groundbreaking Intel® Optane™ Persistent Memory technology.
- Enhances AI performance and helps the entire data center become AI-ready.
- Offers hardware-enhanced platform protection and threat monitoring.
Deploy Optimized, High-Speed AI Inference on Industry-Standard Hardware
The workload-optimized configurations provided by Intel® Select Solutions are validated for Intel® Xeon® Scalable processors and serve as a shortcut to data center transformation. By choosing Intel® Select Solutions for AI Inference, enterprises and organizations gain pre-tuned configurations that have been tested against real-world workloads and support optimization at scale, allowing IT departments to deploy AI inference quickly and efficiently in production environments. It also enables IT departments to run high-speed AI inference on hardware they are already accustomed to deploying and managing.