Google Augments AI Training with A3 Virtual Machines Backed by Nvidia’s H100 GPUs

Published by: Insights Desk | Released: May 12, 2023 | Source: DemandTalk

Highlights:

A single A3 supercomputer VM is powered by eight H100 GPUs based on Nvidia’s Hopper architecture, delivering three times the processing power of the previous-generation chip, the A100.
Beyond raw performance, Google Cloud offers flexible deployment options for A3 VMs.

With the debut of its A3 supercomputers, Google Cloud is expanding its portfolio of virtual machines for training and running artificial intelligence and machine learning models.

Announced at Google I/O, the Google Compute Engine A3 supercomputer is a VM purpose-built to train and deploy state-of-the-art AI models, including those driving the company’s advances in generative AI.

Cutting-edge AI and machine learning require massive amounts of computing power delivered by purpose-built infrastructure, said Roy Kim, Director of Product Management, and Chris Kleban, Group Product Manager at Google. The A3 supercomputer combines Nvidia Corp.’s new H100 graphics processing units with Google’s networking advancements, which Kim and Kleban said give customers access to the highest-performing GPUs for AI workloads.

A single A3 virtual machine (VM) is powered by eight H100 GPUs based on Nvidia’s Hopper architecture, delivering three times the processing speed of the A100. It also offers a bisectional bandwidth of 3.6 terabytes per second across these GPUs via NVSwitch and NVLink 4.0, as well as integration with Intel Corp.’s 4th generation Xeon Scalable processors for offloading management tasks.

The A3 supercomputer is the first GPU instance to use Google’s custom-designed Intel Infrastructure Processing Units (IPUs), which let GPU-to-GPU data transfers bypass the host CPU. According to Google, this increases network bandwidth by up to 10x compared to the previous-generation A2 VMs.

These instances also use Google’s Jupiter data center networking fabric and can scale to 26,000 interconnected GPUs, delivering up to 26 exaFLOPS of AI performance. As a result, according to Google, A3 VMs significantly reduce the time and cost required to train large-scale machine learning models. Additionally, when moving from model training to deployment, A3 VMs deliver up to a 30x improvement in inference performance compared to A2 VMs.
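To put those cluster-scale figures in perspective, the short Python sketch below derives two numbers from the values quoted in this article: the peak throughput implied per GPU and the number of eight-GPU A3 VMs in a full 26,000-GPU deployment. The function names are illustrative only, and the results are back-of-the-envelope values based on the "up to" peak Google cites, not measured performance.

```python
# Back-of-the-envelope arithmetic using only figures quoted in the article.
# Function names are hypothetical; results reflect the quoted peak,
# not measured throughput.

def per_gpu_petaflops(total_exaflops: float, total_gpus: int) -> float:
    """Implied peak per-GPU throughput in petaFLOPS (1 exaFLOPS = 1,000 petaFLOPS)."""
    return total_exaflops * 1000 / total_gpus


def vms_needed(total_gpus: int, gpus_per_vm: int) -> int:
    """Number of whole VMs required to host the given GPU count."""
    return total_gpus // gpus_per_vm


if __name__ == "__main__":
    print(per_gpu_petaflops(26, 26_000))  # 1.0 petaFLOPS per GPU
    print(vms_needed(26_000, 8))          # 3250 A3 VMs
```

In other words, the quoted maximum works out to roughly one petaFLOPS of peak AI performance per H100, spread across 3,250 A3 VMs.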

Beyond raw performance, Google Cloud offers flexible deployment options for A3 VMs. For example, customers can deploy A3 VMs on Google Cloud’s Vertex AI platform to build machine learning models on fully managed infrastructure purpose-built for high-performance training. Vertex AI was recently updated with new generative AI capabilities to better support the development of large language models.

Alternatively, the company said that customers looking to develop their own bespoke software stacks can deploy the A3 supercomputer on Google Compute Engine or Google Kubernetes Engine. This enables teams to train and serve advanced foundation models while taking advantage of automatic scaling, workload orchestration, and automated updates.
