
HIPAA Compliant LLMs in the Cloud

3170 words
15 min read
March 14, 2025


In this article, we look at the top HIPAA-compliant and SOC 2-audited large language models, reviewing each service’s compliance posture, cost, and scalability. The sections below present our research in full, including all source links.

```mermaid
flowchart TD
    A[Healthcare Sector Needs LLM] --> B[Check HIPAA & SOC2 Compliance]
    B --> C[Deploy LLM in Public Cloud]
    C --> D[Pay as You Go - Start Small]
    D --> E[Scale Up Demand As Required]
```

Below are some of the most powerful large language model (LLM) services that support HIPAA and SOC 2 compliance by default. Each is offered on a public cloud with pay-per-use pricing (no large upfront commitment). We also note options for easy scalability (even if not the most cost-effective) and highlight providers with out-of-the-box compliance at affordable rates.

Microsoft Azure OpenAI Service (GPT-4, GPT-3.5, etc.)

Compliance: Azure OpenAI is HIPAA-compliant by default when used under Microsoft’s BAA (Business Associate Agreement). It’s an in-scope Azure service, meaning it’s covered by Microsoft’s enterprise compliance programs (Azure services meet SOC 2 Type II and other standards via Microsoft’s trust framework).

LLMs & Features: Provides OpenAI’s GPT models (GPT-4, GPT-3.5 Turbo, ChatGPT) with Azure’s enterprise-grade security, privacy, and integration. You get the power of GPT-4 (one of the most advanced models) with Azure’s controls (e.g. data not used for training, private network options). Azure OpenAI also benefits from Azure’s reliability and integration with other Azure AI services.
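
For illustration, here is a minimal Python sketch of calling a GPT-4 deployment through Azure OpenAI with the openai SDK (v1+). The endpoint, API version, and deployment name are placeholders you would swap for your own resource’s values.

```python
import os
from openai import AzureOpenAI

# Azure routes requests to a named deployment of a model, not a raw model ID.
client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",                                 # example; check Azure docs
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",  # placeholder
)

response = client.chat.completions.create(
    model="YOUR-GPT4-DEPLOYMENT",  # your deployment name, not "gpt-4"
    messages=[
        {"role": "system", "content": "You are a clinical documentation assistant."},
        {"role": "user", "content": "Summarize the discharge note above in two sentences."},
    ],
)
print(response.choices[0].message.content)
```

The deployment indirection is how Azure layers its own access controls and quotas over the underlying OpenAI models.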

Pricing: Fully pay-as-you-go. GPT-4 in Azure costs about $0.03 per 1,000 prompt tokens and $0.06 per 1,000 output tokens (8K context version), identical to OpenAI’s own pricing. The 32K context version runs approximately $0.06/$0.12 per 1K input/output tokens, as explained by Azure.
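
As a quick sanity check on those rates, a back-of-envelope cost for a typical request:

```python
# One GPT-4 (8K) call at the rates quoted above:
# $0.03 per 1K prompt tokens, $0.06 per 1K output tokens.
prompt_tokens, output_tokens = 1_500, 500
cost = prompt_tokens / 1000 * 0.03 + output_tokens / 1000 * 0.06
print(f"${cost:.3f} per request")  # -> $0.075
```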

Scalability: High scalability via Azure’s cloud infrastructure. You can start small (a few requests) and scale up to enterprise volumes. Azure allows requesting higher throughput quotas as needed, and behind the scenes it allocates the necessary compute to handle your workload. In short, you pay-per-use and can burst to very large workloads (though costs will scale accordingly). Azure’s reliability (SLAs, regional availability) makes it easy to scale usage for production applications.

```mermaid
flowchart TB
    A[Azure GPT-4 HIPAA?] --> B[Sign BAA with Microsoft]
    B --> C[Use Azure's GPT-4 / GPT-3.5]
    C --> D[Data not used for training]
    D --> E[Scale as needed]
```

Google Cloud Vertex AI (PaLM 2 Models)

Compliance: Google Cloud’s Vertex AI is covered by Google’s HIPAA BAA and compliance programs. Google Cloud includes Vertex AI on its list of HIPAA-eligible services, so you can process PHI data when you have a BAA in place. Google Cloud services are also audited for SOC 2 and other standards (Google Cloud’s SOC 2 reports cover its infrastructure and managed services).

LLMs & Features: Offers Google’s PaLM 2 family models via Vertex AI Generative AI. This includes Text-Bison and Chat-Bison (general text generation and chat models) and even domain-tuned variants (e.g. Med-PaLM 2 for medical applications, which is designed for healthcare and supports HIPAA compliance). These models are state-of-the-art from Google – while PaLM 2 is somewhat less capable at general reasoning than GPT-4, it excels at multilingual tasks and integrates natively with other Google services. Vertex AI also provides tools like data encryption, access control, and logging by default.
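
A minimal sketch of calling Chat-Bison through the Vertex AI SDK of the PaLM 2 generation (the google-cloud-aiplatform package); the project ID and region are placeholders, and newer SDK releases have since moved to different interfaces, so treat this as indicative.

```python
import vertexai
from vertexai.language_models import ChatModel

vertexai.init(project="YOUR-PROJECT-ID", location="us-central1")  # placeholders

chat_model = ChatModel.from_pretrained("chat-bison")
chat = chat_model.start_chat(
    context="You answer questions about clinic scheduling policies."
)
response = chat.send_message("Draft a reminder message for an annual checkup.")
print(response.text)
```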

Pricing: Vertex AI uses a pay-per-use model priced by characters. For example, the PaLM 2 chat model (Chat-Bison) is about $0.0005 per 1,000 characters (approximately $2 per million tokens, at roughly four characters per token). The PaLM 2 text models (instruct models) cost roughly double (around $0.001 per 1,000 characters). In practice, this is significantly cheaper than GPT-4 for similar tasks – roughly an order of magnitude less per token. You only pay for what you use, and Google bills by the amount of text processed and generated.

Scalability: Easily scalable on Google’s infrastructure. There’s no upfront commitment required – you can make a few requests or millions. Vertex AI can automatically scale to handle large workloads. If needed, you can also leverage Google’s strong cloud tools (like scaling rules or batch processing in Vertex AI) for high-throughput jobs. In short, you can start with minimal usage and scale up on demand, making it suitable even if you eventually need to serve a high volume of requests (just note that costs will increase with usage). Google Cloud’s global infrastructure also means you can deploy in multiple regions for resilience or latency needs.

Amazon Bedrock (AWS) – Titan, Claude, etc.

Compliance: Amazon Bedrock is HIPAA-eligible and SOC compliant out of the box. It’s an AWS managed service designed with security and compliance in mind. AWS includes Bedrock in its HIPAA scope (so it can be used under a BAA), and Bedrock adheres to AWS’s SOC 2, ISO 27001, and other certifications, as detailed on the AWS Security and Privacy page. (AWS notes that Bedrock is in scope for “common compliance standards including ISO, SOC, CSA STAR…and is HIPAA eligible”.) In short, Bedrock can be used in a HIPAA-compliant manner through your AWS account and inherits AWS’s strong security controls.

LLMs & Features: Bedrock offers a choice of top-tier foundation models behind a unified API. This includes Amazon’s own Titan LLMs for text (for tasks like writing and summarizing), Anthropic’s Claude (the Claude 2 / Claude 3 family for advanced chat and reasoning), Meta’s Llama 2, AI21’s Jurassic-2 models, and more – all as fully managed endpoints. Key features of Bedrock include not sharing your input data with the model providers (Amazon isolates your data and fine-tunes models on a private copy), as noted on the AWS Security and Privacy page, and seamless integration with the AWS ecosystem (e.g. AWS Identity and Access Management, CloudWatch logging, and VPC endpoints for private connectivity). It’s essentially a one-stop shop for multiple LLMs under AWS’s compliance umbrella.
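
As a sketch, invoking Amazon’s Titan text model through the Bedrock runtime with boto3 might look like the following; the model ID and the Titan request/response shapes follow AWS’s documented format, but verify the exact IDs enabled in your account and region.

```python
import json
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Titan text models take an "inputText" prompt plus a generation config.
body = json.dumps({
    "inputText": "Summarize the key points of a HIPAA BAA in two sentences.",
    "textGenerationConfig": {"maxTokenCount": 256, "temperature": 0.2},
})

response = client.invoke_model(
    modelId="amazon.titan-text-lite-v1",  # example ID; check your Bedrock console
    body=body,
)
result = json.loads(response["body"].read())
print(result["results"][0]["outputText"])
```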

Pricing: Pay-per-request pricing based on input and output tokens, similar to other APIs. The cost varies by model (reflecting size and capability). For example, Amazon’s in-house Titan Text Lite is very inexpensive – around $0.00015 per 1,000 input tokens and $0.0002 per 1,000 output tokens. On the other hand, a more powerful model like Anthropic’s Claude 3.5 Sonnet via Bedrock is priced at about $0.003 per 1,000 input tokens and $0.015 per 1,000 output tokens, in line with Anthropic’s direct pricing. Additionally, Batch processing is available at approximately 50% lower cost for large-scale offline jobs, as explained on the AWS Bedrock Pricing page. AWS also offers a Provisioned Throughput plan for those who prefer to commit to a certain throughput.

Scalability: Highly scalable and flexible. By default, on-demand usage scales with your traffic – AWS manages the capacity. To scale, you simply send more requests; AWS can burst to meet demand across regions, as noted on the AWS Bedrock Pricing page. If you have a very high constant load, you can opt for Provisioned Throughput to reserve capacity (ensuring low latency even at scale, though it requires committing to hourly usage). Since Bedrock runs on AWS’s cloud, it inherits AWS’s virtually unlimited scaling capability. Even if this isn’t the absolute cheapest way to run LLMs, it lets you scale up smoothly as a fully managed, compliant service – without managing infrastructure as you grow.

```mermaid
flowchart TB
    A[Amazon Bedrock] --> B[Multiple LLM Choices]
    B --> C[HIPAA & SOC2]
    C --> D[Pay Per Tokens]
    D --> E[Full AWS Integration]
```

OpenAI API (Direct from OpenAI)

Compliance: The OpenAI API platform itself has strong compliance credentials: it has been audited for SOC 2 Type II (covering security and confidentiality), and OpenAI supports HIPAA compliance for its enterprise customers, signing a BAA where appropriate, as detailed on the OpenAI Security and Privacy page. Until recently, many used Azure as a proxy for HIPAA coverage, but as of late 2023 OpenAI announced it can accommodate BAAs directly, as discussed on the OpenAI Developer Community.

LLMs & Features: Using OpenAI’s API gives you access to GPT-4, GPT-3.5 Turbo, and other OpenAI models (like DALL-E for images or Whisper for speech-to-text, though here we focus on text LLMs). GPT-4 is one of the most powerful LLMs available, known for handling complex instructions, generating high-quality content, and even interpreting images (in the vision-enabled version). GPT-3.5 Turbo is a fast, less expensive model suitable for many conversational tasks. Key features of OpenAI’s service include constant model improvements and a large ecosystem of libraries and integrations. Data submitted via the API is not used to train OpenAI’s models by default, which helps with privacy.
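
A minimal sketch with the openai Python SDK (v1+), assuming the API key is set in the OPENAI_API_KEY environment variable:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain what a HIPAA BAA covers."}],
)
print(response.choices[0].message.content)
```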

Pricing: Pay-per-use, no minimum. OpenAI’s pricing is token-based. For example, GPT-4 (8K context) via the API costs $0.03 per 1,000 tokens (prompt) and $0.06 per 1,000 tokens (completion), as stated in the Azure Blog. This translates to $30 per million input tokens and $60 per million output tokens. GPT-3.5-Turbo is far cheaper – roughly $0.002 per 1,000 tokens (for the chat model), as noted on Hacker News, making it viable for high-volume use. There are no monthly fees; you are billed only for what you use. (OpenAI also offers ChatGPT Enterprise plans on a flat-rate per seat basis, though the API usage remains pay-as-you-go.)

Scalability: OpenAI’s infrastructure can handle substantial loads, and you can increase rate limits as needed by contacting them or through their scaled tiers. By default, new accounts have modest rate limits, but these will automatically increase with continued usage, or you can request higher limits if needed. There’s no hard cap on scaling – many companies send large volumes of traffic through OpenAI’s API in production. Essentially, you start with low commitment and scale up usage freely, with costs scaling linearly. If extremely high throughput is needed, OpenAI may work with you (or suggest using Azure OpenAI for dedicated capacity), but for most cases the pay-per-use API is sufficient and convenient.

Cohere AI Platform

Compliance: Cohere offers LLM APIs with enterprise-grade compliance. Cohere is independently audited for SOC 2 Type II (covering security, confidentiality, and availability), as described on the Cohere Security page. They also explicitly support HIPAA compliance and are ready to sign a HIPAA Business Associate Agreement (BAA) with customers who need it, as stated on the same page.

LLMs & Features: Cohere provides its own large language models, such as the Command model series for general text generation and the Embed model for text embeddings. The Command model is an instruct-tuned LLM (similar in capability to GPT-3.5) that can handle composing text, summarizing, answering questions, and more. While it may not be as powerful as GPT-4, it is competitive with many GPT-3 level tasks and continues to improve. Key features include a strict data privacy approach (customer data isn’t used to train the base models) and hosting in secure Google Cloud data centers, as mentioned on the Cohere Security page. They also offer multi-lingual support and specialized models for tasks like classification and reranking.
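
For illustration, a minimal sketch using the cohere Python SDK’s classic generate endpoint; the model name "command" matches the series described above, though SDK versions and endpoint names evolve, so treat this as indicative rather than definitive.

```python
import cohere

co = cohere.Client("YOUR-API-KEY")  # placeholder key

response = co.generate(
    model="command",
    prompt="Draft a short, patient-friendly explanation of a referral process.",
    max_tokens=200,
)
print(response.generations[0].text)
```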

Pricing: Cohere’s pricing is known to be more affordable than many competitors. It charges per token for inputs and outputs. For example, the standard Command model costs $1.00 per million input tokens and $2.00 per million output tokens, as detailed on the Cohere Pricing page. They also offer a lighter variant (Command-Light) at $0.30 per million input and $0.60 per million output, as noted on the same page. Essentially, you pay only a few dollars for millions of words generated, making Cohere attractive for budget-sensitive scenarios or high-volume LLM needs. There’s also a free tier for developers with no minimum spend.

Scalability: Serverless scaling – Cohere’s platform automatically scales usage. Their infrastructure (hosted on Google Cloud) is built to handle spikes and high throughput with load balancing and redundancy, as explained on the Cohere Security page. You can start with just a few requests and ramp up to large volumes without managing servers or provisioning.

flowchart TB A[Cohere LLM] --> B[HIPAA BAA + SOC2] B --> C[Command, Embed Models] C --> D[Lower cost vs GPT-4] D --> E[Pay-per-use, no min spend]

Anthropic Claude (via API or Cloud Providers)

Compliance: Anthropic’s Claude is designed for enterprise use and can be used in HIPAA-compliant workflows, as detailed on the Anthropic Claude page. Anthropic has attained SOC 2 Type II certification and supports HIPAA compliance for its operations, and it will sign BAAs with customers after reviewing the use case, as explained in Anthropic’s BAA documentation. Additionally, Claude is accessible through platforms like AWS and GCP, which are themselves HIPAA-compliant environments, as mentioned on the Anthropic Claude page.

LLMs & Features: Claude 2/Claude 3 by Anthropic is an advanced LLM known for its conversational, summarization, and reasoning strengths. It’s comparable to OpenAI’s models in many tasks and sometimes excels at providing detailed explanations. A standout feature is its very large context window – Claude 2 can handle up to 100,000 tokens of context, making it ideal for processing long documents or maintaining extended conversations. Claude is also built with a focus on harmlessness and has been extensively red-teamed for safety, which makes it well suited to sensitive, regulated applications.
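
A minimal sketch against Anthropic’s Messages API with the anthropic Python SDK; the model ID shown is an example and should be checked against Anthropic’s current model list.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # example model ID
    max_tokens=512,
    messages=[{"role": "user", "content": "Summarize the attached 80-page policy."}],
)
print(message.content[0].text)
```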

Pricing: Usage-based pricing similar to others. For example, Claude 3.5 “Sonnet” is priced around $3 per million input tokens and $15 per million output tokens, as indicated on the Anthropic Claude Sonnet page. Lighter versions like Claude Instant are much cheaper, costing around $0.0008 per 1K tokens input, as per the AWS Bedrock Pricing page. You can access Claude via the Anthropic API (pay as you go, starting low and scaling with volume) or through services like Amazon Bedrock.

Scalability: Very easy to scale. Anthropic’s cloud API features a simple pay-as-you-go model with auto-scaling – rate limits increase automatically with usage, as explained on the Build with Claude page. If higher throughput or enterprise features are needed, you can work with their sales team, but for most cases you simply scale as required. Additionally, since Claude is available on AWS and Google Cloud (Vertex AI), you can leverage those platforms’ scaling features. Claude’s 100k context also means you can process large amounts of data in a single request, offering efficiency gains for certain use cases.

Each of the above options provides a powerful LLM with built-in compliance measures, a pay-per-use pricing model, and the ability to scale as your needs grow. Depending on your priorities – e.g., raw model strength (GPT-4 or Claude), cost-effectiveness (Cohere or Vertex PaLM), or multi-model flexibility (AWS Bedrock) – you can choose the provider that best fits. All these providers offer out-of-the-box SOC 2 and HIPAA support, enabling adoption in regulated or security-conscious environments while paying only for what you use.

```mermaid
flowchart TB
    A[Need HIPAA-Compliant LLM] --> B[Pick Azure, Google, AWS, OpenAI, Cohere, or Claude]
    B --> C[All are SOC2 & HIPAA Ready]
    C --> D[No large commitments]
    D --> E[Expand usage as needed]
```

This covers the main research text about HIPAA-compliant and SOC 2-audited LLMs. Next, let's answer a few FAQs.

Frequently Asked Questions

1. Do all these providers sign HIPAA BAAs by default?

They typically do, provided your account is on the right plan. Microsoft, Google, AWS, OpenAI, Cohere, and Anthropic all have BAA programs or a process for signing a BAA. Confirm the details in each provider’s documentation.

2. Is GPT-4 the best LLM for healthcare data?

GPT-4 is top tier for reasoning. But alternatives like Med-PaLM 2 or Claude 3 can also excel. Each has different strengths, so test them for your use case.

3. Are these solutions also compliant with ISO or FedRAMP?

Yes, many are also ISO 27001, FedRAMP, or GDPR compliant. Check the official docs for each provider to confirm.

4. How is pricing calculated if I have large text inputs?

Pricing is mostly token-based: larger inputs consume more tokens, which directly increases your bill. Monitor usage carefully.
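
One way to estimate costs before sending a request is to count tokens locally, for example with the tiktoken library (the $0.03/1K figure below is GPT-4’s input rate quoted earlier):

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")
text = "Patient presented with shortness of breath and ..."  # your input text
n_tokens = len(enc.encode(text))
print(f"{n_tokens} tokens -> ${n_tokens / 1000 * 0.03:.4f} at GPT-4 input rates")
```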

5. Which LLM is the most cost-effective for large text volumes?

Cohere is often cheaper, with low rates per million tokens. Google’s PaLM 2 chat model is also cheaper than GPT-4. Titan Lite is cheap on AWS. Compare exact rates to see what’s best.
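
A small script comparing the per-million-token rates quoted in this article makes the gap concrete (PaLM 2 is actually priced per character, so its figure is an approximate token equivalent):

```python
# (input, output) USD per 1M tokens, as quoted in this article.
rates = {
    "GPT-4 (8K)":         (30.00, 60.00),
    "Claude 3.5 Sonnet":  (3.00, 15.00),
    "PaLM 2 Chat (est.)": (2.00, 2.00),
    "Cohere Command":     (1.00, 2.00),
    "Titan Text Lite":    (0.15, 0.20),
}
for model, (inp, out) in sorted(rates.items(), key=lambda kv: kv[1][0]):
    print(f"{model:19s} ${inp:6.2f} in / ${out:6.2f} out")
```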

6. How do I make sure my PHI data is protected during inference?

Make sure you have a BAA in place, enable encryption, and avoid caching PHI in logs. Each platform has secure config options you should enable.
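
As one illustrative (and deliberately simplistic) example of keeping PHI out of logs, you could scrub obvious identifier patterns before a prompt is ever written to an application log; a production system should rely on a vetted de-identification service rather than ad-hoc regexes like these.

```python
import re

# Toy patterns only -- real PHI detection needs a proper de-identification tool.
PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),          # US SSN shape
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\b\d{3}[.-]\d{3}[.-]\d{4}\b"), "[PHONE]"),
]

def scrub(text: str) -> str:
    """Replace identifier-like substrings before text is logged."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text

print(scrub("Reach John at 555-867-5309 or john@example.com, SSN 123-45-6789."))
```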

7. Can I combine multiple LLMs across providers?

Yes, you can. Some use a multi-cloud strategy. If budgets allow, you can pick specialized models from each. Just maintain compliance in each environment.


Keywords

hipaa, soc2, large language models, cloud llm, security, compliance, azure openai, google vertex ai, amazon bedrock, openai, cohere, anthropic claude

About The Author

Ayodesk Team of Writers

Experienced team of writers and marketers at Ayodesk