exo-cuda-inference

Artificial Intelligence/Machine Learning Freemium View on RapidAPI ↗

Exo CUDA Inference API provides distributed GPU-accelerated inference for AI models using NVIDIA CUDA. ## Features - OpenAI-compatible API (use OpenAI SDK directly) - Support for Llama 3, Mistral, Stable Diffusion XL, Whisper, and more - NVIDIA GPU acceleration for fast inference - Scalable infrastructure handling thousands of requests - JSON & streaming response support - Multiple model…

2 subscribers

6 endpoints

The in-depth APIMemo review for this API hasn't been published yet — the data below comes straight from the public marketplace listing.

exo-cuda-inference endpoints

Method	Endpoint	Description
POST	chat /v1/chat/completions	OpenAI-compatible chat completion endpoint. Send a conversation and receive AI-generated text response.
POST	completions /v1/completions	Text completion endpoint. Generate AI text completions for given prompts.
POST	embeddings /v1/embeddings	Generate vector embeddings for text. Useful for semantic search and similarity.
POST	images /v1/images/generations	Generate images from text prompts using Stable Diffusion XL.
POST	transcriptions /v1/audio/transcriptions	Convert audio to text. Upload audio file and get transcription.
GET	models /v1/models	List all available AI models on this inference API.

exo-cuda-inference pricing

Plan	Price	Rate limit	Quotas
BASIC	Free	—	Requests: 1,000 / monthly
PRO	$19 / month	—	Requests: 50,000 / monthly
ULTRA	$49 / month	—	Requests: 200,000 / monthly
MEGA	$149 / month	—	Requests: 1,000,000 / monthly

exo-cuda-inference

exo-cuda-inference endpoints

exo-cuda-inference pricing

More Artificial Intelligence/Machine Learning APIs

Low-Cost Image Generate API

OPEN AI

Best Astrology API - Natal Charts, Transits & Synastry

AI Content Detector | AI/GPT

ChatGPT VISION

ChatGPT 4