AI Inference Platforms for Self-Hosted Tools
Run AI models locally or in the cloud. These platforms power self-hosted AI tools like chatbots, coding assistants, and document search.
Local AI / Self-Hosted
Run AI models on your own hardware. Full data privacy, no API costs, no internet required.
Ollama
Run large language models locally with a single binary -- the easiest way to get started
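Ollama also exposes a local REST API (default port 11434) once the daemon is running. Below is a minimal sketch of the JSON body its `/api/generate` endpoint accepts; the model tag is an example you would need to pull first with `ollama pull llama3.2`. The snippet only builds and prints the payload rather than sending it, so no server is required.

```python
import json

# Request body for Ollama's local /api/generate endpoint
# (default server: http://localhost:11434). The model tag is an
# example -- pull it first with `ollama pull llama3.2`.
payload = {
    "model": "llama3.2",               # any locally pulled model tag
    "prompt": "Why is the sky blue?",
    "stream": False,                   # one JSON reply instead of a stream
}

body = json.dumps(payload)
print(body)

# To actually send it against a running Ollama server, something like:
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:11434/api/generate",
#       data=body.encode(),
#       headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read().decode())
```

Setting `"stream": False` is the simplest starting point; the default streaming mode returns one JSON object per generated chunk.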
Open WebUI
Self-hosted ChatGPT-like interface for Ollama and OpenAI-compatible APIs
llama.cpp
Highly optimized C/C++ inference engine for LLMs -- the foundation most local AI tools build on
GPT4All
Desktop application for running LLMs locally with document chat and a friendly GUI
vLLM
High-throughput LLM serving engine -- the performance king for production deployments
AnythingLLM
All-in-one AI chatbot framework for building local AI agents that interact with your data
text-generation-webui
Gradio-based web UI for running large language models with advanced parameter tuning
LocalAI
Drop-in OpenAI API replacement that runs inference locally across multiple backends
Jan
Open-source desktop alternative to ChatGPT with agentic workflows and project workspaces
LibreChat
Polished ChatGPT-style web UI that unifies multiple AI backends with enterprise features
KoboldCpp
Lightweight standalone app for running GGUF models locally, especially popular for roleplay/fiction
LM Studio
Desktop app for discovering, downloading, and running local LLMs with a polished interface
LobeChat
Self-hosted ChatGPT alternative with a refined UI and plugin ecosystem
Cloud AI Inference
Access AI models via API without managing GPU infrastructure. Pay per use or per hour.
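Many of the providers below, and self-hosted servers such as LocalAI and vLLM, accept the OpenAI chat-completions wire format, so switching backends is largely a matter of changing the base URL and model name. Here is a hedged sketch of that shared request shape; the endpoint URLs and model id are illustrative assumptions, not a statement of any provider's current catalog, and the snippet only constructs the payload without calling any service.

```python
import json

# Illustrative OpenAI-compatible endpoints -- verify against each
# provider's documentation before use.
ENDPOINTS = {
    "hosted-example": "https://api.example-provider.com/v1/chat/completions",
    "local-vllm": "http://localhost:8000/v1/chat/completions",
}

# The chat-completions request shape shared across compatible backends.
# The model id is an example placeholder.
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize vLLM in one sentence."},
    ],
    "temperature": 0.2,
}

body = json.dumps(payload)
print(body)
```

Because the request shape is identical, client code written against one compatible backend usually ports to another by swapping the base URL, model name, and API key.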
Baseten
Model serving platform that exposes models as HTTP endpoints with built-in autoscaling
Cerebras
AI inference on custom wafer-scale chips delivering 2000+ tokens per second
DeepInfra
Pay-per-use AI inference with OpenAI-compatible API for easy migration
fal.ai
Generative AI platform specializing in fast image and video model inference
Fireworks AI
Production-grade inference platform with proprietary FireAttention engine for speed and scale
Groq
Ultra-fast AI inference using custom LPU hardware -- among the highest tokens-per-second throughput available
Hugging Face Inference
Inference API for 400K+ models on Hugging Face Hub with serverless and dedicated endpoints
Lambda Cloud
GPU cloud built for deep learning with on-demand and reserved NVIDIA GPU instances
Modal
Serverless cloud for AI/ML with Python-native interface and dynamic GPU scaling
Replicate
Cloud platform to run ML models via API without managing infrastructure
RunPod
GPU cloud for AI inference and training with serverless and dedicated pod options
SambaNova
Enterprise AI platform with custom RDU chips optimized for high-performance inference
SiliconFlow
Low-cost AI inference platform claiming up to 2.3x faster inference, with transparent pay-per-use pricing
Together AI
Cloud platform for running hundreds of open-source AI models with a community-driven approach
Vast.ai
Decentralized GPU marketplace for renting idle compute, typically well below major cloud prices
Need help setting up local AI?
We deploy self-hosted AI infrastructure for businesses. From Ollama on a single server to vLLM clusters with GPU autoscaling.
Visit upready.dev →