Deploying Your AI Model to Production: A Practical Guide
You've connected your data, trained your model, and tested it in the chat playground. Now it's time to deploy it to production and integrate it into your application.
This guide walks you through deploying your AI model using Cuadra AI's production-ready API, from getting your API keys to handling real-world scenarios.
Prerequisites
Before deploying, make sure you've completed:
- ✅ Connect Phase - Your datasets are created and files are processed
- ✅ Train Phase - Your model is configured with instructions and datasets attached
- ✅ Testing - You've tested your model in the chat playground and verified it works correctly
Getting Your API Credentials
The first step is getting your API endpoint and authentication key.
Step 1: Navigate to Deploy Section
- Go to your model in the dashboard
- Click on the Deploy tab
- Your API credentials are displayed there
Step 2: Copy Your Credentials
You'll see:
- API Endpoint - Your model's unique API URL
- API Key - Your authentication token (keep this secure!)
Important: Treat your API key like a password. Never commit it to version control or expose it in client-side code.
Step 3: Regenerate if Needed
If you need to regenerate your API key:
- Click "Regenerate API Key"
- Confirm the action
- Update your application with the new key
- The old key will no longer work
Understanding the API
Cuadra AI provides a standard REST API that's easy to integrate into any application.
API Basics
- Protocol: HTTPS
- Format: JSON request/response
- Authentication: API key in Authorization header
- Method: POST for chat requests
Request Structure
{
  "message": "Your user's message here",
  "stream": false,
  "responseFormat": {
    "type": "json_schema",
    "json_schema": {
      // Optional: JSON schema for structured outputs
    }
  }
}
Response Structure
{
  "response": "The AI's response text",
  "usage": {
    "input_tokens": 150,
    "output_tokens": 200,
    "total_tokens": 350
  }
}
Integration Examples
JavaScript/Node.js
async function callCuadraAPI(userMessage) {
  const response = await fetch('YOUR_API_ENDPOINT', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.CUADRA_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      message: userMessage,
      stream: false
    })
  });

  if (!response.ok) {
    throw new Error(`API error: ${response.status}`);
  }

  const data = await response.json();
  return data.response;
}

// Usage
const answer = await callCuadraAPI("What are your return policies?");
console.log(answer);
Python
import requests
import os

def call_cuadra_api(user_message):
    url = os.getenv('CUADRA_API_ENDPOINT')
    headers = {
        'Authorization': f"Bearer {os.getenv('CUADRA_API_KEY')}",
        'Content-Type': 'application/json'
    }
    data = {
        'message': user_message,
        'stream': False
    }
    response = requests.post(url, headers=headers, json=data)
    response.raise_for_status()
    return response.json()['response']

# Usage
answer = call_cuadra_api("What are your return policies?")
print(answer)
Streaming Responses
For a better user experience, stream tokens as they're generated:
async function streamCuadraAPI(userMessage, onToken) {
  const response = await fetch('YOUR_API_ENDPOINT', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.CUADRA_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      message: userMessage,
      stream: true // Enable streaming
    })
  });

  if (!response.ok) {
    throw new Error(`API error: ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Note: this assumes each chunk contains whole lines; production code
    // should buffer partial lines across reads.
    const chunk = decoder.decode(value);
    const lines = chunk.split('\n').filter(line => line.trim());

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = JSON.parse(line.slice(6));
        if (data.token) {
          onToken(data.token); // Callback for each token
        }
      }
    }
  }
}

// Usage
let fullResponse = '';
await streamCuadraAPI("Tell me about your products", (token) => {
  fullResponse += token;
  // Update UI with each token for real-time display
  updateChatUI(fullResponse);
});
Production Best Practices
1. Environment Variables
Never hardcode API keys. Use environment variables:
# .env file
CUADRA_API_ENDPOINT=https://api.cuadra.ai/v1/models/your-model-id
CUADRA_API_KEY=your-api-key-here
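In Node.js, for example, you can load and validate these at startup; this sketch assumes the dotenv package is installed:

// Load variables from .env at startup (assumes the `dotenv` package)
require('dotenv').config();

const endpoint = process.env.CUADRA_API_ENDPOINT;
const apiKey = process.env.CUADRA_API_KEY;

// Fail fast at boot rather than on the first request
if (!endpoint || !apiKey) {
  throw new Error('CUADRA_API_ENDPOINT and CUADRA_API_KEY must be set');
}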
2. Error Handling
Implement robust error handling:
async function callCuadraAPIWithRetry(userMessage, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const response = await fetch('YOUR_API_ENDPOINT', {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${process.env.CUADRA_API_KEY}`,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({ message: userMessage })
      });

      if (response.status === 429) {
        // Rate limit - wait and retry with exponential backoff
        const waitTime = Math.pow(2, attempt) * 1000;
        await new Promise(resolve => setTimeout(resolve, waitTime));
        continue;
      }

      if (!response.ok) {
        throw new Error(`API error: ${response.status}`);
      }

      return await response.json();
    } catch (error) {
      if (attempt === maxRetries) throw error;
      // Wait before retrying
      await new Promise(resolve => setTimeout(resolve, 1000 * attempt));
    }
  }

  // Every attempt hit the rate limit
  throw new Error('Rate limited: retries exhausted');
}
3. Rate Limiting
Respect rate limits and implement client-side throttling (a minimal queue sketch follows this list):
- Check your plan's rate limits
- Implement request queuing if needed
- Use exponential backoff for 429 responses
- Monitor your usage in the dashboard
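As a starting point, here's a minimal promise queue that caps concurrent requests. The limit of 2 is illustrative, not Cuadra's actual limit; check your plan for the real numbers:

// Minimal client-side throttle: run at most `limit` requests at once.
function createThrottle(limit = 2) {
  let active = 0;
  const queue = [];

  const next = () => {
    if (active >= limit || queue.length === 0) return;
    active++;
    const { task, resolve, reject } = queue.shift();
    task()
      .then(resolve, reject)
      .finally(() => { active--; next(); });
  };

  return (task) => new Promise((resolve, reject) => {
    queue.push({ task, resolve, reject });
    next();
  });
}

// Usage: wrap each API call in the throttle
const throttled = createThrottle(2);
const reply = await throttled(() => callCuadraAPI("What are your return policies?"));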
4. Timeout Handling
Set appropriate timeouts:
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 30000); // 30-second timeout

try {
  const response = await fetch('YOUR_API_ENDPOINT', {
    signal: controller.signal,
    // ... other options
  });
  clearTimeout(timeoutId);
} catch (error) {
  if (error.name === 'AbortError') {
    // Handle timeout
  }
}
5. Logging and Monitoring
Track your API usage:
async function callCuadraAPIWithMetrics(userMessage) {
  const startTime = Date.now();

  try {
    // Delegate to the retry wrapper above, which returns the full JSON response
    const response = await callCuadraAPIWithRetry(userMessage);
    const duration = Date.now() - startTime;

    // Log successful request
    logMetrics({
      success: true,
      duration,
      tokens: response.usage.total_tokens
    });

    return response;
  } catch (error) {
    // Log failed request
    logMetrics({
      success: false,
      duration: Date.now() - startTime,
      error: error.message
    });
    throw error;
  }
}
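The logMetrics helper above is left to you; a minimal version might simply emit structured JSON lines for your log aggregator to pick up:

// A minimal logMetrics: one JSON line per request for your log pipeline
function logMetrics(fields) {
  console.log(JSON.stringify({
    event: 'cuadra_api_call',
    timestamp: new Date().toISOString(),
    ...fields
  }));
}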
Structured Outputs
For consistent, parseable responses, use JSON schema:
const response = await fetch('YOUR_API_ENDPOINT', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.CUADRA_API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    message: "What are your return policies?",
    responseFormat: {
      type: "json_schema",
      json_schema: {
        name: "return_policy_response",
        schema: {
          type: "object",
          properties: {
            policy: { type: "string" },
            timeframe: { type: "string" },
            conditions: { type: "array", items: { type: "string" } }
          },
          required: ["policy", "timeframe"]
        }
      }
    }
  })
});
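If the model honors the schema, the response field should contain JSON you can parse directly. The parsing step below is an assumption based on the response structure shown earlier:

const result = await response.json();

// Assumption: with a json_schema responseFormat, `result.response` holds a
// JSON string conforming to your schema
const policy = JSON.parse(result.response);
console.log(policy.timeframe, policy.conditions);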
Security Considerations
API Key Security
- Server-Side Only - Never expose API keys in client-side code; route browser traffic through your own backend (see the proxy sketch after this list)
- Environment Variables - Store keys in environment variables
- Rotation - Rotate keys periodically
- Access Control - Limit who can access API keys
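A browser or mobile app should talk to your own server, which holds the key. Here's a minimal proxy sketch using Express; Express, the /chat route, and port 3000 are illustrative choices, not part of Cuadra's API:

// server.js - keep the Cuadra key on the server; the browser talks to /chat
const express = require('express');
const app = express();
app.use(express.json());

app.post('/chat', async (req, res) => {
  try {
    const upstream = await fetch(process.env.CUADRA_API_ENDPOINT, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.CUADRA_API_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ message: req.body.message, stream: false })
    });
    res.status(upstream.status).json(await upstream.json());
  } catch (err) {
    res.status(502).json({ error: 'Upstream request failed' });
  }
});

app.listen(3000);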
Data Privacy
- Encryption - Use HTTPS for all API calls
- Data Minimization - Only send necessary data
- Compliance - Ensure compliance with GDPR, CCPA, etc.
- Audit Logs - Log API usage for security audits
Monitoring and Analytics
Usage Dashboard
Monitor your deployment in the Cuadra AI dashboard:
- API Calls - Track request volume
- Token Usage - Monitor token consumption
- Costs - View costs per model
- Performance - Track response times
Custom Analytics
Implement your own analytics (a small sketch for the last item follows this list):
- Track user satisfaction
- Monitor response quality
- Measure business metrics (resolution rate, etc.)
- Identify common questions
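For identifying common questions, even a simple in-memory tally can surface the most frequent ones; a sketch (production analytics would persist this somewhere durable):

// Count normalized user questions to find the most common ones
const questionCounts = new Map();

function recordQuestion(question) {
  const key = question.trim().toLowerCase();
  questionCounts.set(key, (questionCounts.get(key) || 0) + 1);
}

function topQuestions(n = 10) {
  return [...questionCounts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, n);
}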
Troubleshooting
Common Issues
Issue: 401 Unauthorized
- Check your API key is correct
- Verify the key hasn't been regenerated
- Ensure the Authorization header format is correct
Issue: 429 Rate Limit
- Check your plan's rate limits
- Implement request throttling
- Use exponential backoff
Issue: Slow Responses
- Check your model configuration
- Review knowledge base size
- Consider using streaming for better UX
- Monitor API performance in dashboard
Issue: Unexpected Responses
- Review your model's system instructions
- Check attached datasets are correct
- Test in chat playground first
- Verify knowledge base is up to date
Scaling Your Deployment
As your usage grows:
- Monitor Usage - Track API calls and costs
- Optimize Configuration - Refine model settings for efficiency
- Upgrade Plan - Consider higher-tier plans for more resources
- Cache Responses - Cache common queries when appropriate (a minimal in-memory sketch follows this list)
- Load Balancing - Distribute requests across multiple endpoints if needed
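For the caching point, a minimal in-memory cache keyed by the exact message might look like this. The 5-minute TTL is illustrative; skip caching entirely where stale answers are unacceptable:

// Minimal in-memory cache with a TTL, keyed by the exact user message.
// Fine for identical repeat questions; real deployments may want LRU eviction.
const cache = new Map();
const TTL_MS = 5 * 60 * 1000; // 5 minutes (illustrative)

async function cachedCall(userMessage) {
  const hit = cache.get(userMessage);
  if (hit && Date.now() - hit.time < TTL_MS) return hit.value;

  const value = await callCuadraAPI(userMessage);
  cache.set(userMessage, { value, time: Date.now() });
  return value;
}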
Next Steps
You're now ready to deploy your AI model to production! Remember:
- Secure Your Keys - Keep API keys safe and never expose them
- Handle Errors - Implement robust error handling
- Monitor Usage - Track performance and costs
- Iterate - Continuously improve based on real-world usage
Start deploying your model and see how easy it is to integrate AI into your application.
Need help with deployment? Check our API documentation or contact support.