Enterprises today face a critical challenge: enabling developers to harness the power of OpenAI's APIs while maintaining security, governance, and cost control. In this post, we'll explore how combining Revenium and MuleSoft creates a robust framework for managing, monitoring, and governing OpenAI API usage across your organization.
This solution addresses a growing concern in enterprise environments: the rise of "Shadow AI," where developers bypass official channels by using personal API keys, sharing credentials, or adopting unauthorized AI platforms. This practice becomes particularly risky when developers feed sensitive or proprietary data into these platforms' retrieval-augmented generation (RAG) systems.
Beyond security, this approach provides invaluable insights into enterprise-wide LLM adoption patterns. CIOs, CTOs, API platform owners, product managers, and architects can proactively shape their API strategy based on actual usage data. Organizations can stay ahead of emerging AI capabilities, implement precise usage-based chargeback mechanisms across teams, and identify opportunities to commercialize innovative solutions developed within the enterprise.
To achieve this, we'll demonstrate how Revenium's advanced metering capabilities and MuleSoft's API management platform work together to create a secure, observable proxy layer for all OpenAI interactions. This approach centralizes control and provides deep visibility into your organization's AI consumption patterns.
Deploying a Secure OpenAI Gateway with MuleSoft
Let's begin by creating a secure gateway for OpenAI traffic using MuleSoft's API Manager. This proxy will serve as the foundation for all OpenAI interactions within your enterprise:
By configuring the “/v1” OpenAI API endpoint as the implementation URI and applying the "REVENIUM" tag, we enable two powerful features. First, this allows automatic synchronization with Revenium's monitoring platform. Second, it unlocks seamless integration between Revenium's Drop-In Storefront and MuleSoft's developer portals—Anypoint Community Manager and Anypoint Experience Hub.
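Once the proxy is in place, developers simply repoint their existing OpenAI clients at it. Here's a minimal sketch using the official OpenAI Python SDK, with a hypothetical gateway URL standing in for the proxy endpoint your API Manager instance exposes:

```python
from openai import OpenAI

# Hypothetical proxy URL; substitute the endpoint exposed by your MuleSoft proxy.
client = OpenAI(
    base_url="https://api.example.com/openai-gateway/v1",
    api_key="placeholder",  # the real OpenAI key stays at the gateway (see the policies below)
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarize our Q3 results."}],
)
print(response.usage.prompt_tokens, response.usage.completion_tokens)
```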
Securing and Monitoring Your AI Gateway: Policy Application
To ensure secure access management, we'll implement the Client ID Enforcement policy. This critical security layer enables developers to authenticate through your organization's identity provider while keeping OpenAI API keys safely protected at the gateway level. Centralizing credential management allows you to maintain complete control over API access without compromising developer productivity.
Next, we'll implement the Header Injection policy to establish a centralized OpenAI authentication model. This policy automatically injects your organization's API key into all requests, providing a secure foundation for enterprise-wide OpenAI access:
This centralized approach delivers key advantages for enterprise governance. Your organization can maintain a carefully controlled set of OpenAI API keys—segmented by environment, geography, or other business needs—while leveraging your API management platform to implement fine-grained access controls at the team and developer levels.
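Here's what a call through the gateway looks like from a developer's perspective. This is a minimal sketch assuming the Client ID Enforcement policy's default client_id and client_secret headers (header names are configurable) and the same hypothetical gateway URL; notice that no OpenAI key appears anywhere in the request:

```python
import requests

resp = requests.post(
    "https://api.example.com/openai-gateway/v1/chat/completions",  # hypothetical proxy URL
    headers={
        "client_id": "YOUR_APP_CLIENT_ID",          # issued through your Anypoint organization
        "client_secret": "YOUR_APP_CLIENT_SECRET",  # never an OpenAI key
    },
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(resp.json()["usage"])
```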
To complete our governance framework, we'll implement the Revenium Metering Policy. This robust observability layer captures and analyzes all API traffic flowing through the MuleSoft proxy, providing deep insights into your organization's API (and AI) consumption patterns.
When configuring the Revenium Metering Policy, you can leave most values at their defaults. However, three key settings require specific attention:
- Revenium API Key: Authenticate with your Revenium Platform API credentials.
- Payment Required Message & Request Validation: Enable automatic enforcement of usage limits and subscription status. This ensures API access is immediately restricted when consumers exceed their quota, have expired subscriptions, or are disabled in Revenium.
- Element Expressions: Configure DataWeave expressions to capture essential OpenAI metrics including model selection, prompt tokens, and completion tokens from each API interaction.
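To make the last setting concrete, the sketch below shows the fields of an OpenAI chat completion response that such expressions typically read. The field names follow OpenAI's documented response schema; the exact DataWeave expressions in your policy may differ:

```python
# A chat-completion response, trimmed to the fields that drive metering.
completion = {
    "model": "gpt-3.5-turbo-0125",
    "usage": {"prompt_tokens": 128, "completion_tokens": 42, "total_tokens": 170},
}

# In the policy itself, DataWeave expressions along the lines of payload.model,
# payload.usage.prompt_tokens, and payload.usage.completion_tokens
# extract these same three values from each response.
metering_elements = {
    "model": completion["model"],
    "prompt_tokens": completion["usage"]["prompt_tokens"],
    "completion_tokens": completion["usage"]["completion_tokens"],
}
print(metering_elements)
```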
Creating Revenium Sources for Enterprise AI Governance
Let's configure our Revenium sources. A source in Revenium represents a connection to any data endpoint you want to track, analyze, and potentially monetize. In our OpenAI implementation, each source maps to a specific OpenAI API endpoint, enabling granular usage tracking and analytics. These sources can be manually defined or automatically discovered through Revenium's auto-discovery capabilities.
Understanding Metering Elements is crucial for precise OpenAI cost allocation. While simple API transaction counts suffice for basic services, OpenAI's consumption model requires more granular tracking. Metering Elements allow us to capture and analyze the specific metrics that drive OpenAI costs: the LLM model in use, prompt tokens consumed, and completion tokens generated. This detailed level of metering ensures accurate chargeback and provides deeper insights into your organization's AI resource utilization.
Revenium offers flexible configuration options for elements: they can be manually defined or automatically discovered through traffic analysis. In our OpenAI implementation, Revenium has already detected and cataloged the various AI models accessed through our MuleSoft proxy, streamlining our setup process.
Let's examine how this works by looking at the source configuration for the chat completion endpoint:
Our source configuration has two key parts. First, we've linked the automatically discovered metering elements to track AI model usage and token consumption. Second, we've implemented a regular expression classification pattern that identifies and routes all chat completion traffic to this specific source, ensuring accurate usage tracking. This granular configuration allows us to monitor and meter different OpenAI endpoints independently, providing precise insight into each API's utilization patterns.
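To illustrate the classification step, the hypothetical pattern below routes only chat completion paths to this source:

```python
import re

# Matches the chat-completions endpoint, with or without a trailing slash.
CHAT_COMPLETIONS = re.compile(r"^/v1/chat/completions/?$")

for path in ["/v1/chat/completions", "/v1/embeddings", "/v1/moderations"]:
    source = "chat-completions source" if CHAT_COMPLETIONS.match(path) else "another source"
    print(f"{path} -> {source}")
```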
Productizing Your Enterprise AI Gateway in Revenium
Now that we've configured our AI Gateway sources, let's create a Revenium product that transforms API calls into measurable business value. Using this framework, your enterprise can track, measure, and allocate AI requests efficiently.
When configuring the Enterprise AI Gateway product, most settings can be left at their default values. However, three key configurations require special attention:
- Product Name: This identifier appears throughout the Revenium platform and in developer portals like Anypoint Experience Hub, serving as the brand identity for your AI Gateway.
- Included Sources: We'll start by including the OpenAI API as our initial gateway service. The modular nature of this configuration allows you to progressively expand your AI Gateway's capabilities by adding:
- Fine-tuning and embedding endpoints
- Custom fine-tuned models
- Alternative LLM providers (such as Anthropic, AWS SageMaker, or Ollama)
- AI agents and specialized services
- Metering Elements: Enabling this feature allows for sophisticated usage-based pricing based on:
- Specific LLM model utilization
- Prompt token consumption
- Completion token generation
Let's examine how these metering elements translate into your pricing structure:
This configuration demonstrates how Revenium enables granular cost tracking across different OpenAI models and usage types. We've set up distinct pricing for prompt and completion tokens, aligning with OpenAI's cost structure. For instance, GPT-4 models (like gpt-4-0613) are configured at higher rates than GPT-3.5-turbo models, reflecting their actual cost differences. The "Sum Element Values Received" charge logic ensures accurate token counting for precise usage tracking and chargeback. This flexible pricing structure allows organizations to implement direct cost pass-through or custom markup strategies while maintaining transparent cost allocation across different teams and use cases.
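To make the charge logic tangible, here's a minimal sketch of per-token rating. The gpt-3.5-turbo prompt rate of 0.001 per token matches the transaction we'll examine shortly; the other rates are hypothetical stand-ins for whatever your product configuration defines:

```python
# Per-token rates, keyed by model: (prompt_rate, completion_rate).
RATES = {
    "gpt-4-0613": (0.003, 0.006),     # hypothetical, priced above gpt-3.5-turbo
    "gpt-3.5-turbo": (0.001, 0.002),  # completion rate hypothetical
}

def charge(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Sum the metered token counts per element, then apply each element's rate."""
    prompt_rate, completion_rate = RATES[model]
    return prompt_tokens * prompt_rate + completion_tokens * completion_rate

# 128 prompt tokens of gpt-3.5-turbo: 128 * 0.001 = $0.128, displayed as $0.13.
print(f"${charge('gpt-3.5-turbo', 128, 0):.2f}")
```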
Real-Time AI Usage Analytics: Understanding Your Enterprise LLM Consumption
With our AI Gateway configured, we can now explore how Revenium provides deep visibility into your organization's AI consumption. The platform offers comprehensive analytics that help you understand usage patterns, monitor performance, and track costs across teams and business units.
Let's examine the key analytics views that provide insights into your AI Gateway.
Product Transaction Log View
This view provides a detailed transaction-level analysis of API usage across your enterprise. Each row represents an individual API call, showing critical information like the timestamp, product usage, subscriber details, and associated costs. Note how the platform distinguishes between different teams (Marketing, Research & Development) and tracks their respective usage patterns.
Drilling down into the transactions reveals the granular cost and usage metrics essential for accurate chargeback and monitoring. For example, clicking into a specific transaction shows the exact charge calculation, breaking down how costs are computed based on token usage and model type.
In this case, the platform calculates charges for GPT-3.5-turbo usage by multiplying the prompt tokens (128.0) by the per-token rate (0.001): 128.0 × 0.001 = $0.128, displayed as $0.13. This level of detail helps organizations understand exactly how AI costs accumulate and enables precise attribution of expenses to specific teams or projects.
Performance Analytics Dashboard
The performance dashboard offers detailed metrics about API response times and system performance. The view shows average latency and P95 latency (the time under which 95% of requests complete) for each API call, helping you monitor service quality and identify potential bottlenecks. The ability to filter by credential name allows for team-specific performance analysis.
Universal API Management Filter Options
The “Platform Filter” highlights Revenium's universal API management capabilities. While we're focused on MuleSoft in this implementation, the platform can monitor API traffic across various gateways including Kong, Gravitee, and others, providing a unified view regardless of your infrastructure choices.
Source-Level Analytics
The source filter view demonstrates how to drill down into specific OpenAI services (Chat Completions, Embeddings, Moderations, Fine Tunes, Edits). This granular visibility helps you understand which AI capabilities are being utilized most frequently and by whom.
These analytics capabilities ensure you maintain complete visibility over your AI Gateway, enabling data-driven decisions about resource allocation, cost management, and service optimization.
Targeted AI Usage Reports
The Revenium Custom Reports interface provides powerful filtering capabilities to analyze your AI usage patterns. This screenshot demonstrates how organizations can create detailed reports by combining multiple parameters:
- Filtering by specific LLM models (shown here with GPT-4 prompt tokens)
- Isolating usage by team or department (Marketing credential filter applied)
- Setting custom date ranges for trend analysis
- Tracking both consumption (token usage) and associated costs
The resulting report consolidates AI interactions and provides detailed rated elements and associated charge breakdowns. This example shows how a marketing team's use of various GPT models (including GPT-4 and GPT-3.5-turbo) translates to actual costs, enabling precise departmental chargeback and resource optimization.
This level of granular reporting helps organizations understand exactly how their AI resources are utilized, team by team and model by model, and where budgets should be directed next.
Securing Your AI Future: From Implementation to Insights
By implementing an Enterprise AI Gateway with Revenium and MuleSoft, organizations can transform uncontrolled AI adoption into a governed, observable, and cost-effective service. This approach delivers several key benefits:
- Centralized Governance: Eliminate Shadow AI by providing developers a secure, approved path to AI capabilities while maintaining centralized control over API keys and access patterns.
- Cost Transparency: Gain precise visibility into AI consumption costs across teams and departments, enabling accurate chargeback and informed budget planning.
- Performance Monitoring: Track latency, usage patterns, and service quality across all AI interactions to ensure optimal performance for enterprise applications.
- Future-Ready Architecture: While we've focused on OpenAI integration, this architecture provides a foundation for incorporating additional AI providers, fine-tuned models, and specialized AI services as your needs evolve.
As AI adoption accelerates across the enterprise, a robust governance and observability framework becomes increasingly critical. MuleSoft's API management capabilities and Revenium's metering and analytics combine to create a scalable foundation for secure, observable, and cost-effective AI integration.
To get started with your own Enterprise AI Gateway, create a free account or contact our team to discuss your specific requirements.