Running large language models locally has become increasingly practical for developers who prioritize privacy, cost control, and offline capabilities. LM Studio stands out as one of the most accessible tools for this purpose, offering a polished desktop application with powerful API capabilities. One of its most valuable features is the ability to enforce structured outputs from LLMs, ensuring your applications receive predictable, parseable responses every time.
This guide walks you through implementing both structured and non-structured outputs using LM Studio with Node.js. Whether you need free-form text generation or strictly typed JSON responses, you will find practical examples and best practices to integrate local LLMs into your applications effectively.
What is LM Studio?
LM Studio is a cross-platform desktop application that allows you to download, configure, and run large language models entirely on your local machine. Unlike cloud-based AI services, LM Studio operates completely offline after initial model downloads, ensuring your data never leaves your computer. The application supports GGUF-formatted models from providers like Meta (Llama), Google (Gemma), Mistral, Qwen, and many others available on Hugging Face.
Key features that make LM Studio particularly useful for developers include:
- Automatic hardware configuration that optimizes performance based on your GPU and RAM
- Support for serving multiple models simultaneously
- An OpenAI-compatible API that works with existing tooling
- Native structured output support through JSON schema enforcement
Understanding Structured vs Non-Structured Output
Before diving into implementation, it helps to understand the fundamental difference between these two output modes and when to use each.
Non-structured output represents the traditional LLM response format where the model generates free-form text based on your prompt. This approach works well for creative writing, conversational interfaces, explanations, and any scenario where you want natural language responses. The model has complete flexibility in how it formats and presents information.
Structured output constrains the model to generate responses that conform to a specific schema, typically JSON. This guarantees that every response can be parsed programmatically and contains exactly the fields your application expects. Structured output is essential for building reliable integrations, extracting specific data points, and creating applications that depend on consistent response formats.
Setting Up LM Studio for API Access
Before writing any code, you need to configure LM Studio to accept API requests. Start by downloading and installing LM Studio from the official website. Once installed, download at least one model from the Discover section. Models like Llama 3.2, Qwen 3, or Gemma 3 work well for both structured and non-structured outputs.
To enable the API server, navigate to the Developer tab in LM Studio and click “Start Server”. By default, the server runs on port 1234 at localhost. You can also start the server from the command line using the LM Studio CLI:
lms server start --port 1234
The server exposes an OpenAI-compatible API, meaning you can use either the official OpenAI SDK or LM Studio’s native TypeScript SDK to interact with your local models.
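Before writing any client code, you can verify that the server is reachable by listing the models it has loaded. The sketch below uses Node's built-in fetch (Node 18+) against the /v1/models endpoint of the OpenAI-compatible API:
// Quick connectivity check against the OpenAI-compatible endpoint.
// Assumes the server is running on the default port 1234.
async function checkServer() {
  const res = await fetch('http://localhost:1234/v1/models');
  if (!res.ok) throw new Error(`Server responded with ${res.status}`);
  const { data } = await res.json();
  console.log('Loaded models:', data.map((m) => m.id));
}
checkServer().catch(console.error);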
Project Setup for Node.js
Create a new Node.js project and install the necessary dependencies. You have two options for connecting to LM Studio: the OpenAI SDK for standard compatibility or the native LM Studio SDK for additional features.
mkdir lm-studio-demo
cd lm-studio-demo
npm init -y
npm install @lmstudio/sdk openai zod
For TypeScript support, which is recommended when working with structured outputs:
npm install typescript @types/node ts-node --save-dev
npx tsc --init
LM Studio also provides a convenient scaffolding command that sets up a complete TypeScript project with all necessary configuration:
lms create node-typescript
Non-Structured Output Implementation
Let us start with non-structured output, which is simpler to implement and suitable for many use cases. This approach returns free-form text responses from the model.
Using the OpenAI SDK
The OpenAI SDK works seamlessly with LM Studio’s API endpoint. Simply point the base URL to your local server:
import OpenAI from 'openai';
const client = new OpenAI({
baseURL: 'http://localhost:1234/v1',
apiKey: 'lm-studio' // Any string works for local server
});
async function generateResponse(prompt) {
const response = await client.chat.completions.create({
model: 'llama-3.2-3b-instruct', // Use your loaded model name
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: prompt }
],
temperature: 0.7,
max_tokens: 500
});
return response.choices[0].message.content;
}
// Usage example
async function main() {
const result = await generateResponse(
'Explain the benefits of running LLMs locally in three paragraphs.'
);
console.log(result);
}
main().catch(console.error);
Using the LM Studio Native SDK
The native LM Studio SDK provides a cleaner API designed specifically for local LLM interactions:
import { LMStudioClient } from '@lmstudio/sdk';
async function generateWithNativeSDK() {
const client = new LMStudioClient();
// Get a reference to any loaded model
const model = await client.llm.model();
// Simple respond method for chat-style interactions
const result = await model.respond(
'What are the key differences between supervised and unsupervised learning?'
);
console.log(result.content);
}
generateWithNativeSDK().catch(console.error);
Streaming Non-Structured Responses
For better user experience, especially with longer responses, streaming allows you to display text as it generates:
import { LMStudioClient } from '@lmstudio/sdk';
async function streamResponse() {
const client = new LMStudioClient();
const model = await client.llm.model();
const prediction = model.respond(
'Write a short story about a developer discovering AI.',
{ maxTokens: 1000 }
);
// Stream the response token by token
for await (const fragment of prediction) {
process.stdout.write(fragment.content);
}
// Get final result with statistics
const finalResult = await prediction;
console.log('\n\nTokens generated:', finalResult.stats.predictedTokensCount);
}
streamResponse().catch(console.error);
Structured Output Implementation
Structured output ensures the model generates responses that conform to a predefined schema. LM Studio achieves this through grammar-constrained sampling, which restricts the model to only generate tokens that produce valid output according to your schema.
Using Zod Schema (Recommended)
Zod provides TypeScript-first schema validation with excellent type inference. When you use Zod schemas with LM Studio, you get fully typed responses automatically:
import { LMStudioClient } from '@lmstudio/sdk';
import { z } from 'zod';
// Define your schema using Zod
const BookSchema = z.object({
title: z.string(),
author: z.string(),
year: z.number().int(),
genre: z.string(),
summary: z.string()
});
async function getBookInfo(bookTitle) {
const client = new LMStudioClient();
const model = await client.llm.model();
const result = await model.respond(
`Tell me about the book "${bookTitle}". Include its title, author, publication year, genre, and a brief summary.`,
{
structured: BookSchema,
maxTokens: 300 // Recommended to prevent infinite generation
}
);
// result.parsed is fully typed as { title: string, author: string, year: number, genre: string, summary: string }
const book = result.parsed;
console.log(`Title: ${book.title}`);
console.log(`Author: ${book.author}`);
console.log(`Year: ${book.year}`);
console.log(`Genre: ${book.genre}`);
console.log(`Summary: ${book.summary}`);
return book;
}
getBookInfo('The Hitchhiker\'s Guide to the Galaxy').catch(console.error);
Using JSON Schema Directly
If you prefer working with raw JSON schemas or need compatibility with existing schema definitions, LM Studio supports that approach as well:
import { LMStudioClient } from '@lmstudio/sdk';
const productSchema = {
type: 'object',
properties: {
name: { type: 'string' },
description: { type: 'string' },
price: { type: 'number' },
category: { type: 'string' },
inStock: { type: 'boolean' },
tags: {
type: 'array',
items: { type: 'string' }
}
},
required: ['name', 'description', 'price', 'category', 'inStock', 'tags']
};
async function generateProductListing(productIdea) {
const client = new LMStudioClient();
const model = await client.llm.model();
const result = await model.respond(
`Create a product listing for: ${productIdea}. Include name, description, price in USD, category, stock status, and relevant tags.`,
{
structured: {
type: 'json',
jsonSchema: productSchema
},
maxTokens: 400
}
);
// Parse the JSON response manually
const product = JSON.parse(result.content);
console.log('Generated Product:');
console.log(JSON.stringify(product, null, 2));
return product;
}
generateProductListing('wireless noise-canceling earbuds for runners').catch(console.error);
Using OpenAI SDK with JSON Schema
For compatibility with existing OpenAI-based codebases, you can use structured outputs through the response_format parameter:
import OpenAI from 'openai';
const client = new OpenAI({
baseURL: 'http://localhost:1234/v1',
apiKey: 'lm-studio'
});
const characterSchema = {
type: 'json_schema',
json_schema: {
name: 'characters',
schema: {
type: 'object',
properties: {
characters: {
type: 'array',
items: {
type: 'object',
properties: {
name: { type: 'string' },
occupation: { type: 'string' },
personality: { type: 'string' },
background: { type: 'string' }
},
required: ['name', 'occupation', 'personality', 'background']
},
minItems: 1
}
},
required: ['characters']
}
}
};
async function generateCharacters(count) {
const response = await client.chat.completions.create({
model: 'llama-3.2-3b-instruct',
messages: [
{ role: 'system', content: 'You are a creative writing assistant.' },
{ role: 'user', content: `Create ${count} unique fictional characters for a mystery novel.` }
],
response_format: characterSchema,
max_tokens: 800
});
const characters = JSON.parse(response.choices[0].message.content);
characters.characters.forEach((char, index) => {
console.log(`\nCharacter ${index + 1}:`);
console.log(` Name: ${char.name}`);
console.log(` Occupation: ${char.occupation}`);
console.log(` Personality: ${char.personality}`);
console.log(` Background: ${char.background}`);
});
return characters;
}
generateCharacters(3).catch(console.error);
Complex Schema Examples
Real-world applications often require more complex schemas with nested objects, arrays, and conditional fields. Here are practical examples demonstrating advanced structured output patterns.
API Response Extraction
Extract structured data from natural language descriptions, useful for building data pipelines:
import { LMStudioClient } from '@lmstudio/sdk';
import { z } from 'zod';
const APIEndpointSchema = z.object({
method: z.enum(['GET', 'POST', 'PUT', 'DELETE', 'PATCH']),
path: z.string(),
description: z.string(),
parameters: z.array(z.object({
name: z.string(),
type: z.string(),
required: z.boolean(),
description: z.string()
})),
responseExample: z.object({
statusCode: z.number(),
body: z.record(z.unknown())
})
});
async function extractAPISpec(description) {
const client = new LMStudioClient();
const model = await client.llm.model();
const result = await model.respond(
`Based on this API description, generate a structured specification:\n\n${description}`,
{
structured: APIEndpointSchema,
maxTokens: 500
}
);
return result.parsed;
}
// Example usage
const apiDescription = `
Create an endpoint that allows users to search for products by name,
category, and price range. It should support pagination and return
product details including name, price, and availability.
`;
extractAPISpec(apiDescription).then(spec => {
console.log('Generated API Specification:');
console.log(JSON.stringify(spec, null, 2));
}).catch(console.error);
Multi-Step Data Processing
Combine structured outputs with sequential processing for complex workflows:
import { LMStudioClient } from '@lmstudio/sdk';
import { z } from 'zod';
const SentimentSchema = z.object({
sentiment: z.enum(['positive', 'negative', 'neutral']),
confidence: z.number().min(0).max(1),
keyPhrases: z.array(z.string()),
summary: z.string()
});
const ActionItemsSchema = z.object({
items: z.array(z.object({
action: z.string(),
priority: z.enum(['high', 'medium', 'low']),
assignee: z.string().optional(),
deadline: z.string().optional()
}))
});
async function analyzeCustomerFeedback(feedback) {
const client = new LMStudioClient();
const model = await client.llm.model();
// Step 1: Analyze sentiment
const sentimentResult = await model.respond(
`Analyze the sentiment of this customer feedback:\n\n"${feedback}"`,
{ structured: SentimentSchema, maxTokens: 300 }
);
// Step 2: Extract action items based on sentiment
const actionResult = await model.respond(
`Based on this ${sentimentResult.parsed.sentiment} customer feedback, what action items should the team address?\n\nFeedback: "${feedback}"\n\nKey issues: ${sentimentResult.parsed.keyPhrases.join(', ')}`,
{ structured: ActionItemsSchema, maxTokens: 400 }
);
return {
sentiment: sentimentResult.parsed,
actions: actionResult.parsed.items
};
}
const customerFeedback = `
The product quality is excellent and exactly what I needed. However,
the shipping took much longer than expected and customer service was
difficult to reach when I had questions about my order status.
`;
analyzeCustomerFeedback(customerFeedback).then(analysis => {
console.log('Sentiment Analysis:', analysis.sentiment);
console.log('\nRecommended Actions:', analysis.actions);
}).catch(console.error);
Error Handling and Best Practices
Working with structured outputs requires careful attention to potential edge cases and failure modes.
Handling Generation Limits
Always set a maxTokens limit to prevent models from getting stuck in infinite loops, particularly with smaller models that may forget to close JSON structures:
import { LMStudioClient } from '@lmstudio/sdk';
import { z } from 'zod';
const DataSchema = z.object({
items: z.array(z.string()),
count: z.number()
});
async function safeStructuredGeneration(prompt) {
const client = new LMStudioClient();
const model = await client.llm.model();
try {
const result = await model.respond(prompt, {
structured: DataSchema,
maxTokens: 500 // Prevents infinite generation
});
return { success: true, data: result.parsed };
} catch (error) {
if (error.message.includes('schema')) {
// Schema validation failed, likely due to interrupted generation
console.error('Generation was interrupted before completing valid JSON');
return { success: false, error: 'incomplete_generation' };
}
throw error;
}
}
safeStructuredGeneration('List 5 programming languages and their primary use cases.')
.then(console.log)
.catch(console.error);
Retry Logic for Robustness
Implement retry logic for production applications to handle occasional failures:
async function withRetry(fn, maxRetries = 3, delay = 1000) {
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
return await fn();
} catch (error) {
if (attempt === maxRetries) throw error;
console.log(`Attempt ${attempt} failed, retrying in ${delay}ms...`);
await new Promise(resolve => setTimeout(resolve, delay));
delay *= 2; // Exponential backoff
}
}
}
// Usage (assumes `model` and a `ProductSchema` like those defined in the earlier examples)
const result = await withRetry(() =>
model.respond('Generate product data', {
structured: ProductSchema,
maxTokens: 400
})
);
Architecture Overview
The following diagram illustrates how LM Studio processes requests and enforces structured outputs:
flowchart TB
subgraph Client["Node.js Application"]
A[Application Code] --> B[LM Studio SDK]
B --> C{Output Type?}
end
subgraph LMStudio["LM Studio Server"]
D["API Endpoint<br/>localhost:1234"]
E[Model Manager]
F[Loaded LLM]
G["Grammar Constrained<br/>Sampling"]
end
C -->|Non-Structured| D
C -->|Structured + Schema| D
D --> E
E --> F
F -->|Free Generation| H[Text Response]
F -->|With JSON Schema| G
G --> I[Valid JSON Response]
H --> J[Return to Client]
I --> J
subgraph Validation["Client-Side Processing"]
J --> K{Structured?}
K -->|Yes| L[Zod Validation]
K -->|No| M[Raw Text]
L --> N[Typed Object]
end
Performance Considerations
When working with local LLMs, understanding performance characteristics helps you build responsive applications.
Structured output generation typically runs slightly slower than free-form generation because the grammar-constrained sampling requires additional processing at each token. However, this overhead is usually minimal and the benefits of guaranteed valid output far outweigh the performance cost.
Model size significantly impacts both quality and speed. Smaller models like Llama 3.2 3B generate faster but may struggle with complex schemas. Larger models like Llama 3.1 70B produce higher quality structured output but require more VRAM and time. For most structured output tasks, 7B to 13B parameter models offer a good balance.
Context length matters for complex prompts. LM Studio automatically handles context management, but be aware that very long system prompts or conversation histories consume tokens that could otherwise be used for generation.
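If you want to quantify the structured-output overhead on your own hardware, a rough benchmark is easy to write: run the same prompt with and without a schema and compare throughput. The following is a minimal sketch using the predictedTokensCount statistic shown earlier; absolute numbers will vary by model, quantization, and machine:
import { LMStudioClient } from '@lmstudio/sdk';
import { z } from 'zod';
const ListSchema = z.object({ items: z.array(z.string()) });
async function timeGeneration(model, options) {
  // Measure wall-clock time for one prediction and derive tokens per second
  const start = Date.now();
  const result = await model.respond('List ten programming languages.', options);
  const seconds = (Date.now() - start) / 1000;
  return result.stats.predictedTokensCount / seconds;
}
async function comparePerformance() {
  const client = new LMStudioClient();
  const model = await client.llm.model();
  const freeForm = await timeGeneration(model, { maxTokens: 300 });
  const structured = await timeGeneration(model, { structured: ListSchema, maxTokens: 300 });
  console.log(`Free-form: ${freeForm.toFixed(1)} tokens/sec`);
  console.log(`Structured: ${structured.toFixed(1)} tokens/sec`);
}
comparePerformance().catch(console.error);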
Practical Use Cases
Structured outputs from local LLMs enable numerous practical applications while maintaining data privacy and reducing costs.
Data extraction pipelines benefit from structured outputs by converting unstructured documents into database-ready formats. You can process customer emails, extract relevant fields, and route them appropriately without sending sensitive data to external services.
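As a minimal sketch of that extraction pattern, the example below pulls routing fields out of a customer email with a Zod schema. The field names and categories here are hypothetical; adapt them to your own pipeline:
import { LMStudioClient } from '@lmstudio/sdk';
import { z } from 'zod';
// Hypothetical routing fields; adjust to match your pipeline.
const EmailSchema = z.object({
  customerName: z.string(),
  topic: z.enum(['billing', 'shipping', 'technical', 'other']),
  urgency: z.enum(['high', 'medium', 'low']),
  summary: z.string()
});
async function routeEmail(emailBody) {
  const client = new LMStudioClient();
  const model = await client.llm.model();
  const result = await model.respond(
    `Extract routing information from this customer email:\n\n${emailBody}`,
    { structured: EmailSchema, maxTokens: 300 }
  );
  return result.parsed; // e.g. route to the right queue based on topic and urgency
}
routeEmail('Hi, my package was supposed to arrive last week and I still have nothing.').then(console.log).catch(console.error);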
Content generation systems can produce blog posts, product descriptions, or marketing copy with metadata already structured for your CMS. The model generates both the content and its categorization in a single request.
Code generation tools can output properly formatted code snippets along with explanations, test cases, and documentation, all structured for immediate integration into development workflows.
Conversational agents that need to take actions can use structured outputs to determine intents, extract entities, and format responses consistently, making it easier to build reliable chatbots and virtual assistants.
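Here is a minimal sketch of that intent-and-entity pattern, assuming a small fixed set of intents for a hypothetical support bot:
import { LMStudioClient } from '@lmstudio/sdk';
import { z } from 'zod';
// Hypothetical intents for a support bot; extend as needed.
const IntentSchema = z.object({
  intent: z.enum(['check_order_status', 'request_refund', 'ask_question']),
  entities: z.object({
    orderId: z.string().optional(),
    productName: z.string().optional()
  }),
  reply: z.string()
});
async function handleUserMessage(message) {
  const client = new LMStudioClient();
  const model = await client.llm.model();
  const result = await model.respond(
    `Classify this customer message and draft a reply:\n\n"${message}"`,
    { structured: IntentSchema, maxTokens: 300 }
  );
  const { intent, entities, reply } = result.parsed;
  // Dispatch on the extracted intent instead of parsing free text
  console.log(`Intent: ${intent}`, entities);
  return reply;
}
handleUserMessage('Where is my order #12345?').then(console.log).catch(console.error);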
Conclusion
LM Studio provides a powerful platform for running local LLMs with professional-grade features like structured output support. The combination of privacy, cost savings, and reliable JSON generation makes it an excellent choice for developers building AI-powered applications.
The key takeaways from this guide include understanding when to use structured versus non-structured outputs based on your application needs, leveraging Zod schemas for type-safe structured generation, implementing proper error handling with maxTokens limits and retry logic, and choosing appropriate model sizes for your hardware and quality requirements.
As local LLM capabilities continue to improve, tools like LM Studio make it increasingly practical to build sophisticated AI features without depending on cloud services. The structured output capabilities ensure that your applications can reliably integrate LLM responses into typed, validated data structures that work seamlessly with the rest of your codebase.
References
- LM Studio – Structured Output Documentation (https://lmstudio.ai/docs/developer/openai-compat/structured-output)
- LM Studio – TypeScript SDK Documentation (https://lmstudio.ai/docs/typescript)
- LM Studio – Structured Response with Zod and JSON Schema (https://lmstudio.ai/docs/typescript/llm-prediction/structured-response)
- LM Studio Blog – Introducing lmstudio-python and lmstudio-js (https://lmstudio.ai/blog/introducing-lmstudio-sdk)
- GitHub – LM Studio TypeScript SDK (https://github.com/lmstudio-ai/lmstudio-js)
- NPM – @lmstudio/sdk Package (https://www.npmjs.com/package/@lmstudio/sdk)
- LM Studio Blog – Version 0.3.0 Release Notes (https://lmstudio.ai/blog/lmstudio-v0.3.0)
- Zod – TypeScript-first Schema Validation (https://zod.dev/)
