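Validating AI Output with JSON Schema in Spring Boot

In the previous article, "Java AI in Practice: Structured JSON Output from Local Models", we showed how to use Ollama to generate structured output in JSON format. In real applications, however, it is important to make sure the output an AI model generates matches the data structure you expect. With JSON Schema we can define the expected output format and check whether the generated content conforms to it. This article shows how to implement that in a Spring Boot application so you can handle and validate structured AI output more reliably.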

Project Dependencies

First, add the JSON Schema validator dependency to pom.xml:

<dependency>
    <groupId>com.networknt</groupId>
    <artifactId>json-schema-validator</artifactId>
    <version>1.4.0</version>
</dependency>

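networknt/json-schema-validator is a lightweight, high-performance JSON Schema validator designed for Java applications. It is a good fit for validating AI output because it:

- Supports validation of complex, nested JSON structures
- Reports clear error messages, which makes debugging easier
- Performs well enough to handle a large volume of validation requests
- Lets you cache parsed schema definitions and reuse them across validations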

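Structured AI Output

Model APIs such as OpenAI's provide a Function Calling feature that lets us define the expected output format. By supplying a JSON Schema as the function's parameter definition, we can ask the model to generate output that follows a specific structure.

Defining the Output Schema

Here is an example schema for generating a product review: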

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "review": {
      "type": "object",
      "properties": {
        "rating": {
          "type": "integer",
          "minimum": 1,
          "maximum": 5,
          "description": "Product rating (1-5 stars)"
        },
        "summary": {
          "type": "string",
          "maxLength": 100,
          "description": "Review summary"
        },
        ...
        "sentiment": {
          "type": "string",
          "enum": ["positive", "neutral", "negative"],
          "description": "Overall sentiment"
        }
      },
      "required": ["rating", "summary", "sentiment"]
    }
  },
  "required": ["review"]
}
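
The article later parses the response into a ReviewResponse type without showing it. As a rough sketch of what that type might look like, the records below mirror only the three required fields shown in the schema (the fields elided by "..." are not modeled) and repeat the schema's constraints as a second line of defense. The names Review and ReviewResponse and the use of Java records (Java 17+) are assumptions, not the author's code.

```java
import java.util.Set;

// Hypothetical Java mirror of the schema's three required fields.
record Review(int rating, String summary, String sentiment) {
    private static final Set<String> SENTIMENTS = Set.of("positive", "neutral", "negative");

    Review {
        // Mirrors "minimum": 1 and "maximum": 5
        if (rating < 1 || rating > 5) {
            throw new IllegalArgumentException("rating must be between 1 and 5");
        }
        // Mirrors "maxLength": 100 (JSON Schema counts Unicode code points)
        if (summary == null || summary.codePointCount(0, summary.length()) > 100) {
            throw new IllegalArgumentException("summary is required and limited to 100 characters");
        }
        // Mirrors the "enum" constraint
        if (sentiment == null || !SENTIMENTS.contains(sentiment)) {
            throw new IllegalArgumentException("sentiment must be positive, neutral, or negative");
        }
    }
}

// Wrapper matching the top-level "review" property.
record ReviewResponse(Review review) {}
```

Repeating the constraints in the type is optional when schema validation always runs first; it mainly protects code paths that construct a Review without going through the validator.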

Implementing the AI Service

Create a service class that handles the OpenAI API call and validates the output:

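// Note: OpenAiClient, ChatCompletionRequest, Message, FunctionDefinition, and
// JsonSchemaValidator are application-level types that this article does not define.
@Service
public class AIReviewService {
    private final OpenAiClient openAiClient;
    private final JsonSchemaValidator schemaValidator;
    private final ObjectMapper objectMapper;

    @Value("classpath:schemas/review-output-schema.json")
    private Resource schemaResource;

    // Constructor injection so the final fields are initialized
    public AIReviewService(OpenAiClient openAiClient,
                           JsonSchemaValidator schemaValidator,
                           ObjectMapper objectMapper) {
        this.openAiClient = openAiClient;
        this.schemaValidator = schemaValidator;
        this.objectMapper = objectMapper;
    }

    public ReviewResponse generateStructuredReview(String productDescription) {
        var request = ChatCompletionRequest.builder()
            .model("gpt-4")
            .messages(List.of(
                new Message("system", "You are a professional product review analyst. Generate a structured review based on the product description the user provides."),
                new Message("user", productDescription)
            ))
            .functions(List.of(
                new FunctionDefinition(
                    "generate_review",
                    "Generate a product review",
                    schemaResource // JSON Schema used as the function's parameter definition
                )
            ))
            .functionCall("generate_review") // Force the model to call this function
            .build();

        return openAiClient.createChatCompletion(request)
            .map(this::validateAndParseResponse)
            .orElseThrow(() -> new AIGenerationException("Failed to generate review"));
    }

    private ReviewResponse validateAndParseResponse(String jsonResponse) {
        // Check that the generated JSON conforms to the schema
        if (!schemaValidator.isValid(jsonResponse)) {
            throw new InvalidOutputException("AI generated invalid review format");
        }

        // Parse the validated JSON; readValue declares a checked exception
        try {
            return objectMapper.readValue(jsonResponse, ReviewResponse.class);
        } catch (JsonProcessingException e) {
            throw new InvalidOutputException("AI generated unparseable JSON: " + e.getMessage());
        }
    }
}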
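
The service above fails immediately when validation rejects the model's output. One possible refinement, sketched here under assumptions, is to ask the model again a bounded number of times before giving up. RetryingGenerator and its parameters are hypothetical names; the generate supplier stands in for the model call and the isValid predicate stands in for the schema check.

```java
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper: regenerate output until it passes validation or attempts run out.
final class RetryingGenerator {
    private RetryingGenerator() {}

    static Optional<String> generateValid(Supplier<String> generate,
                                          Predicate<String> isValid,
                                          int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String candidate = generate.get();
            // Accept the first candidate that passes validation
            if (candidate != null && isValid.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        // Every attempt produced invalid output
        return Optional.empty();
    }
}
```

Each retry is another model call, so a small limit such as 2 or 3 keeps cost and latency bounded. An empty result can then be mapped to the same exception the service already throws.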

Output Validator

Implement a dedicated validator for AI-generated output:

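import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.networknt.schema.JsonSchema;
import com.networknt.schema.JsonSchemaFactory;
import com.networknt.schema.SpecVersion;
import com.networknt.schema.ValidationMessage;
import org.springframework.core.io.ClassPathResource;
import org.springframework.stereotype.Component;

import java.io.InputStream;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// ValidationResult and SchemaLoadException are application types not shown in this article.
@Component
public class AIOutputValidator {
    // Parsed schemas are cached by classpath location so each file is loaded once
    private final Map<String, JsonSchema> schemaCache = new ConcurrentHashMap<>();
    private final ObjectMapper objectMapper;

    public AIOutputValidator(ObjectMapper objectMapper) {
        this.objectMapper = objectMapper;
    }

    public ValidationResult validateOutput(String output, String schemaPath) {
        JsonSchema schema = getOrLoadSchema(schemaPath);
        try {
            JsonNode outputNode = objectMapper.readTree(output);
            Set<ValidationMessage> errors = schema.validate(outputNode);

            if (errors.isEmpty()) {
                return ValidationResult.success();
            }

            // Report every schema violation, not just the first
            return ValidationResult.failure(errors.stream()
                .map(ValidationMessage::getMessage)
                .collect(Collectors.toList()));

        } catch (JsonProcessingException e) {
            // The output was not syntactically valid JSON
            return ValidationResult.failure(List.of("Invalid JSON format: " + e.getMessage()));
        }
    }

    private JsonSchema getOrLoadSchema(String schemaPath) {
        return schemaCache.computeIfAbsent(schemaPath, this::loadSchema);
    }

    private JsonSchema loadSchema(String schemaPath) {
        // try-with-resources closes the schema file after reading
        try (InputStream in = new ClassPathResource(schemaPath).getInputStream()) {
            JsonNode schemaNode = objectMapper.readTree(in);
            return JsonSchemaFactory.getInstance(SpecVersion.VersionFlag.V7)
                .getSchema(schemaNode);
        } catch (Exception e) {
            throw new SchemaLoadException("Failed to load schema: " + schemaPath, e);
        }
    }
}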
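
The validator returns a ValidationResult that the article does not define. A minimal sketch of such a type, assuming a Java record with the two static factories the validator calls (success and failure), could look like this. The field names valid and errors are assumptions.

```java
import java.util.List;

// Hypothetical result type: either valid with no errors, or invalid with messages.
record ValidationResult(boolean valid, List<String> errors) {
    ValidationResult {
        // Defensive copy so callers cannot mutate the error list afterwards
        errors = List.copyOf(errors);
    }

    static ValidationResult success() {
        return new ValidationResult(true, List.of());
    }

    static ValidationResult failure(List<String> errors) {
        return new ValidationResult(false, errors);
    }
}
```

A caller would typically check valid() first and, on failure, log or return errors() so the violated schema rules are visible.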

Error Handling

Add a dedicated exception handler for AI output validation errors:

@ControllerAdvice
public class AIExceptionHandler extends ResponseEntityExceptionHandler {

    // The model responded, but its output failed schema validation
    @ExceptionHandler(InvalidOutputException.class)
    public ResponseEntity<ErrorResponse> handleInvalidOutput(InvalidOutputException ex) {
        ErrorResponse error = new ErrorResponse(
            "AI_OUTPUT_VALIDATION_ERROR",
            "The AI-generated output has an invalid format",
            ex.getValidationErrors()
        );
        return ResponseEntity.badRequest().body(error);
    }

    // The model call itself failed or returned nothing
    @ExceptionHandler(AIGenerationException.class)
    public ResponseEntity<ErrorResponse> handleAIGeneration(AIGenerationException ex) {
        ErrorResponse error = new ErrorResponse(
            "AI_GENERATION_ERROR",
            "The AI failed to generate content",
            Collections.singletonList(ex.getMessage())
        );
        return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE).body(error);
    }
}
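
The handler relies on InvalidOutputException, AIGenerationException, and ErrorResponse, none of which the article shows. The sketch below is one way they might be shaped so that both the one-argument constructor used in the service and the getValidationErrors() call used in the handler compile. All three definitions are assumptions, and in a real project each would normally be a public type in its own file.

```java
import java.util.List;

// Hypothetical exception for output that fails schema validation.
class InvalidOutputException extends RuntimeException {
    private final List<String> validationErrors;

    InvalidOutputException(String message) {
        this(message, List.of());
    }

    InvalidOutputException(String message, List<String> validationErrors) {
        super(message);
        this.validationErrors = List.copyOf(validationErrors);
    }

    List<String> getValidationErrors() {
        return validationErrors;
    }
}

// Hypothetical exception for a failed or empty model call.
class AIGenerationException extends RuntimeException {
    AIGenerationException(String message) {
        super(message);
    }
}

// Hypothetical response body: machine-readable code, human-readable message, details.
record ErrorResponse(String code, String message, List<String> details) {}
```

Passing the validator's error messages into the two-argument constructor lets the handler return the specific schema violations instead of an empty list.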

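Summary

Validating AI output against a JSON Schema ensures that generated content has the expected structure and value ranges before it reaches downstream code. This improves the application's reliability and simplifies later data processing. Here are some key practices:

Include the JSON Schema in the prompt
- Putting the JSON Schema definition directly in the prompt can noticeably increase the chance that the model's output matches the expected format
- Prefer a simplified schema description, since overly complex validation rules can make the schema harder for the model to follow

Structured prompt template

Generate output that follows this JSON Schema: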
{
  "type": "object",
  "properties": {
    "field1": {"type": "string"},
    "field2": {"type": "number"}
  },
  "required": ["field1", "field2"]
}
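
As a small illustration of the template above, the helper below assembles a prompt from a task description and a schema string. SchemaPrompt and build are hypothetical names, and the exact instruction wording is an assumption rather than a tested prompt.

```java
// Hypothetical helper that embeds a JSON Schema in the prompt text.
final class SchemaPrompt {
    private SchemaPrompt() {}

    static String build(String task, String schemaJson) {
        return task.strip() + "\n\n"
            + "Generate output that follows this JSON Schema. "
            + "Return only the JSON object, with no extra text:\n"
            + schemaJson.strip() + "\n";
    }
}
```

Loading the schema string from the same classpath file the validator uses keeps the prompt and the validation rules from drifting apart.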

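Improving output quality
- Show the expected output format with an example
- State the required fields and data types explicitly
- Emphasize in the prompt that the output must strictly follow JSON syntax

This approach raises the accuracy of the first generation and reduces the cost of retries and error handling, which makes it an important practice for building stable, reliable AI applications.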
