java - How to save a vector in a postgresdb properly using pgvector?

I am working on saving a vector generated by the OpenAI model text-embedding-ada-002. The model returns a list of double values, which I convert into a String before saving it in my EnrichmentEntity.

I am running PostgreSQL in a Docker container with the pgvector image, which enables the use of vectors and provides support for the vector datatype. However, I am encountering an issue with saving the EnrichmentEntity properly in the database.

Although I am able to convert the String containing the vector before saving it, I am unsure how to convert it to the correct vector datatype before insertion. I have attempted to handle the conversion in the repository, but the entity is not being stored in the database. Interestingly, no error is thrown; the operation simply does not persist the entity.

I would greatly appreciate any insights or suggestions on how to correctly handle this conversion and store the entity in PostgreSQL.

Here is the relevant code for reference:

the entity:

@Entity
@Table(name = "enrichments")
@SequenceGenerator(name = "enrichments_seq", sequenceName = "enrichments_seq", allocationSize = 1)
public class EnrichmentEntity {

    private static final Logger logger = LoggerFactory.getLogger(EnrichmentEntity.class);

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @Column(name = "PROFESSION")
    private String profession;

    @Embedded
    private DocumentEntity document;

    @Column(name = "INSURANCE")
    private String insurance;

    @Column(name = "EMBEDDING", columnDefinition = "TEXT")
    private String embedding;


    public EnrichmentEntity() {}

    public EnrichmentEntity(String profession, DocumentEntity document, String insurance, String embedding) {
        this.profession = profession;
        this.document = document;
        this.insurance = insurance;
        this.embedding = embedding;
    }

the service:

 @Override
    @Transactional
    public String getOpenAiResponse(String profession, String insurance) {
        String userPrompt = """
                Du bist ein Assistent, der mir hilft, Berufe mit Daten anzureichern, 
                so dass diese in einer Vektordatenbank gespeichert und für die Suche verwendet werden können.
                
                Du sollst nun den folgenden Beruf mit Daten anreichern (JSON-String): %s
                
                Antworte in der folgenden Form:
                
                {
                    "Beruf": "<Gib die Berufsbezeichnung unverändert wieder.>",
                    "Synonyme": "<Gib eine Komma-getrennte Liste von alternativen Berufsbezeichnungen an.>",
                    "Berufsgruppen": "<Gib eine Komma-getrennte Liste von Berufsgruppen an, zu denen der Beruf gehört.>",
                    "Beschreibung": "<Beschreibe den Beruf so präzise wie möglich in etwa 10 Sätzen.>",
                    "Abkürzungen": "<Führe alle Abkürzungen zur Berufsbezeichnung in einer Komma-getrennten Liste auf.>",
                    "Ausgeschrieben": "<Schreibe alle Abkürzungen in ganzen Worten aus.>"
                }
                
                Beachte die Anweisungen in den spitzen Klammern und ersetze sie durch gültige Strings. 
                Antworte ausschließlich mit den geforderten Daten in **korrektem JSON-Format**.
                """.strip().formatted(profession);

        Map<String, Object> requestBody = new HashMap<>();
        requestBody.put("model", deployment);

        List<Map<String, String>> messages = new ArrayList<>();
        messages.add(Map.of("role", "system", "content", "Du bist ein Assistent, der Berufe mit strukturierten Daten anreichert."));
        messages.add(Map.of("role", "user", "content", userPrompt));

        requestBody.put("messages", messages);
        requestBody.put("max_tokens", 800);
        requestBody.put("temperature", 0);

        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        headers.setBearerAuth(apiKey);

        HttpEntity<Map<String, Object>> requestEntity = new HttpEntity<>(requestBody, headers);


        ResponseEntity<String> response = restTemplate.exchange(apiEndpoint, HttpMethod.POST, requestEntity, String.class);
        String message = ChatGPTResponseMapper.extractContent(response.getBody());
        DocumentEntity documentEntity = getEnrichedDocument(message);
        List<Double> embeddingDocument = getEmbedding(documentEntity.toString());
        String embeddingJson = convertListToString(embeddingDocument);
        EnrichmentEntity enrichmentEntity = new EnrichmentEntity(profession, documentEntity, insurance, embeddingJson);
        System.out.println(enrichmentEntity);
        enrichmentRepository.saveEmbeddingAsVector(enrichmentEntity);
        enrichmentRepository.flush();
        System.out.println(enrichmentEntity);
        return ChatGPTResponseMapper.extractContent(response.getBody());
    }

    @Override
    public List<Double> getEmbedding(String text) {
        Map<String, Object> requestBody = new HashMap<>();
        requestBody.put("model", "text-embedding-ada-002");
        requestBody.put("input", text);

        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        headers.setBearerAuth(apiKey);

        HttpEntity<Map<String, Object>> requestEntity = new HttpEntity<>(requestBody, headers);

        ResponseEntity<Map> response = restTemplate.exchange(embeddingEndpoint, HttpMethod.POST, requestEntity, Map.class);

        if (response.getBody() != null) {
            List<Map<String, Object>> data = (List<Map<String, Object>>) response.getBody().get("data");
            if (!data.isEmpty()) {
                List<Double> embedding = (List<Double>) data.get(0).get("embedding");
                return embedding;
            }
        }

        throw new RuntimeException("No embedding received from OpenAI");
    }

    @Override
    public List<EnrichmentEntity> searchProfessions(String profession, String insurance) {
        List<Double> embedding = this.getEmbedding(profession);

        String embeddingString = convertListToStringForSearch(embedding);

        return enrichmentRepository.findTop5ProfessionsByInsuranceAndSimilarity(insurance, embeddingString);
    }


    public DocumentEntity getEnrichedDocument(String enrichment) {
        try {
            String cleanedEnrichment = cleanJsonString(enrichment);
            ObjectMapper objectMapper = new ObjectMapper();
            JsonNode jsonNode = objectMapper.readTree(cleanedEnrichment);

            return new DocumentEntity(
                    jsonNode.get("Beruf").asText(),
                    jsonNode.get("Synonyme").asText(),
                    jsonNode.get("Berufsgruppen").asText(),
                    jsonNode.get("Beschreibung").asText(),
                    jsonNode.get("Abkürzungen").asText(),
                    jsonNode.get("Ausgeschrieben").asText()
            );
        } catch (Exception e) {
            throw new RuntimeException("Fehler beim Verarbeiten der OpenAI-Antwort: " + e.getMessage(), e);
        }
    }

    private String cleanJsonString(String json) {
        return json
                .replaceAll("^```json", "")
                .replaceAll("```$", "")
                .trim();
    }

    private String convertListToString(List<Double> list) {
        StringBuilder stringBuilder = new StringBuilder();
        stringBuilder.append("[");
        for (int i = 0; i < list.size(); i++) {
            stringBuilder.append(list.get(i));
            if (i < list.size() - 1) {
                stringBuilder.append(", ");
            }
        }
        stringBuilder.append("]");
        return stringBuilder.toString();
    }

the repository:

    @Modifying
    @Transactional
    @Query(value = """
    INSERT INTO enrichments (profession, insurance, embedding, profession_name, synonyms, job_groups, description, abbreviations, full_written) 
    VALUES (:#{#entity.profession}, :#{#entity.insurance}, CAST(:#{#entity.embedding} AS vector), 
            :#{#entity.document.professionName}, :#{#entity.document.synonyms}, 
            :#{#entity.document.jobGroups}, :#{#entity.document.description}, 
            :#{#entity.document.abbreviations}, :#{#entity.document.fullWritten})
    """, nativeQuery = true)
    void saveEmbeddingAsVector(@Param("entity") EnrichmentEntity entity);

this is my table definition:

CREATE TABLE enrichments ( id SERIAL PRIMARY KEY, profession TEXT, insurance TEXT, embedding vector(1536), profession_name TEXT, synonyms TEXT, job_groups TEXT, description TEXT, abbreviations TEXT, full_written TEXT );

thanks in advance :)

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

java - How to save a vector in a postgresdb properly using pgvector? - Stack Overflow

与本文相关的文章

评论列表(0)