Multiple Document Types Extraction

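This example shows how to use the POST /pipelines endpoint to extract data from multiple document types in a single request. The payload below extracts data from two PDFs and one web page at the same time. Each entry in workloads pairs a source (base64-encoded PDF content in raw_data, or a URL in documents_location with data_source set to "web") with the list of schemas to extract from it. The provider_type, provider_model_name, and api_key fields specify the LLM provider and model used for extraction.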

  {
    "workloads": [
      {
        "raw_data": "base64_encoded_pdf_content",
        "schemas": ["schema1", "schema2"]
      },
      {
        "raw_data": "base64_encoded_pdf_2_content",
        "schemas": ["schema3"]
      },
      {
        "data_source": "web",
        "documents_location": "https://www.example.com/article1",
        "schemas": ["schema4"]
      }
    ],
    "provider_type": "openai",
    "provider_model_name": "gpt-4o",
    "api_key": "sk-..."
  }

The endpoint responds with the task_id of the started pipeline:

  {
    "task_id": "b6781f5b-022b-485e-b93c-6a958e51b992",
    "message": "Pipeline processing started"
  }
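The following is a minimal client sketch, using only the Python standard library, that assembles a payload shaped like the example above and submits it. Several details are assumptions not stated on this page: the base URL (api.example.com is a hypothetical placeholder), the use of standard base64 for raw_data, and a JSON Content-Type header. The helper names build_payload, encode_pdf, and start_pipeline are illustrative only.

```python
import base64
import json
import urllib.request

# Hypothetical base URL: the real host of the POST /pipelines endpoint is not given here.
BASE_URL = "https://api.example.com"


def encode_pdf(pdf_bytes: bytes) -> str:
    """Base64-encode raw PDF bytes for the raw_data field (standard base64 assumed)."""
    return base64.b64encode(pdf_bytes).decode("ascii")


def build_payload(pdf_workloads, web_workloads, api_key,
                  provider_type="openai", provider_model_name="gpt-4o"):
    """Assemble a request body shaped like the example payload.

    pdf_workloads: list of (pdf_bytes, [schema names]) pairs
    web_workloads: list of (url, [schema names]) pairs
    """
    # PDF workloads carry the document inline as base64 in raw_data.
    workloads = [
        {"raw_data": encode_pdf(pdf), "schemas": list(schemas)}
        for pdf, schemas in pdf_workloads
    ]
    # Web workloads point at a URL and set data_source to "web".
    workloads += [
        {"data_source": "web", "documents_location": url, "schemas": list(schemas)}
        for url, schemas in web_workloads
    ]
    return {
        "workloads": workloads,
        "provider_type": provider_type,
        "provider_model_name": provider_model_name,
        "api_key": api_key,
    }


def start_pipeline(payload, base_url=BASE_URL):
    """POST the payload to /pipelines and return the task_id from the JSON response."""
    request = urllib.request.Request(
        f"{base_url}/pipelines",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["task_id"]
```

For example, passing two (pdf_bytes, schemas) pairs with ["schema1", "schema2"] and ["schema3"], plus ("https://www.example.com/article1", ["schema4"]) as a web workload, reproduces the structure of the payload above; start_pipeline(payload) then returns the task_id from the response. Keeping payload construction separate from the HTTP call makes the request body easy to inspect or log before it is sent.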