Dedoc
This sample demonstrates the use of Dedoc in combination with LangChain as a `DocumentLoader`.
Overview
Dedoc is an open-source library/service that extracts texts, tables, attached files and document structure (e.g., titles, list items, etc.) from files of various formats.
Dedoc supports DOCX, XLSX, PPTX, EML, HTML, PDF, images and more.
The full list of supported formats is available in the Dedoc documentation.
Integration details
Class | Package | Local | Serializable | JS support |
---|---|---|---|---|
DedocFileLoader | langchain_community | ❌ | beta | ❌ |
DedocPDFLoader | langchain_community | ❌ | beta | ❌ |
DedocAPIFileLoader | langchain_community | ❌ | beta | ❌ |
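As a minimal sketch of how one of the classes above might be used as a document loader, the snippet below wraps `DedocFileLoader` in a small helper. It assumes the `langchain_community` and `dedoc` packages are installed. The helper name `load_documents` and its `loader_cls` parameter are hypothetical and exist only so the example can be tried with a stand-in loader.

```python
from typing import Any, List, Optional, Type


def load_documents(file_path: str, loader_cls: Optional[Type[Any]] = None) -> List[Any]:
    """Load a file into a list of documents.

    `loader_cls` is a hypothetical hook for this sketch: any class that takes
    a file path and exposes `load()` works. DedocFileLoader is the default.
    """
    if loader_cls is None:
        # Imported lazily so this module can be imported without dedoc installed.
        from langchain_community.document_loaders import DedocFileLoader

        loader_cls = DedocFileLoader
    loader = loader_cls(file_path)
    return loader.load()
```

Calling `load_documents("example.docx")` (the path is a placeholder) should return a list of LangChain documents, each with `page_content` and `metadata`.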
Loader features
Methods for lazy loading and async loading are available, but document loading is actually executed synchronously.
Source | Document Lazy Loading | Async Support |
---|---|---|
DedocFileLoader | ❌ | ❌ |
DedocPDFLoader | ❌ | ❌ |
DedocAPIFileLoader | ❌ | ❌ |
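To illustrate what "available, but executed synchronously" can mean in practice, here is a toy sketch. It is not Dedoc's or LangChain's implementation: the class `EagerToyLoader` and its methods are hypothetical stand-ins that only mimic the shape of a loader whose lazy and async methods still do all the work in one blocking step.

```python
import asyncio
from typing import Iterator, List


class EagerToyLoader:
    """Hypothetical loader: exposes lazy/async methods, but parses everything at once."""

    def __init__(self, pages: List[str]) -> None:
        self.pages = pages
        self.parse_calls = 0

    def _parse_everything(self) -> List[str]:
        # Stands in for a blocking, whole-file parse.
        self.parse_calls += 1
        return [page.upper() for page in self.pages]

    def load(self) -> List[str]:
        return self._parse_everything()

    def lazy_load(self) -> Iterator[str]:
        # The whole input is parsed before the first item is yielded.
        yield from self._parse_everything()

    async def aload(self) -> List[str]:
        # Nothing is awaited here, so the event loop is blocked while parsing.
        return self.load()
```

With a loader shaped like this, iterating over `lazy_load()` does not reduce peak memory, and `await loader.aload()` does not let other coroutines run during parsing.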