
C/C++ alternative to Apache Tika

Developer https://www.devze.com 2023-03-10 20:06 Source: web

I am looking for a C/C++ alternative to the Apache Tika framework, which is Java-based. Specifically, I am looking for file metadata and structured text extraction, all under one framework. After some online searching and browsing, the closest things I have found are GNU libextractor and a collection of individual file filters that parse documents to extract text (pdftotext, xls2csv, etc.).

Can anyone please recommend a good library comparable to Apache Tika?

Thanks


KDE provides a library called KFileMetaData, which it uses internally for its file indexer, Baloo.

It is written in C++ with Qt5 and supports most common formats, such as Microsoft Office 2007 documents, ODF documents, PDFs, images, video, audio and e-books.


Tika has a network server mode, so you could start Tika as a server and send it requests from your C++ code.
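
A minimal sketch of the server approach, using only POSIX sockets. It assumes tika-server is already running on the local machine on its default port 9998, and that a document sent with `PUT /tika` and `Accept: text/plain` comes back as plain text. The helper names (`build_tika_request`, `send_to_tika`, `response_body`) are hypothetical. HTTP/1.0 is used deliberately so the server closes the connection after replying and does not use chunked encoding, which keeps the response parsing trivial.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <stdexcept>
#include <string>

// Builds an HTTP/1.0 PUT request that uploads `body` to tika-server's /tika endpoint.
// HTTP/1.0 means no chunked response and the server closes the socket when done.
std::string build_tika_request(const std::string& body,
                               const std::string& host = "localhost:9998") {
    std::string req = "PUT /tika HTTP/1.0\r\n";
    req += "Host: " + host + "\r\n";
    req += "Accept: text/plain\r\n";
    req += "Content-Length: " + std::to_string(body.size()) + "\r\n\r\n";
    req += body;
    return req;
}

// Sends `request` to 127.0.0.1:port and returns the raw HTTP response
// (status line, headers and body). Throws std::runtime_error on socket errors.
std::string send_to_tika(const std::string& request, unsigned short port = 9998) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) throw std::runtime_error("socket() failed");
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) < 0) {
        close(fd);
        throw std::runtime_error("connect() failed; is tika-server running?");
    }
    // send() may write only part of the buffer, so loop until everything is sent.
    size_t sent = 0;
    while (sent < request.size()) {
        ssize_t n = send(fd, request.data() + sent, request.size() - sent, 0);
        if (n <= 0) { close(fd); throw std::runtime_error("send() failed"); }
        sent += static_cast<size_t>(n);
    }
    // Read until the server closes the connection.
    std::string response;
    char buf[4096];
    ssize_t n;
    while ((n = recv(fd, buf, sizeof buf, 0)) > 0) response.append(buf, static_cast<size_t>(n));
    close(fd);
    return response;
}

// Drops the status line and headers, returning only the response body.
std::string response_body(const std::string& response) {
    auto pos = response.find("\r\n\r\n");
    return pos == std::string::npos ? std::string() : response.substr(pos + 4);
}
```

Usage would look like `std::string text = response_body(send_to_tika(build_tika_request(file_bytes)));`, where `file_bytes` is the document read into a `std::string`. The `/meta` endpoint returns metadata instead of text. For anything beyond a quick experiment, a real HTTP client library such as libcurl would handle redirects, errors and encodings more robustly than this hand-rolled request.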

Alternatively, Tika has a CLI mode, so you could launch a new Tika process each time and read its output from a pipe.
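
A minimal sketch of the CLI approach using POSIX `popen()`. It assumes `java` is on the `PATH` and that the tika-app jar is available locally; `tika-app.jar` is a placeholder file name (real releases carry a version number), and the helper names are hypothetical. It relies on tika-app's `--text` and `--metadata` options for plain-text and metadata output respectively. Arguments are single-quoted so file names containing spaces or quotes reach the shell intact.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Wraps an argument in single quotes for /bin/sh, escaping embedded single quotes.
std::string shell_quote(const std::string& arg) {
    std::string out = "'";
    for (char c : arg) {
        if (c == '\'') out += "'\\''";  // close quote, escaped quote, reopen quote
        else out += c;
    }
    out += "'";
    return out;
}

// Builds a tika-app command line: --text extracts plain text, --metadata extracts metadata.
std::string tika_command(const std::string& jar, const std::string& file,
                         bool metadata = false) {
    return "java -jar " + shell_quote(jar) + (metadata ? " --metadata " : " --text ") +
           shell_quote(file);
}

// Runs `cmd` through the shell and returns everything it writes to stdout.
// Throws if the process cannot be started or exits with a non-zero wait status.
std::string run_and_capture(const std::string& cmd) {
    FILE* pipe = popen(cmd.c_str(), "r");
    if (!pipe) throw std::runtime_error("popen() failed");
    std::string output;
    char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, pipe)) > 0) output.append(buf, n);
    int status = pclose(pipe);
    if (status != 0)
        throw std::runtime_error("command failed with wait status " + std::to_string(status));
    return output;
}
```

For example, `run_and_capture(tika_command("tika-app.jar", "report.pdf"))` would return the extracted text. The same `run_and_capture` helper works for the individual filters mentioned in the question, such as pdftotext. Starting a JVM per file is slow, so for many documents the server mode is usually the better fit.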

