How can I read a PDF in Python?

You can use the PyPDF2 package.

# install PyPDF2 (run this in a shell, not in Python)
pip install PyPDF2

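# import the required module
import PyPDF2

# open the PDF file in binary read mode
file = open("example.pdf", "rb")

# create a PDF reader object
# (PyPDF2 3.0 removed PdfFileReader; use PdfReader instead)
fileReader = PyPDF2.PdfReader(file)

# print the number of pages in the PDF file
# (PyPDF2 3.0 removed numPages; use len(reader.pages) instead)
print(len(fileReader.pages))

# close the file when done
file.close()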
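
Counting pages is only a first step; reading a PDF usually means getting its text out. Here is a minimal sketch, assuming PyPDF2 2.0 or later, where PdfReader, reader.pages and page.extract_text() are available. The helper names extract_text_from_reader and read_pdf_text are made up for illustration, and any path you pass in is your own.

```python
def extract_text_from_reader(reader):
    """Join the text of every page of a PdfReader-like object."""
    parts = []
    for page in reader.pages:
        # Guard against a missing result for pages with no text layer
        # (for example, scanned images).
        parts.append(page.extract_text() or "")
    return "\n".join(parts)


def read_pdf_text(path):
    """Open the PDF at `path` and return its text (hypothetical helper)."""
    # Imported here so the helper above can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader

    with open(path, "rb") as f:
        return extract_text_from_reader(PdfReader(f))
```

Calling print(read_pdf_text("example.pdf")) would then print the text of the whole file. Scanned PDFs have no text layer, so the result can be empty for them.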

See the documentation: http://pythonhosted.org/PyPDF2/

You can use the textract module in Python.

To install it:

pip install textract

To read a PDF:

import textract
text = textract.process("path/to/pdf/file", method="pdfminer")
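
Note that textract.process returns bytes rather than str, so the result usually needs decoding before you can work with it as text. Here is a small sketch, assuming the default UTF-8 output encoding. decode_output and pdf_to_text are hypothetical helper names.

```python
def decode_output(raw, encoding="utf-8"):
    """Turn the bytes returned by textract.process into a str."""
    # errors="replace" keeps going if a stray byte cannot be decoded.
    return raw.decode(encoding, errors="replace")


def pdf_to_text(path):
    """Extract text from the PDF at `path` (hypothetical helper)."""
    # Imported here so decode_output can be used without textract installed.
    import textract

    raw = textract.process(path, method="pdfminer")
    return decode_output(raw)
```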

For details, see the textract documentation.

Try PyPDF2.

There is a good tutorial here: https://automatetheboringstuff.com/chapter13/
