Python: urllib.error.HTTPError: HTTP Error 404: Not Found

The default number of questions displayed per page is 50, so the range in your loop runs past the last available page at 50 questions per page. Adjust the range so that it stays within the total number of pages at 50 questions each.

The code below catches the 404 error that caused the failure and ignores it, in case the loop still goes out of range.

from urllib.request import urlopen
from urllib.error import HTTPError

def find_bad_qn(a):
    url = "https://stackoverflow.com/questions?page=" + str(a) + "&sort=active"
    try:
        urlopen(url)
    except HTTPError:
        # Ignore HTTP errors such as 404 for pages that are out of range
        pass

print("Please Wait.. it will take some time")
for i in range(298314, 298346):
    find_bad_qn(i)
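
If you would rather know which pages were missing than silently skip every failure, a narrower variant is sketched below. It is only an illustration: the helper name page_exists and its opener parameter are made up for this example, and it treats only a 404 as "page does not exist" while letting any other HTTP error propagate.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def page_exists(url, opener=urlopen):
    """Return True if the URL loads with status 200, False if it answers 404.

    `opener` is a hypothetical hook so the function can be tried without
    network access; by default it is urllib's own urlopen.
    """
    try:
        with opener(url) as response:
            return response.status == 200
    except HTTPError as err:
        if err.code == 404:
            # The page is out of range or otherwise missing
            return False
        # Anything else (403, 500, ...) is a different problem; do not hide it
        raise
```

Calling page_exists(url) inside the loop then lets you collect the page numbers that returned False instead of discarding them.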

I have exactly the same problem. The URL that I want to fetch with urllib exists and is accessible in a normal browser, but urllib reports a 404.

The solution for me was not to use urllib:

import requests
requests.get(url)

This works for me.

The default User-Agent doesn't seem to have as much access as Mozilla.

Try importing Request and passing headers={"User-Agent": "Mozilla/5.0"} to it along with your URL.

i.e.:

from urllib.request import Request, urlopen
url = f"https://stackoverflow.com/questions?page={a}&sort=active"
req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
html = urlopen(req)
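
The same idea can be wrapped in a small helper so that every page request carries the header. This is a rough sketch, not a definitive fix: build_request is a name invented for this example, and whether the site accepts this User-Agent is an assumption taken from the answer above.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://stackoverflow.com/questions"


def build_request(page):
    """Build a Request for one listing page with a browser-like User-Agent.

    Hypothetical helper: it only constructs the request and sends nothing.
    """
    query = urlencode({"page": page, "sort": "active"})
    return Request(f"{BASE_URL}?{query}", headers={"User-Agent": "Mozilla/5.0"})


def fetch_page(page):
    """Fetch one listing page and return the raw response body as bytes."""
    with urlopen(build_request(page)) as response:
        return response.read()
```

Note that urllib stores header names in capitalized form, so the header is read back with req.get_header("User-agent").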
