Python: confusions with urljoin
The best way (for me) to think of this is that the first argument,
base, is like the page you are on in your browser. The second argument,
url, is the href of an anchor on that page. The result is the final URL to which you will be directed should you click.
>>> from urllib.parse import urljoin
>>> urljoin('some', 'thing')
'thing'
This one makes sense given my description. Though one would hope base includes a scheme and domain.
>>> urljoin('http://some', 'thing')
'http://some/thing'
If you are on a vhost some, and there is an anchor like
<a href="thing">Foo</a>, then the link will take you to http://some/thing.
>>> urljoin('http://some/more', 'thing')
'http://some/thing'
We are on some/more here, so a relative link of thing replaces the last path segment (more) and will take us to http://some/thing.
>>> urljoin('http://some/more/', 'thing')  # just a tad / after more
'http://some/more/thing'
Here, we aren't on some/more, we are on some/more/, which is different. Now, our relative link will take us to http://some/more/thing.
>>> urljoin('http://some/more/', '/thing')
'http://some/thing'
And lastly, if you are on some/more/ and the href is /thing, you will be linked to http://some/thing.
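The browser analogy above can be sketched as resolving every href found on one page. This is only an illustration: the page URL and the list of hrefs are made up, and the ../thing and http://other/thing entries are extra cases not discussed above.

```python
from urllib.parse import urljoin

# Hypothetical page URL and hrefs, chosen only for illustration.
page = "http://some/more/"
hrefs = ["thing", "/thing", "../thing", "http://other/thing"]

# Resolve each href the way a browser would when the link is clicked.
resolved = {href: urljoin(page, href) for href in hrefs}

for href, target in resolved.items():
    print(f"{href!r:24} -> {target}")
```

Changing page to "http://some/more" (no trailing slash) is a quick way to see how the results for the relative hrefs shift.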
If url is an absolute URL (that is, starting with //, http://, https://, …), the url’s host name and/or scheme will be present in the
result. For example:
>>> urljoin('https://www.google.com', '//www.microsoft.com')
'https://www.microsoft.com'
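As a small sketch of the two flavours mentioned here, the snippet below contrasts a url starting with // against one that carries its own scheme. The base and the target URLs are illustrative assumptions, not taken from the original answer.

```python
from urllib.parse import urljoin

# Hypothetical base URL, used only for this illustration.
base = "https://www.google.com/search"

# url starts with //: the host comes from url, the scheme from base.
scheme_relative = urljoin(base, "//www.microsoft.com/en-us")

# url has its own scheme: base contributes nothing to the result.
absolute = urljoin(base, "http://www.python.org/doc")

print(scheme_relative)
print(absolute)
```

In both cases the path of base (/search) is dropped, since url supplies its own host.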
Otherwise, urllib.parse.urljoin(base, url) will
construct a full (“absolute”) URL by combining a “base URL” (base) with another URL (url). Informally, this uses components of the base
URL, in particular the addressing scheme, the network location and
(part of) the path, to provide missing components in the relative URL.
>>> from urllib.parse import urljoin, urlparse
>>> urlparse('http://a/b/c/d/e')
ParseResult(scheme='http', netloc='a', path='/b/c/d/e', params='', query='', fragment='')
>>> urljoin('http://a/b/c/d/e', 'f')
'http://a/b/c/d/f'
>>> urlparse('http://a/b/c/d/e/')
ParseResult(scheme='http', netloc='a', path='/b/c/d/e/', params='', query='', fragment='')
>>> urljoin('http://a/b/c/d/e/', 'f')
'http://a/b/c/d/e/f'
In other words, it takes the path of the first parameter (base), strips everything after the last /, and joins the result with the second parameter (url).
If url starts with /, it joins only the scheme and netloc of base with url:
>>> urljoin('http://a/b/c/d/e', '/f')
'http://a/f'
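The two rules just described can be sketched as a toy function. simple_join is a hypothetical helper written for this explanation, not how the standard library implements urljoin: it assumes base is an absolute URL and url is a plain relative or root-relative path, and it ignores .. segments, queries, fragments, and absolute urls.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

def simple_join(base, url):
    """Toy model of urljoin for plain relative or root-relative paths only."""
    parts = urlsplit(base)
    if url.startswith("/"):
        # Root-relative: keep only the scheme and netloc of base.
        path = url
    else:
        # Strip everything after the last "/" of the base path, then append url.
        directory = parts.path.rsplit("/", 1)[0]
        path = directory + "/" + url
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))

# Compare the toy model with the real function on the cases shown above.
cases = [
    ("http://a/b/c/d/e", "f"),
    ("http://a/b/c/d/e/", "f"),
    ("http://a/b/c/d/e", "/f"),
]
for base, url in cases:
    print(simple_join(base, url), urljoin(base, url))
```

For anything beyond these simple cases, urljoin itself is the thing to use; the sketch is only meant to make the "strip after the last /" rule concrete.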