After hours of searching and trying, I can’t get this simple line to work. No matter what I do, I can’t seem to concatenate GET with a URL.
I’m trying to solve this exercise:
“Change the socket program socket1.py to prompt the user
for the URL so it can read any web page. You can use split('/') to
break the URL into its component parts so you can extract the host
name for the socket connect call.”
I am having the same problem you had. I even tried with your code and it failed. My code is:
import socket

fhand = input("Enter your url: ")
host = fhand.split("/")[2]
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect((host, 80))
cmd = 'GET ' + fhand + ' HTTP/1.0\r\n\r\n'
cmd = cmd.encode()
print(cmd)
mysock.send(cmd)

while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print(data.decode(), end='')

mysock.close()
and the error I am getting is this:
Enter url: http://data.pr4e.org/romeo.txt
b'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'
HTTP/1.1 400 Bad Request
Date: Wed, 01 Apr 2020 22:30:25 GMT
Server: Apache/2.4.18 (Ubuntu)
Content-Length: 305
Connection: close
Content-Type: text/html; charset=iso-8859-1
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
</p>
<hr>
<address>Apache/2.4.18 (Ubuntu) Server at data.pr4e.org Port 80</address>
</body></html>
For normal GET requests, the part after GET should be only the path on the server, not the full URL (the full URL is correct if you’re sending the request to a proxy). The host name goes into the “Host” header. In your example, the request should look like this:
GET /romeo.txt HTTP/1.0
Host: data.pr4e.org
(With ‘\r\n’ line endings and an extra ‘\r\n’ after the last header.)
Thanks for your reply. I’ve modified the code to match what you said, but I am still missing something. Where should I put the \r\n? Could you show me an example using my code? Thanks
Each line in the request header must be terminated with ‘\r\n’, plus an extra empty line (so another ‘\r\n’) at the end. And a GET request is basically all headers.
Try adding a print(cmd, end='') before the encode() call to see what your request looks like in a more readable form!
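As a rough illustration of where the '\r\n' endings go, here is a minimal sketch adapted from the code in the question. It assumes a plain http:// URL and keeps the split('/') approach from the exercise; build_request is a hypothetical helper name, not something from the original code.

```python
def build_request(url):
    """Return (host, request_bytes) for a plain http:// URL (sketch only)."""
    # 'http://host/a/b' splits into ['http:', '', 'host', 'a', 'b']
    parts = url.split('/')
    host = parts[2]
    # Everything after the host is the path; fall back to '/' if there is none.
    path = '/' + '/'.join(parts[3:])
    lines = [
        'GET ' + path + ' HTTP/1.0',  # request line: path only, not the full URL
        'Host: ' + host,              # the host name goes into the Host header
    ]
    # Every line ends with \r\n, and one extra \r\n marks the end of the headers.
    request = '\r\n'.join(lines) + '\r\n\r\n'
    return host, request.encode()


host, cmd = build_request('http://data.pr4e.org/romeo.txt')
print(cmd.decode(), end='')
```

The returned host would go into mysock.connect((host, 80)) and cmd into mysock.send(cmd); the receive loop from the question can stay as it is.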
Edit: Added “or 80” to the port specification. url.port may be None if the URL doesn’t contain an explicit port.
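The code this edit refers to is not shown here, so the following is only a guess at its general shape. It assumes the URL is parsed with urllib.parse.urlsplit from the standard library; connection_target is a hypothetical helper name used for illustration.

```python
from urllib.parse import urlsplit


def connection_target(address):
    """Return (host, port, path) for a URL, defaulting to port 80 (sketch only)."""
    url = urlsplit(address)
    host = url.hostname
    # url.port is None when the URL has no explicit port, so fall back to 80.
    port = url.port or 80
    # An empty path means the root document.
    path = url.path or '/'
    return host, port, path


print(connection_target('http://data.pr4e.org/romeo.txt'))
# ('data.pr4e.org', 80, '/romeo.txt')
print(connection_target('http://localhost:8080/index.html'))
# ('localhost', 8080, '/index.html')
```

The (host, port) pair is what socket.connect() expects, and path is what belongs after GET in the request line.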