'GET ' concatenation

Hey everyone

After hours of searching and trying, I can’t get this simple line to work. No matter what I do, I can’t seem to concatenate GET with a URL.

I’m trying to solve this exercise:

“Change the socket program socket1.py to prompt the user
for the URL so it can read any web page. You can use split('/') to
break the URL into its component parts so you can extract the host
name for the socket connect call.”

Here’s what I’ve written :

import socket

user_url = input("Enter url: ")
host_name = user_url.split("/")[2]
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect((host_name, 80))
cmd = 'GET' + user_url + 'HTTP/1.0\r\n\r\n'
cmd = cmd.encode()
mysock.send(cmd)

It gives me a 400 Bad Request error. Any ideas?

Looks to me like you are missing the necessary space on either side of ‘user_url’.

I’d add the line 

print(cmd)

to see what you’ve actually got. I think right now you have (e.g.):

GEThttp://google.com/blahHTTP/1.0

when you need:

GET http://google.com/blah HTTP/1.0
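
A quick way to see the difference, as a small sketch (the URL is only an example value, not one from your program):

```python
user_url = "http://google.com/blah"  # example value only

# No spaces: the method, the URL and the protocol version run together.
broken = 'GET' + user_url + 'HTTP/1.0\r\n\r\n'

# A space after GET and one before HTTP keeps the three parts separate.
fixed = 'GET ' + user_url + ' HTTP/1.0\r\n\r\n'

# repr() makes the \r\n characters visible instead of printing blank lines.
print(repr(broken))
print(repr(fixed))
```

The server splits the request line on spaces, so it needs exactly those three parts.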

Thanks for your reply

OK, so I added a space after GET ('GET ') and also before HTTP (' HTTP'):

import socket

user_url = input("Enter url: ")
host_name = user_url.split("/")[2]
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect((host_name, 80))
cmd = 'GET ' + user_url + ' HTTP/1.0\r\n\r\n'
cmd = cmd.encode()
print(cmd)
mysock.send(cmd)

while True:
    data = mysock.recv(512)
    if (len(data) < 1):
        break
    print(data.decode(),end='')

mysock.close()

And with print(cmd), I get back this:

Enter url: https://www.py4e.com/code3/romeo.txt
b'GET https://www.py4e.com/code3/romeo.txt HTTP/1.0\r\n\r\n'
HTTP/1.1 400 Bad Request
Server: httppd
Mime-Version: 1.0
Date: Wed, 16 Jan 2019 11:21:59 GMT
Content-Type: text/html;charset=utf-8
Content-Length: 1285
Connection: close

<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>ERROR: The requested URL could not be retrieved</title>
<style type="text/css"><!--
body {
margin: 0;
padding: 0;
background: #efefef;
font-family: verdana, sans-serif;
font-size: 12px;
color: #1e1e1e;
}
#titles {
padding-left: 10px;
}
h1, h2 {
color: #000000;
}
hr {
height: 1px;
border: none;
background: #808080;
margin: 0;
}
#content {
padding: 10px;
background: #ffffff;
}
pre {
font-family: sans-serif;
}
#footer {
font-size: 9px;
padding-left: 10px;
}

--></style>
</head><body>
<div id="titles">
<h1>ERROR</h1>
<h2>The requested URL could not be retrieved</h2>
</div>
<hr>
<div id="content">
<p>The following error was encountered while trying to retrieve the URL: <a href="https://www.py4e.com/code3/romeo.txt">https://www.py4e.com/code3/romeo.txt</a></p>
<blockquote id="error">
<p><b>Unsupported Request Method and Protocol</b></p>
</blockquote>
<p>Cache does not support all request methods for all access protocols. For example, you can not POST a Gopher request.</p>
<p>Your cache administrator is <a href="mailto:webmaster">webmaster</a>.</p>

</div>
<hr>
<div id="footer">
<p>Generated Wed, 16 Jan 2019 11:21:59 GMT</p>
</div>
</body></html>

I think I figured it out. The reason it didn’t work was that every link I tried started with https instead of http. It works fine with all links starting with http.
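
That fits how the two schemes work: an https:// URL expects a TLS handshake (normally on port 443), and a plain socket connected to port 80 does not perform one. A minimal sketch of a guard for this, assuming you only want to accept plain http URLs. The function name check_plain_http is made up for illustration:

```python
from urllib.parse import urlparse

def check_plain_http(user_url):
    """Return the host name if the URL uses plain http, else raise ValueError."""
    parts = urlparse(user_url)
    if parts.scheme != "http":
        # https (or anything else) will not work over an unencrypted socket.
        raise ValueError("only http:// URLs are supported, got scheme " + repr(parts.scheme))
    return parts.hostname

print(check_plain_http("http://data.pr4e.org/romeo.txt"))
```

Calling it with an https:// URL raises the ValueError before any connection is attempted.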


Glad you fixed the problem and thanks for sharing your solution with the community!

@fire-eggs thanks for the help!

What is this ‘b’ thing you get with the print(cmd) command?

b'GET https://www.py4e.com/code3/romeo.txt HTTP/1.0\r\n\r\n'

Hi @kfarrukh, see https://stackoverflow.com/a/6269785
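
In short, the b prefix marks a bytes object rather than a str. encode() turns the string into bytes, which is what send() expects. A small sketch of the round trip, using the request string from this thread:

```python
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'
print(type(cmd))            # str: text

raw = cmd.encode()          # UTF-8 is the default encoding
print(type(raw))            # bytes: what the socket sends
print(raw)                  # shown with the b'...' prefix

# decode() converts the bytes back into the original text.
print(raw.decode() == cmd)
```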

Hi @maaazi 

I am having the same problem you had. I even tried with your code and it failed. My code is:

import socket

fhand = input("Enter your url: ")
host = fhand.split("/")[2]
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect((host, 80))
cmd = 'GET ' + fhand + ' HTTP/1.0\r\n\r\n'
cmd = cmd.encode()
print(cmd)
mysock.send(cmd)

while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print(data.decode(),end='')

mysock.close()

and the error I am getting is this:

Enter url: http://data.pr4e.org/romeo.txt
b'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'
HTTP/1.1 400 Bad Request
Date: Wed, 01 Apr 2020 22:30:25 GMT
Server: Apache/2.4.18 (Ubuntu)
Content-Length: 305
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
</p>
<hr>
<address>Apache/2.4.18 (Ubuntu) Server at data.pr4e.org Port 80</address>
</body></html>

Process finished with exit code 0

For normal GET requests the part after GET should be only the path on the server, not the full URL (the full URL is correct if you’re sending a request to a proxy). The host name goes into the “Host” header, in your example the request should look like this:

GET /romeo.txt HTTP/1.0
Host: data.pr4e.org

(With ‘\r\n’ line endings and an extra ‘\r\n’ after the last header.)

Hi @airtower-luna ,

Thanks for your reply. I’ve modified the code to match what you said, but there is something I am missing. Where should I put the \r\n? Could you show me an example using my code? Thanks

I’d recommend using the urllib.parse module (part of the standard library) to parse the URL. It gives you convenient access to all parts of the URL:

import socket
import urllib.parse
url = urllib.parse.urlparse(fhand)
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect((url.hostname, url.port or 80))

From there you can assemble the HTTP request:

cmd = 'GET ' + url.path + ' HTTP/1.0\r\nHost: ' + url.hostname + '\r\n\r\n'
cmd = cmd.encode()
mysock.send(cmd)

Each line in the request header must be terminated with ‘\r\n’, plus an extra empty line (so another ‘\r\n’) at the end. And a GET request is basically all headers.

Try adding a print(cmd, end='') before the encode() call to see what your request looks like in a more readable form!

Edit: Added “or 80” to the port specification. url.port may be None if the URL doesn’t contain an explicit port.
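
Putting the pieces together, here is a rough sketch of how the whole thing could look. The helper names build_request and fetch are made up for this example, and the fallback to '/' for an empty path and the query-string handling are my additions, not something your code needs for romeo.txt:

```python
import socket
import urllib.parse

def build_request(user_url):
    """Return (host, port, request_bytes) for a plain-HTTP GET of user_url."""
    url = urllib.parse.urlparse(user_url)
    path = url.path or '/'            # "http://example.com" has an empty path
    if url.query:
        path += '?' + url.query       # keep any query string
    # Request line, Host header, then an empty line to end the headers.
    request = 'GET ' + path + ' HTTP/1.0\r\nHost: ' + url.hostname + '\r\n\r\n'
    return url.hostname, url.port or 80, request.encode()

def fetch(user_url):
    """Send the request and return the raw response bytes (needs network access)."""
    host, port, request = build_request(user_url)
    with socket.create_connection((host, port)) as mysock:
        mysock.sendall(request)
        chunks = []
        while True:
            data = mysock.recv(512)
            if len(data) < 1:
                break
            chunks.append(data)
    return b''.join(chunks)

# Show the request without touching the network.
host, port, request = build_request('http://data.pr4e.org/romeo.txt')
print(host, port)
print(request.decode(), end='')
```

To actually download the page you would call print(fetch(fhand).decode(), end=''). Keeping the request building separate from the socket code makes it easy to print and check the request first.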


Thank you very much for your detailed answer. I don’t know what was wrong but now my code is working.

try this

cmd = 'GET' + ' ' + url1 + ' ' + 'HTTP/1.0\r\n\r\n'
cmd = cmd.encode()
mysock.send(cmd)