Pilot Lvl 1
Message 1 of 13

'GET ' concatenation

Hey everyone

After hours of searching and trying, I can't get this simple line to work. No matter what I do, I can't seem to concatenate GET with a URL.

I'm trying to solve this exercise:

"Change the socket program socket1.py to prompt the user
for the URL so it can read any web page. You can use split('/') to
break the URL into its component parts so you can extract the host
name for the socket connect call."

 

Here's what I've written :

import socket

user_url = input("Enter url: ")
host_name = user_url.split("/")[2]
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect((host_name, 80))
cmd = 'GET' + user_url + 'HTTP/1.0\r\n\r\n'
cmd = cmd.encode()
mysock.send(cmd)

It gives me a '400 Bad Request' error. Any ideas?

12 Replies
Commander Lvl 3
Message 2 of 13

Re: 'GET ' concatenation

Looks to me like you are missing the necessary space on either side of 'user_url'.

 

I'd add the line 

print(cmd)

to see what you've actually got. I think right now you have (e.g.):

GEThttp://google.com/blahHTTP/1.0

when you need:

GET http://google.com/blah HTTP/1.0
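
A minimal sketch of the spacing issue, using the example URL from this reply. It builds the request line both ways and prints each with repr() so the missing spaces and the '\r\n' terminators are visible. It only illustrates the concatenation; whether a given server accepts the resulting request is a separate question.

```python
url = "http://google.com/blah"  # example URL from this reply

# Without spaces, method, target and version run together into one token.
broken = 'GET' + url + 'HTTP/1.0\r\n\r\n'

# The request line needs a single space between its three parts.
fixed = 'GET ' + url + ' HTTP/1.0\r\n\r\n'

# repr() shows the \r\n characters instead of printing blank lines.
print(repr(broken))
print(repr(fixed))
```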

 

Please follow-up to let us know how you made out. For good karma, mark a reply as the answer if it helped!

Pilot Lvl 1
Message 3 of 13

Re: 'GET ' concatenation

Thanks for your reply

OK so I added a space after GET ( 'GET ') and also before HTTP (' HTTP') 

import socket

user_url = input("Enter url: ")
host_name = user_url.split("/")[2]
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect((host_name, 80))
cmd = 'GET ' + user_url + ' HTTP/1.0\r\n\r\n'
cmd = cmd.encode()
print(cmd)
mysock.send(cmd)

while True:
    data = mysock.recv(512)
    if (len(data) < 1):
        break
    print(data.decode(),end='')

mysock.close()

And with print(cmd), I get back this:

Enter url: https://www.py4e.com/code3/romeo.txt
b'GET https://www.py4e.com/code3/romeo.txt HTTP/1.0\r\n\r\n'
HTTP/1.1 400 Bad Request
Server: httppd
Mime-Version: 1.0
Date: Wed, 16 Jan 2019 11:21:59 GMT
Content-Type: text/html;charset=utf-8
Content-Length: 1285
Connection: close

<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>ERROR: The requested URL could not be retrieved</title>
<style type="text/css"><!--
body {
margin: 0;
padding: 0;
background: #efefef;
font-family: verdana, sans-serif;
font-size: 12px;
color: #1e1e1e;
}
#titles {
padding-left: 10px;
}
h1, h2 {
color: #000000;
}
hr {
height: 1px;
border: none;
background: #808080;
margin: 0;
}
#content {
padding: 10px;
background: #ffffff;
}
pre {
font-family: sans-serif;
}
#footer {
font-size: 9px;
padding-left: 10px;
}

--></style>
</head><body>
<div id="titles">
<h1>ERROR</h1>
<h2>The requested URL could not be retrieved</h2>
</div>
<hr>
<div id="content">
<p>The following error was encountered while trying to retrieve the URL: <a href="https://www.py4e.com/code3/romeo.txt">https://www.py4e.com/code3/romeo.txt</a></p>
<blockquote id="error">
<p><b>Unsupported Request Method and Protocol</b></p>
</blockquote>
<p>Cache does not support all request methods for all access protocols. For example, you can not POST a Gopher request.</p>
<p>Your cache administrator is <a href="mailto:webmaster">webmaster</a>.</p>
<br>
</div>
<hr>
<div id="footer">
<p>Generated Wed, 16 Jan 2019 11:21:59 GMT</p>
</div>
</body></html>

 

 

Pilot Lvl 1
Message 4 of 13

Re: 'GET ' concatenation

I think I figured it out. The reason it didn't work was that every link I tried started with https instead of http. It works fine with all links starting with http.
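
A likely explanation is that a plain socket on port 80 speaks unencrypted HTTP, while an https URL expects a TLS connection, normally on port 443. The sketch below is one way to handle both cases with the standard library. The helper names connection_params and open_connection are made up for this illustration, and it assumes the URL has an explicit http:// or https:// scheme.

```python
import socket
import ssl
from urllib.parse import urlparse

def connection_params(url):
    """Return (host, port, use_tls) for an http or https URL."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError("unsupported scheme: " + repr(parts.scheme))
    use_tls = parts.scheme == "https"
    # Use the port from the URL if present, else the scheme's default.
    port = parts.port or (443 if use_tls else 80)
    return parts.hostname, port, use_tls

def open_connection(url):
    """Open a TCP connection and wrap it in TLS for https URLs."""
    host, port, use_tls = connection_params(url)
    sock = socket.create_connection((host, port))
    if use_tls:
        context = ssl.create_default_context()
        sock = context.wrap_socket(sock, server_hostname=host)
    return sock
```

The returned socket can be used with send() and recv() the same way as mysock in the code above.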

 

Commander Lvl 3
Message 5 of 13

Re: 'GET ' concatenation

Glad you fixed the problem and thanks for sharing your solution with the community!

 

@fire-eggs thanks for the help!


- Mark
Copilot Lvl 2
Message 6 of 13

Re: 'GET ' concatenation

Hi @Maaazi 

 

I am having the same problem you had. I even tried with your code and it failed. My code is:

 

import socket

fhand = input("Enter your url: ")
host = fhand.split("/")[2]
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect((host, 80))
cmd = 'GET ' + fhand + ' HTTP/1.0\r\n\r\n'
cmd = cmd.encode()
print(cmd)
mysock.send(cmd)

while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print(data.decode(),end='')

mysock.close()

 

and the error I am getting is this:

 

Enter url: http://data.pr4e.org/romeo.txt
b'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'
HTTP/1.1 400 Bad Request
Date: Wed, 01 Apr 2020 22:30:25 GMT
Server: Apache/2.4.18 (Ubuntu)
Content-Length: 305
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
</p>
<hr>
<address>Apache/2.4.18 (Ubuntu) Server at data.pr4e.org Port 80</address>
</body></html>

Process finished with exit code 0

 

Pilot Lvl 3
Message 7 of 13

Re: 'GET ' concatenation

For normal GET requests, the part after GET should be only the path on the server, not the full URL (the full URL is correct only if you're sending the request to a proxy). The host name goes into the "Host" header. In your example, the request should look like this:

GET /romeo.txt HTTP/1.0
Host: data.pr4e.org

(With '\r\n' line endings and an extra '\r\n' after the last header.)
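
As a rough sketch of the request described above, here is one way to build it with split('/'), as the exercise suggests. The function name build_request is made up for this illustration. It assumes a URL of the form http://host/path with no explicit port (a host like example.com:8080 would need extra handling).

```python
def build_request(url):
    """Split an http:// URL into host and path and build a GET request."""
    # 'http://data.pr4e.org/romeo.txt' splits into
    # ['http:', '', 'data.pr4e.org', 'romeo.txt']
    pieces = url.split('/')
    host = pieces[2]
    # Everything after the host is the path; fall back to '/' if empty.
    path = '/' + '/'.join(pieces[3:])
    # Each header line ends with \r\n, and one empty line ends the headers.
    request = ('GET ' + path + ' HTTP/1.0\r\n' +
               'Host: ' + host + '\r\n' +
               '\r\n')
    return host, request.encode()

host, cmd = build_request('http://data.pr4e.org/romeo.txt')
print(host)
print(cmd)
```

The returned host goes into mysock.connect((host, 80)) and the returned bytes into mysock.send().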

Copilot Lvl 2
Message 8 of 13

Re: 'GET ' concatenation

Hi @airtower-luna ,

 

Thanks for your reply. I've modified the code to match what you said, but there is something I am missing. Where should I put the \r\n? Could you show me an example using my code? Thanks

Pilot Lvl 3
Message 9 of 13

Re: 'GET ' concatenation

I'd recommend using the urllib.parse module (part of the standard library) to parse the URL. It gives you convenient access to all parts of the URL:

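import socket
import urllib.parse
# fhand is the URL string read with input() in your code
url = urllib.parse.urlparse(fhand)
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# url.port is None when the URL has no explicit port, so fall back to 80
mysock.connect((url.hostname, url.port or 80))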

From there you can assemble the HTTP request:

cmd = 'GET ' + url.path + ' HTTP/1.0\r\nHost: ' + url.hostname + '\r\n\r\n'
cmd = cmd.encode()
mysock.send(cmd)

Each line in the request header must be terminated with '\r\n', plus an extra empty line (so another '\r\n') at the end. And a GET request is basically all headers.

 

Try adding a print(cmd, end='') before the cmd.encode() line to see what your request looks like in a more readable form! (After encoding, cmd is a bytes object and prints as b'...'.)

 

Edit: Added "or 80" to the port specification. url.port may be None if the URL doesn't contain an explicit port.
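
Putting the pieces from this reply together, a complete version might look roughly like the sketch below. The function names build_get and fetch are made up for this illustration. It assumes a plain http:// URL and falls back to '/' when the URL has no path, which the snippets above do not handle.

```python
import socket
from urllib.parse import urlparse

def build_get(url):
    """Return (host, port, request_bytes) for a plain http:// URL."""
    parts = urlparse(url)
    path = parts.path or '/'          # an empty path is not a valid target
    if parts.query:
        path += '?' + parts.query     # keep any query string
    request = f'GET {path} HTTP/1.0\r\nHost: {parts.hostname}\r\n\r\n'
    return parts.hostname, parts.port or 80, request.encode()

def fetch(url):
    """Send the request and return the raw response (headers and body)."""
    host, port, request = build_get(url)
    chunks = []
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request)
        while True:
            data = sock.recv(512)
            if not data:              # empty bytes means the server closed
                break
            chunks.append(data)
    return b''.join(chunks)
```

With a working network connection, print(fetch('http://data.pr4e.org/romeo.txt').decode()) should show the response headers followed by the file contents.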

Copilot Lvl 2
Message 10 of 13

Re: 'GET ' concatenation

Thank you very much for your detailed answer. I don't know what was wrong but now my code is working.