Python Help - UnicodeDecodeError


I know little about Python. I have a script that looks for a file, but I keep getting this error after the last of its input prompts. I'm also running it from the command line on Windows.

Traceback (most recent call last):
  File "", line 36, in <module>
    for url in f:
  File "C:\Users\xxxx\AppData\Local\Programs\Python\Python38\lib\encodings\", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 16: character maps to <undefined>

I've tried adding UTF-8 but haven't had any luck, or maybe I'm not putting it in the right place. What am I missing? The file is an Excel (.xlsx) file.

Script is below:

import urllib.request
import pandas as pd

xlxsFile = input('Enter Path of Excel File:')
data = pd.read_excel(xlxsFile, sheet_name="pages_current_urls")
df = pd.DataFrame(data)

df.columns = ['Name','ID','URL']

def downloadAndStoreImage(url, imagePath):
    # Fetch the image and write its raw bytes to imagePath
    with urllib.request.urlopen(url) as conn:
        imageBytes = conn.read()

    with open(imagePath, 'wb') as output:  # binary flag needed for Windows
        output.write(imageBytes)

def getIndex(url):
    countt = 0
    for i in df['URL']:
        if (str(i) in url):
            return countt
        countt += 1
    return -1
count = 0
imagePath = input("Enter Output Folder Path :")
inputFile = input("Enter File Containing URL's :")

f = open(inputFile, "r")
urlsCollection = []

for url in f:
    url = url.strip()  # drop the trailing newline

    if url not in urlsCollection:
        urlsCollection.append(url)
        name = url.split('/')[-1]  # Extracting last value

        index1 = getIndex(url)

        if index1 != -1:
            numberToAdd = df['ID'][index1]
            name = name.split('.')[0] + '-' + str(numberToAdd) + '.jpg'
        path = imagePath + "\\" + name

        downloadAndStoreImage(url, path)
        print('Successfully Downloaded,Image :', count)
    count += 1

Any assistance is appreciated.


I'm a little confused: you say the file is an xlsx file, but the error comes from "inputFile", which is the filename you enter to point at a file containing URLs. Entering an xlsx filename for "inputFile" will fail, because an xlsx file is a binary file (ZIP-compressed XML), not a text file.
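If you're not sure which file you typed at the URL prompt, a quick check can tell you. This is a minimal sketch, assuming you only need to tell a ZIP-based .xlsx apart from a plain-text file. `is_probably_xlsx` is a hypothetical helper name, and `zipfile.is_zipfile` only confirms that the file is a ZIP archive, not that it's a valid workbook.

```python
import os
import tempfile
import zipfile


def is_probably_xlsx(path):
    """Return True if path is a ZIP archive (every .xlsx file is one)."""
    return zipfile.is_zipfile(path)


# Demo: a plain-text URL list versus a tiny ZIP standing in for an .xlsx
with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, "urls.txt")
    fake_xlsx_path = os.path.join(tmp, "workbook.xlsx")

    with open(txt_path, "w", encoding="utf-8") as fh:
        fh.write("https://example.com/wp-content/uploads/a.jpg\n")

    # Not a real workbook, just a ZIP with one XML member
    with zipfile.ZipFile(fake_xlsx_path, "w") as zf:
        zf.writestr("xl/workbook.xml", "<workbook/>")

    txt_is_zip = is_probably_xlsx(txt_path)
    xlsx_is_zip = is_probably_xlsx(fake_xlsx_path)

print(txt_is_zip)   # False
print(xlsx_is_zip)  # True
```

If the path you enter for the URL file comes back True, it's the spreadsheet, and reading it with `open(..., "r")` will fail no matter which text encoding you pass.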

You read your xlsx file on line 5, but you never write its data out to disk, so nothing in the script creates the text file of URLs that "inputFile" expects.
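If the idea is to feed the spreadsheet's URL column into the loop, one option is to write it out as a plain UTF-8 text file first and enter that file at the URL prompt. This is a minimal sketch, assuming the URLs live in `df['URL']` as in your script. `write_url_list` is a hypothetical helper, and the demo uses a hard-coded list so it runs without the workbook.

```python
import os
import tempfile


def write_url_list(urls, path):
    """Write one URL per line to a UTF-8 text file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as out:
        for url in urls:
            out.write(str(url).strip() + "\n")


# Demo with a plain list; in the script you could pass df['URL'] instead
sample_urls = ["https://example.com/a.jpg", "https://example.com/b\u201d.jpg"]

with tempfile.TemporaryDirectory() as tmp:
    list_path = os.path.join(tmp, "urls.txt")
    write_url_list(sample_urls, list_path)
    # Read it back the way the script does, but with an explicit encoding
    with open(list_path, "r", encoding="utf-8") as f:
        read_back = [line.rstrip("\n") for line in f]

print(read_back == sample_urls)  # True
```

Since the script already loads `df`, another option is to loop over `df['URL']` directly and skip the intermediate text file.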

I believe the correct way to specify “utf-8” would be:

 f = open(inputFile, "r", encoding="utf-8")
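Here's why passing the encoding helps. With no `encoding` argument, `open()` in text mode uses the locale's preferred encoding, which on many Windows setups is cp1252 (the "charmap" codec in the traceback), and cp1252 has no character for byte 0x9d. One common source of that byte is a UTF-8 file containing a curly closing quote, since U+201D encodes as E2 80 9D. That's an assumption about what's in your file, not something the traceback proves. The sketch below reproduces the error on in-memory bytes, then reads the same bytes back with an explicit UTF-8 encoding.

```python
import os
import tempfile

# U+201D (a curly closing quote) is the three bytes E2 80 9D in UTF-8
raw = "https://example.com/a\u201d.jpg\n".encode("utf-8")

# cp1252, a common default on Western-locale Windows, has no character for 0x9D
bad_byte = bad_pos = None
try:
    raw.decode("cp1252")
except UnicodeDecodeError as exc:
    bad_byte, bad_pos = exc.object[exc.start], exc.start
    print(exc)  # 'charmap' codec can't decode byte 0x9d in position 23: character maps to <undefined>

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "urls.txt")
    with open(path, "wb") as fh:
        fh.write(raw)

    # Passing encoding= overrides the locale default used by plain open(path, "r")
    with open(path, "r", encoding="utf-8") as f:
        urls = [line.strip() for line in f]

print(urls)  # ['https://example.com/a”.jpg']
```

If the file might not be UTF-8 at all, `errors="replace"` in the same `open()` call is a blunter option: it substitutes undecodable bytes instead of raising.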