I have a csv that contains a lot of data. When I launch the web scraping, I get this error:
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
To limit the amount of data processed in each web scraping run, I would like to split the following script into several scripts, each one processing a different interval of the csv file:
    import csv
    from Bio import Entrez

    # Get the data from the csv containing the pmid list by author:
    with open("D:/Nancy/Pèse-Savants/Excercice Covid-19/Exercice 3/pmid_par_auteur.csv", 'r', encoding='utf-8') as f:
        # Separate the authors from the pmid lists into 2 columns:
        with open("pmid_par_auteur_uniformise.csv", "w", encoding='utf-8', newline='') as fu:
            csv_f = csv.reader(f, delimiter=';')
            writer = csv.writer(fu)
            for ligne in csv_f:
                writer.writerow(ligne)

    auteur_pmid_doi = []

    # Clean up the data encoded in 'utf-8':
    with open("pmid_par_auteur_uniformise.csv", encoding='utf-8') as fu:
        csv_fu = csv.reader(fu)
        for ligne in csv_fu:
            # Column 0 is the author, column 1 is the pmid list stored as text
            ligne[1] = ligne[1].replace("'", " ")
            ligne[1] = ligne[1].replace("[", " ")
            ligne[1] = ligne[1].replace("]", " ")
            ligne[1] = ligne[1].split(" , ")

            # Get the DOI for each pmid of each author who wrote on Covid-19
            pmid_doi = []
            for pmid in ligne[1]:
                try:
                    handle = Entrez.esummary(db="pubmed", id=pmid)
                    record = Entrez.read(handle)
                    record = record[0]['DOI']
                except IndexError:
                    print('Missing DOI')
                except KeyError:
                    print('Missing DOI')
                else:
                    pmid_doi.append([pmid, record])
                    # Handles are a finite resource, so I close each one to avoid
                    # exhausting the handle supply with a large dataset.
                    handle.close()

            auteur_pmid_doi.append([ligne[0], pmid_doi])
            # Delete temporary variables to free some space in the RAM:
            del ligne
            del pmid_doi

    auteur_pmid_doi
Each script would run through one interval of the data, like this:
- From the first line starting with the letter A to the last line starting with the letter E.
- From the first line starting with the letter F to the last line starting with the letter J.
- From the first line starting with the letter K to the last line starting with the letter O.
- And so on up to Z.
How can I iterate over the lines of a csv by intervals of this kind?
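For what it's worth, here is a minimal sketch of the kind of filter I imagine, not a tested solution. It assumes the first column of each line holds the author name and that the intervals are compared on its first letter, ignoring case. The function name `rows_in_interval`, the `INTERVALS` list (including the last two ranges, P to T and U to Z) and the sample data are all made up for illustration.

```python
import csv
import io

# Hypothetical letter intervals; each script would process one of them.
INTERVALS = [("A", "E"), ("F", "J"), ("K", "O"), ("P", "T"), ("U", "Z")]


def rows_in_interval(csv_file, first, last, delimiter=","):
    """Yield the rows whose first column starts with a letter in [first, last]."""
    for row in csv.reader(csv_file, delimiter=delimiter):
        # Skip empty lines and lines with an empty first column.
        if not row or not row[0].strip():
            continue
        # Compare on the upper-cased first letter so 'evans' matches 'E'.
        initial = row[0].strip()[0].upper()
        if first <= initial <= last:
            yield row


# Small in-memory sample standing in for the real csv (made-up data).
sample_text = "Adams;['1', '2']\nFischer;['3']\nevans;['4']\nZhang;['5']\n"

selected = list(rows_in_interval(io.StringIO(sample_text), "A", "E", delimiter=";"))
print(selected)
```

With the real file, the `io.StringIO` object would be replaced by the file object returned by `open(...)`. Note that this simple comparison does not handle accented initials such as "É", which would fall outside every interval.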
I have added the link to my csv. Thank you in advance for your help.