First of all I’m going to show you what an aeronautical chart looks like:
I have approximately 60 airports to keep track of, and each airport has between 10 and 15 different charts. Moreover, charts are revised frequently, almost on a monthly basis.
airports = ['LECH', 'LERI', 'LEPP', 'LEMH', 'LELL', 'LEIB',
            'LEPA_LESJ', 'LELC', 'LEGE', 'LEAL', 'LESB', 'LEPO',
            'LERS', 'LXGB', 'LEAM', 'LEZG', 'LEMD', 'LEBB', 'LEXJ', 'LEBA',
            'LEVX', 'LEZL', 'LEBG', 'LESA', 'LETO', 'LELN', 'LEAS', 'LEVD',
            'LEGT', 'LERJ', 'LESO', 'LEVT', 'LECO', 'LEST', 'LEMO', 'LEGA',
            'LEDA', 'LESU', 'LEJR', 'LEMI', 'LETL', 'GCLP', 'GCLA', 'GCXO',
            'GCRR', 'GSVO', 'GSAI', 'GCTS', 'GCFV', 'GCHI', 'GEML', 'LEMG',
            'LEGR', 'LEHC', 'LEBZ', 'LEBL', 'LEAB', 'LECU_LEVS', 'GCGM', 'LEVC',
            'LERT', 'LERL', 'LEAG', 'LEAO', 'GEHM', 'GECE', 'LECV', 'LEEC',
            'LETA', 'LELO', 'GCXM', 'LEBT']
Imagine that you have to fly to three different airports: you have to download all the aeronautical charts one by one. Two months later you fly the same route again, but you don't know whether the charts have changed, so you download them all again.
I have always found this task repetitive and time-consuming, so I decided to write a script to automate the process.
I had two critical requirements for the script:
Downloading the files one by one in a linear process would take forever, so the script had to be multithreaded. It also had to show a progress bar for each download and print a report at the end.
The script has three main parts: the first creates the folder tree that stores all the PDF files, the second builds a list of all the aeronautical chart URLs, and the last downloads them all.
The main folder is named with today's date and the date of the AIRAC cycle, for example: 23_12_2022(01-DEC-22). It contains one child folder for each airport to download.
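The create_path helper that produces this name is not shown in this post. As a rough sketch of how such a name could be assembled, assuming the AIRAC date has already been extracted from the page as a string (build_folder_name and its parameters are hypothetical names, not part of the script):

```python
from datetime import date


def build_folder_name(today, airac_date):
    """Return a folder name such as 23_12_2022(01-DEC-22).

    `today` is a datetime.date; `airac_date` is assumed to be the
    AIRAC cycle date already formatted as text.
    """
    return "{}({})".format(today.strftime("%d_%m_%Y"), airac_date)


print(build_folder_name(date(2022, 12, 23), "01-DEC-22"))
```

In the real script the first argument would be date.today().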
def create_airport_folders(airports, access_rights, soup):
    """
    Creates the folder structure where the flying charts will be saved
    :param airports: list with the OACI airport codes
    :param access_rights: access rights code
    :param soup: bs4 request from aip.enaire.es
    :return: 2 level folder structure
    """
    path = create_path(soup)
    try:
        os.mkdir(path, access_rights)
    except OSError:
        print("\nCreation of the directory %s failed" % path)
    else:
        print("\nSuccessfully created the directory %s" % path)
    for airport in airports:
        try:
            os.mkdir(path + "/" + airport, access_rights)
        except OSError:
            print("\nCreation of the directory %s failed" % (path + "/" + airport))
        else:
            print("\nSuccessfully created the directory %s" % (path + "/" + airport))
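One thing to be aware of is that os.mkdir raises an error when the folder already exists, so running the script twice on the same day prints a failure message for every airport. A possible alternative, shown here only as a sketch and not as the code used in the script, is pathlib with exist_ok=True (make_airport_dirs is a hypothetical name):

```python
import tempfile
from pathlib import Path


def make_airport_dirs(root, airports):
    """Create `root` and one sub-folder per airport; return the folder paths.

    parents=True also creates `root` if it is missing, and exist_ok=True
    makes the call safe to repeat when the folders are already there.
    """
    created = []
    for airport in airports:
        folder = Path(root) / airport
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Try it in a throwaway directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    dirs = make_airport_dirs(Path(tmp) / "charts", ["LEMD", "LEBL"])
    print([d.name for d in dirs])
```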
The second part of the script builds a list of all the PDF URLs to download, using two functions.
parse_pdf scrapes the links to all the AD2 and AD3 aeronautical chart PDFs from the main_url page and returns them as a list.
def parse_pdf(soup):
    """
    Extract AD2 and AD3 PDF names from html request
    :param soup: bs4 request from aip.enaire.es
    :return: List of pdf names
    """
    pdf = []
    for a in soup.find_all('a', href=True):
        pdf.append(a['href'])
    pdf_ad2 = list(filter(lambda i: ("AD2") in i,
                          filter(lambda i: "pdf" in i, pdf)))
    pdf_ad3 = list(filter(lambda i: ("AD3") in i,
                          filter(lambda i: "pdf" in i, pdf)))
    pdf = pdf_ad2 + pdf_ad3
    return pdf
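To see what the two filters do without touching the website, here is a small sketch of the same selection written with list comprehensions and applied to a plain list of strings. The hrefs are made up for illustration, and filter_chart_links is a hypothetical name:

```python
def filter_chart_links(hrefs):
    """Keep only PDF links, AD2 charts first and AD3 charts after them."""
    pdfs = [h for h in hrefs if "pdf" in h]
    ad2 = [h for h in pdfs if "AD2" in h]
    ad3 = [h for h in pdfs if "AD3" in h]
    return ad2 + ad3


# Made-up hrefs, not copied from the real page
sample = [
    "contenido_AIP/AD/AD2/LEMD/chart_1.pdf",
    "index.html",
    "contenido_AIP/AD/AD3/LEXX/chart_2.pdf",
    "contenido_AIP/ENR/enr_1.pdf",
]
print(filter_chart_links(sample))
```

Only the first and third entries survive: the HTML page is not a PDF, and the ENR document is neither AD2 nor AD3.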
create_url builds the full URL used to download each file:
def create_url(url, pdf):
    """
    Joins base urls and pdf name to create the pdf base_url
    :param url: URL
    :param pdf: file_name
    :return: URL + PDF
    """
    url = ''.join([url, pdf])
    return url
# Create list of urls to request
urls = list(map(aip.create_url,
                itertools.repeat(url, len(aip.parse_pdf(soup))),
                aip.parse_pdf(soup)))
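The same list can also be built with the standard library's urljoin and a list comprehension, which avoids calling parse_pdf twice. This is only a sketch of an alternative; the base URL and PDF names below are placeholders, not the real ones:

```python
from urllib.parse import urljoin

base_url = "https://example.com/AIP/"  # placeholder, not the real site
pdf_names = [  # made-up relative links, as parse_pdf might return them
    "AD/AD2/LEMD/chart_1.pdf",
    "AD/AD2/LEBL/chart_2.pdf",
]

# urljoin resolves each relative link against the base URL
urls = [urljoin(base_url, name) for name in pdf_names]
print(urls)
```

Note that urljoin resolves relative paths the way a browser does, so the base URL needs its trailing slash for the last path segment to be kept.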
This is where the magic starts. Once we have the folder structure and the URL list, the script downloads all the PDF files using multiple threads and shows a progress bar for each file.
Basically, it executes this function each time a thread is available:
def download_file(url, path, file_name):
    """
    Download PDF air chart from URL and saves it in the OACI airport folder
    :param url: pdf URL
    :param path: Parent folder path
    :param file_name: pdf name
    :return: pdf file
    """
    try:
        # Request
        html = requests.get(url, stream=True)  # Stream to get data in chunks for tqdm
        if html.status_code != 200:
            print('\nFailure Message {}'.format(html.text))
            # Stop here so an error page is not saved as a PDF
            return
        # OACI code folder
        folder = re.findall(r'.*\/(.*)\/.*', url)[0]
        # Save pdf to oaci airport folder
        with open(path + "/" + folder + "/" + file_name, 'wb+') as f:
            # Progress bar init
            pbar = tqdm(unit="B",
                        unit_scale=True,
                        unit_divisor=1024,
                        colour="red",
                        total=int(html.headers['Content-Length']))
            pbar.clear()
            # Pbar description
            pbar.set_description("Downloading {}".format(file_name))
            for chunk in html.iter_content(chunk_size=1024):
                if chunk:
                    pbar.update(len(chunk))
                    f.write(chunk)
            pbar.close()
    except requests.exceptions.RequestException as e:
        print(e)
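The regular expression in download_file picks the second-to-last segment of the URL, which is the airport folder. The sketch below runs it on a made-up URL and compares it with a standard-library alternative that also returns the file name. split_chart_url is a hypothetical helper, not the aip.file_name function used by the script:

```python
import posixpath
import re
from urllib.parse import urlsplit


def split_chart_url(url):
    """Return (airport_folder, file_name) taken from the URL path."""
    path = urlsplit(url).path
    parent, file_name = posixpath.split(path)
    return posixpath.basename(parent), file_name


url = "https://example.com/AIP/AD/AD2/LEMD/chart_1.pdf"  # made-up URL

# Same pattern as in download_file: capture the text between the last two slashes
folder_from_regex = re.findall(r'.*\/(.*)\/.*', url)[0]

print(folder_from_regex)
print(split_chart_url(url))
```

Both approaches give the same folder for this URL.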
Now it's play time:
threads = []
# Counters
downloads = 0
exceptions = 0
# Time counter init
t1 = time.perf_counter()
with ThreadPoolExecutor(max_workers=20) as executor:
    for url in urls:
        file = aip.file_name(url)
        threads.append(executor.submit(aip.download_file,
                                       url,
                                       aip.create_path(soup),
                                       file))
    # Counter of task completed and exceptions
    for task in as_completed(threads):
        if task.done():
            downloads += 1
        if task.exception():
            exceptions += 1
# Time counter end
t2 = time.perf_counter()
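In the loop above, every future yielded by as_completed is already done, so downloads counts all finished tasks, including failed ones. Request errors are also caught inside download_file, so they never reach task.exception(). If you prefer to count successes and failures separately, one possible pattern looks like this. It is a self-contained sketch in which fake_download is a stand-in for the real download function:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_download(n):
    """Stand-in for a download; task number 3 fails on purpose."""
    if n == 3:
        raise ValueError("simulated failure")
    return n


downloads = 0
exceptions = 0
with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(fake_download, n) for n in range(5)]
    for future in as_completed(futures):
        # exception() returns None when the task finished without raising
        if future.exception() is None:
            downloads += 1
        else:
            exceptions += 1

print(downloads, exceptions)
```

For this to work with the real script, download_file would have to let its exceptions propagate, or re-raise them, instead of only printing them.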
At the end of the script it prints a report:
# Print results
print('\n{} flying charts downloaded in {} seconds with {} exceptions.'.format(downloads,
                                                                               round((t2-t1), 0),
                                                                               exceptions))
Execute main.py and Voilà:
python main.py
1199 flying charts downloaded in 447.0 seconds with 0 exceptions.
Fly safe!
My name is Martin, and I am a Data Scientist.
Repository link on Github.