Python Basics: Batteries included! 🔋🔋🔋

Tim Fischer

2022-04-13

Slide Centering and Scaling Guides

┏╾╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╼┳╾╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╼┓
┆                                       ┆                                       ┆
┆                                       ┆                                       ┆
┆                                       ┆                                       ┆
┆                                       ┆                                       ┆
┆                                       ┆                                       ┆
┆                                       ┆                                       ┆
┆                                       ┆                                       ┆
┆                                       ┆                                       ┆
┆                                       ┆                                       ┆
┆                                     ╭╴┆╶╮                                     ┆
┣╾╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╼╋╾╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╼┫
┆                                     ╰╴┆╶╯                                     ┆
┆                                       ┆                                       ┆
┆                                       ┆                                       ┆
┆                                       ┆                                       ┆
┆                                       ┆                                       ┆
┆                                       ┆                                       ┆
┆                                       ┆                                       ┆
┆                                       ┆                                       ┆
┆                                       ┆                                       ┆
┆                                       ┆                                       ┆
┗╾╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╼┻╾╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╼┛

This slide is for centering and scaling your terminal appropriately when using lookatme. The entire presentation is built to fit this exact scaling. Please ensure that the entire box above is visible without line wrapping and that this block quote isn’t visible.

Batteries included?

The Python source distribution has long maintained the philosophy of “batteries included” – having a rich and versatile standard library which is immediately available, without making the user download separate packages. This gives the Python language a head start in many projects.

– PEP 206 by A.M. Kuchling

Where?!

HERE: https://docs.python.org/3/library/index.html

Today’s plan

What happens when?

When?           What?
13:15 - 14:30   Intro & Acquisition
14:30 - 14:45   Break
14:45 - 16:00   Exploration (& CLI?)

Today’s plan

Structure of a “What”

  1. Short Intro
  2. Do some exploratory exercises:
    1. Exercise description (≈1-2 min)
    2. Reading some documentation (≈5 min)
    3. Thinking/doing the exercise (≈5 min)
    4. Discuss solutions (≈5 min)

Project of the day: What is the most unique Star Wars movie?

Let’s use SWAPI, the Star Wars API!

https://swapi.dev

Exercise 1: Downloading things…

Goals

Batteries:

Exercise 1: Downloading things…

Template

def download_json(url):
    """Download a given url and parse its content as json.

    :param url: the url of the resource
    :type url: str

    :return: downloaded and parsed json
    :rtype: Any
    """

Exercise 1: Downloading things…

Solution

import json
from urllib.request import urlopen

def download_json(url):
    with urlopen(url) as resp:  # closes the connection when done
        body = resp.read()
    return json.loads(body)
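
To try `download_json` without touching the network, a `data:` URL can stand in for a real endpoint, since `urlopen` handles that scheme out of the box. The following is only a sketch: the payload is made up for illustration and is not actual SWAPI output.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def download_json(url):
    """Download a given url and parse its content as json."""
    with urlopen(url) as resp:
        return json.loads(resp.read())


# Hypothetical payload, embedded in a data: URL so no network is needed.
payload = {"name": "Sand Crawler", "length": "36.8"}
url = "data:application/json," + quote(json.dumps(payload))

vehicle = download_json(url)
print(vehicle["name"])  # → Sand Crawler
```

Swapping the `data:` URL for `https://swapi.dev/api/vehicles/` should behave the same way, as long as the service is reachable.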

Exercise 2: Downloading complicated things…

Goals

Batteries

Exercise 2: Downloading complicated things…

Template

def download_chain(start):
    """Download all links in a given chain.

    :param start: first link in the chain
    :type start: str

    :return: list of all results in a chain
    :rtype: list[Any]
    """

Exercise 2: Downloading complicated things…

Solution

def download_chain(start):
    res = []
    url = start
    while url is not None:
        data = download_json(url)
        res.extend(data["results"])
        url = data["next"]
    return res

Exercise 3: Downloading complicated things… (but more generic)

Goals

Batteries

Exercise 3: Downloading complicated things… (but more generic)

Template

def download_chain(chain_link, get_result, start):
    """Download all links in a given chain.

    :param chain_link: function to determine the next link in
                       a chain
    :type chain_link: Callable[[Any], str]

    :param get_result: function to extract a list of relevant
                       data from each link
    :type get_result: Callable[[Any], list[T]]

    :param start: first link in the chain
    :type start: str

    :return: list of all results in a chain
    :rtype: list[T]
    """

Exercise 3: Downloading complicated things… (but more generic)

Solution

def download_chain(chain_link, get_result, start):
    res = []
    url = start
    while url is not None:
        data = download_json(url)
        res.extend(get_result(data))  # ← NEW! ✨
        url = chain_link(data)        # ← NEW! ✨
    return res
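
One way to see the generic chain at work without any downloads is to feed it pages from a dictionary. This is a sketch under assumptions: the `PAGES` data and the extra `fetch` parameter are hypothetical additions for offline experimenting and are not part of the exercise template.

```python
from operator import itemgetter

# Hypothetical in-memory "API": each URL maps to one page of results.
PAGES = {
    "page/1": {"results": ["a", "b"], "next": "page/2"},
    "page/2": {"results": ["c"], "next": None},
}


def download_chain(chain_link, get_result, start, fetch=PAGES.get):
    """Follow a chain of pages; `fetch` is an added, hypothetical parameter."""
    res = []
    url = start
    while url is not None:
        data = fetch(url)
        res.extend(get_result(data))  # collect this page's results
        url = chain_link(data)        # None ends the chain
    return res


items = download_chain(itemgetter("next"), itemgetter("results"), "page/1")
print(items)  # → ['a', 'b', 'c']
```

`itemgetter("next")` is just a ready-made function equivalent to `lambda data: data["next"]`.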

Exercise 4: URLs are ugly data…

Goals

Batteries

Exercise 4: URLs are ugly data…

Template

def url_to_id(string):
    """Extract the ID out of a given string.

    :param string: The string to extract the ID from.
    :type string: str

    :return: Either the extracted ID or the original string
    :rtype: str
    """

Exercise 4: URLs are ugly data…

Solution

import re

def url_to_id(string):
    if (m := re.match(
        r"^https://swapi\.dev/api/[^/]+/(\d+)/$",
        string
    )):
        return m.group(1)  # the captured numeric ID
    else:
        return string
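
A few sample calls make the behaviour easier to check. This sketch uses `re.fullmatch` with a precompiled pattern (the name `SWAPI_URL` is made up here) and assumes resource URLs look like `https://swapi.dev/api/<resource>/<id>/`.

```python
import re

# Assumed URL shape: https://swapi.dev/api/<resource>/<numeric id>/
SWAPI_URL = re.compile(r"https://swapi\.dev/api/[^/]+/(\d+)/")


def url_to_id(string):
    """Return the numeric ID of a SWAPI resource URL, else the input."""
    if (m := SWAPI_URL.fullmatch(string)):
        return m.group(1)
    return string


print(url_to_id("https://swapi.dev/api/people/1/"))  # → 1
print(url_to_id("https://swapi.dev/api/films/"))     # → https://swapi.dev/api/films/
print(url_to_id("Luke Skywalker"))                   # → Luke Skywalker
```

Note that the ID comes back as a string, not an int.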

Exercise 5: Cleaning up our data

Goals

Batteries

Exercise 5: Cleaning up our data

Template

def clean_data(data):
    """Clean up and transform a given data set.

    :param data: Data set to clean and transform
    :type data: list[Any]

    :return: Cleaned and transformed data set.
    :rtype: dict[str, Any]
    """

Exercise 5: Cleaning up our data

Solution

def clean_data(data):
    new = {}
    for obj in data:
        for key, value in list(obj.items()):
            if isinstance(value, list):  # lists hold URLs of related objects
                obj[key] = [
                    url_to_id(elem)
                    for elem in value
                ]
            elif key == "url":
                del obj["url"]
                obj["id"] = url_to_id(value)
        new[obj["id"]] = obj
    return new
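
To see what the clean-up does to a single record, here is a small self-contained run. The record is invented in the rough shape of a SWAPI entry, and `url_to_id` is re-declared in compact form so the block runs on its own.

```python
import re


def url_to_id(string):
    m = re.fullmatch(r"https://swapi\.dev/api/[^/]+/(\d+)/", string)
    return m.group(1) if m else string


def clean_data(data):
    new = {}
    for obj in data:
        # list(...) takes a snapshot, so the dict can be changed while looping
        for key, value in list(obj.items()):
            if isinstance(value, list):
                obj[key] = [url_to_id(elem) for elem in value]
            elif key == "url":
                del obj["url"]
                obj["id"] = url_to_id(value)
        new[obj["id"]] = obj
    return new


# Made-up record, not real API output.
raw = [
    {
        "name": "Sand Crawler",
        "films": [
            "https://swapi.dev/api/films/1/",
            "https://swapi.dev/api/films/5/",
        ],
        "url": "https://swapi.dev/api/vehicles/4/",
    }
]
cleaned = clean_data(raw)
print(cleaned)
# → {'4': {'name': 'Sand Crawler', 'films': ['1', '5'], 'id': '4'}}
```

The function modifies the input records in place, so `raw[0]` and `cleaned["4"]` are the same object afterwards.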

Exercise 6: Doing things in parallel

Goals

Batteries

⚠ This is – a lot – more difficult!

Exercise 6: Doing things in parallel

Template

def do_parallel(task, args):
    """Do a single task for different inputs in parallel.

    :param task: function representing the task to do
    :type task: Callable[[Args], T]

    :param args: list of all argument sets to run
    :type args: list[Args]

    :return: dictionary of task results
    :rtype: dict[Args, T]
    """

Exercise 6: Doing things in parallel

Solution

from concurrent.futures import ThreadPoolExecutor, as_completed

def do_parallel(task, args):
    res = {}
    with ThreadPoolExecutor() as pool:
        futs = {
            pool.submit(task, arg): arg
            for arg in args
        }
        for fut in as_completed(futs):
            res[futs[fut]] = fut.result()
    return res
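
A quick way to try `do_parallel` is a task that only sleeps. In this sketch `slow_square` is a hypothetical stand-in for a blocking download.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def do_parallel(task, args):
    res = {}
    with ThreadPoolExecutor() as pool:
        # Map each future back to the argument it was started with.
        futs = {pool.submit(task, arg): arg for arg in args}
        for fut in as_completed(futs):
            res[futs[fut]] = fut.result()
    return res


def slow_square(n):
    """Stand-in for a blocking download (hypothetical task)."""
    time.sleep(0.1)
    return n * n


results = do_parallel(slow_square, [1, 2, 3])
print(sorted(results.items()))  # → [(1, 1), (2, 4), (3, 9)]
```

Because the arguments become dictionary keys, they have to be hashable (strings and numbers are fine, lists are not).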

╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴✄╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴✄╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴✄╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴

Interlude: the dreaded GIL…

What?

GIL: The Global Interpreter Lock

What??

Only one thread per process can execute Python code at a time (in standard CPython builds)!

What???

Threads in Python are good at doing literally nothing, i.e. waiting on blocking I/O.
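
The "good at doing nothing" part can be measured. In this rough sketch `time.sleep` plays the role of blocking I/O; the exact timing will vary from machine to machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def wait(seconds):
    """Simulate blocking I/O; a sleeping thread does not hold the GIL."""
    time.sleep(seconds)
    return seconds


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    done = list(pool.map(wait, [0.2] * 4))
elapsed = time.perf_counter() - start

# The four 0.2 s waits overlap, so the total stays far below 0.8 s.
print(f"{elapsed:.2f}s for {len(done)} tasks")
```

Replacing the sleep with a CPU-heavy loop would typically remove most of that speed-up, because the threads would then compete for the interpreter.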

╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴✄╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴✄╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴✄╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴

Exercise 7: Loading the entire data set

Goals

Batteries

Exercise 7: Loading the entire data set

Template

def download_starwars(root):
    """Download and clean the entire starwars data set.

    :param root: The API's root URL
    :type root: str

    :return: The starwars data set.
    :rtype: Any
    """

Exercise 7: Loading the entire data set

Solution

from functools import partial
from operator import itemgetter

def download_starwars(root="https://swapi.dev/api"):
    endpoints = download_json(root)
    data = do_parallel(
        partial(
            download_chain,
            itemgetter("next"),
            itemgetter("results")
        ),
        endpoints.values()
    )
    data = {
        name: clean_data(data[endpoint])
        for name, endpoint in endpoints.items()
    }
    return data

╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴✄╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴✄╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴✄╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴

Interlude: Mutable Default Arguments…

What?

Mutable default arguments do not “reset” after a call.

def test(arg=[]):
    arg.append(1)
    print(arg)

test()  # → [1]
test()  # → [1, 1]

What??

Most common fix:

def test(arg=None):  # ← NEW! ✨
    arg = arg or []  # ← NEW! ✨
    arg.append(1)
    print(arg)
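
One caveat with `arg or []`: an empty list passed in by the caller is falsy too, so it gets swapped for a fresh one. A slightly stricter variant checks for `None` explicitly, sketched here:

```python
def test(arg=None):
    # `is None` keeps an empty list passed by the caller;
    # `arg or []` would silently replace it with a new list.
    if arg is None:
        arg = []
    arg.append(1)
    return arg


mine = []
test(mine)
print(mine)    # → [1]
print(test())  # → [1]
print(test())  # → [1]
```

Which variant fits better depends on whether callers expect their own list to be modified.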

╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴✄╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴✄╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴✄╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴

Exercise 8: Loading the entire data set (but less annoying for swapi.dev)

Goals

Batteries

Exercise 8: Loading the entire data set (but less annoying for swapi.dev)

Template

def load_starwars(path, root):
    """Load the starwars data set from a file, downloading it if needed.

    :param path: File path to JSON file.
    :type path: str

    :param root: The API's root URL
    :type root: str

    :return: The starwars data set.
    :rtype: Any
    """

Exercise 8: Loading the entire data set (but less annoying for swapi.dev)

Solution

import json
from pathlib import Path

def load_starwars(path="starwars.json", root="https://swapi.dev/api"):
    if Path(path).is_file():
        # Cached copy exists: no need to bother swapi.dev again.
        with open(path, "r") as f:
            return json.load(f)
    else:
        data = download_starwars(root)
        with open(path, "w") as f:
            json.dump(data, f)
        return data
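
The caching idea can be tried in isolation with a temporary directory. In this sketch `load_cached` and `fake_download` are hypothetical names; the latter stands in for `download_starwars` so that nothing is fetched.

```python
import json
import tempfile
from pathlib import Path


def load_cached(path, download):
    """Return cached JSON from `path`, else call `download()` and store it."""
    path = Path(path)
    if path.is_file():
        return json.loads(path.read_text())
    data = download()
    path.write_text(json.dumps(data))
    return data


calls = []


def fake_download():
    """Hypothetical stand-in for download_starwars."""
    calls.append(1)
    return {"films": {"1": {"title": "A New Hope"}}}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "starwars.json"
    first = load_cached(cache, fake_download)   # downloads and writes the file
    second = load_cached(cache, fake_download)  # served from the file

print(len(calls))  # → 1
```

Passing the download function as an argument keeps the caching logic independent of where the data comes from.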

⏰ Break time!

                          ▄▄▄▄▄▄▄ ▄▄    ▄ ▄▄▄▄▄ ▄▄▄▄▄▄▄  
                          █ ▄▄▄ █ ▄  ▄▄█▄  ▀█ ▄ █ ▄▄▄ █  
                          █ ███ █ ██▄█ █ █▀▀▀█  █ ███ █  
                          █▄▄▄▄▄█ ▄▀▄ █▀▄ ▄▀█▀█ █▄▄▄▄▄█  
                          ▄▄▄▄  ▄ ▄▀ ▀ ▄▄▀▀███▀▄  ▄▄▄ ▄  
                          ▄▄█▄█▀▄▀▄▀   ▄▀ █ ▄▀█ ███ ▄▄▀  
                           █▄█▀▄▄▀ ▄ █▀██▄█▄▀▄▀▀▀▀▀▄▄ ▀  
                          █▀▄▀██▄ ▀▄█▀▄ █ █▀ ██▄▀█▄ ███  
                          █▀▄██ ▄ ▀ ▄▄▀ ▀▀▀ ▄ █▄▀▀█▄ █   
                          ▄▀▀▄▀ ▄▀██▄▄█ ▀█▄ ▀ ▀▀ █ ▀█▀   
                           ▄▀█▀▀▄▄▄▄▄▄█ █▄▀█▄███▄▄▄▄█    
                          ▄▄▄▄▄▄▄ ▀██▄█▄▄   ▀▄█ ▄ ██▀█▀  
                          █ ▄▄▄ █  ▀▄ ▄▀██▄▄▀ █▄▄▄█▀▄█▄  
                          █ ███ █ █ ▄█▀▄ ▀▀  ▀▀█ ▄▀▀▄ █  
                          █▄▄▄▄▄█ █  ▀  █▄█ ▀██  ▀ █ █   

Exercise 9: Who’s the longest?!

Goals

Batteries

Exercise 9: Who’s the longest?!

Template

def longest_vehicle(data):
    """Get the longest vehicle.

    :param data: Starwars dataset

    :return: The longest vehicle.
    :rtype: Vehicle
    """

⚠ Types are just to emphasize the exercise!

Exercise 9: Who’s the longest?!

Solution

def longest_vehicle(data):
    return max(
        (
            vehicle
            for vehicle in data["vehicles"].values()
            if vehicle["length"] != "unknown"
        ),
        key=lambda p: float(p["length"])
    )
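
The `float` key matters because lengths are stored as strings. A small made-up data set shows the difference (the vehicle names and numbers are invented for this sketch):

```python
def longest_vehicle(data):
    return max(
        (v for v in data["vehicles"].values() if v["length"] != "unknown"),
        key=lambda v: float(v["length"]),
    )


# Made-up miniature data set; lengths are strings, as in the cleaned data.
data = {
    "vehicles": {
        "1": {"name": "Speeder", "length": "9.5"},
        "2": {"name": "Crawler", "length": "36.8"},
        "3": {"name": "Mystery", "length": "unknown"},
    }
}
print(longest_vehicle(data)["name"])  # → Crawler

# Comparing the raw strings would pick the wrong one: "9.5" > "36.8".
print(max(["9.5", "36.8"]))  # → 9.5
```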

╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴✄╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴✄╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴✄╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴

Interlude: Generators vs. Lists and the Choice of Comprehensions

What?

List-Comprehensions create an entirely new list, i.e. they materialize all results immediately.

[ i*i for i in range(1_000_000_000_000) ]

What??

This can and will lead to problems in big data sets!

What???

Consider the use of Generator(-Comprehension)s for big data sets!

( i*i for i in range(1_000_000_000_000) )

They produce values one after another, considerably shrinking the memory footprint.
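
A scaled-down comparison, small enough to actually run, illustrates the point. The sizes reported by `sys.getsizeof` differ between Python versions, so only the comparison is shown.

```python
import sys

n = 100_000
squares_list = [i * i for i in range(n)]  # all values exist at once
squares_gen = (i * i for i in range(n))   # values are produced on demand

print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # → True

# The price: a generator can be consumed only once.
total = sum(squares_gen)
print(sum(squares_gen))  # → 0
```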

╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴✄╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴✄╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴✄╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴

Exercise 10: Who’s in what class?

Goals

Batteries

Exercise 10: Who’s in what class?

Template

def vehicles_by_class(data):
    """Get all vehicles grouped by their classes.

    :param data: Starwars dataset

    :return: Grouped vehicles
    :rtype: dict[VehicleClass, list[Vehicle]]
    """

⚠ Types are just to emphasize the exercise!

Exercise 10: Who’s in what class?

Solution

def vehicles_by_class(data):
    res = defaultdict(list)
    for vehicle in data["vehicles"].values():
        res[vehicle["vehicle_class"]].append(vehicle)
    return res
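
Run on a tiny invented data set, the grouping looks like this (the names and classes below are placeholders, not real SWAPI records):

```python
from collections import defaultdict


def vehicles_by_class(data):
    res = defaultdict(list)  # missing classes start out as an empty list
    for vehicle in data["vehicles"].values():
        res[vehicle["vehicle_class"]].append(vehicle)
    return res


data = {
    "vehicles": {
        "1": {"name": "Crawler", "vehicle_class": "wheeled"},
        "2": {"name": "Speeder", "vehicle_class": "repulsorcraft"},
        "3": {"name": "Tank", "vehicle_class": "wheeled"},
    }
}
grouped = vehicles_by_class(data)
names = {cls: [v["name"] for v in vs] for cls, vs in grouped.items()}
print(names)
# → {'wheeled': ['Crawler', 'Tank'], 'repulsorcraft': ['Speeder']}
```

Keep in mind that merely looking up an unknown class on the returned `defaultdict` creates an empty entry for it; wrapping the result in `dict(...)` avoids that.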

Exercise 11: Who’s the longest, but per class?!

Goals

Batteries

Exercise 11: Who’s the longest, but per class?!

Template

def longest_vehicle_per_class(data):
    """Get longest vehicle per class.

    :param data: Starwars dataset

    :return: Longest vehicle per class.
    :rtype: dict[VehicleClass, Vehicle]
    """

⚠ Types are just to emphasize the exercise!

Exercise 11: Who’s the longest, but per class?!

Solution

def longest_vehicle_per_class(data):
    res = defaultdict(lambda: {"length": "0"})
    for v in data["vehicles"].values():
        vclass = v["vehicle_class"]
        if (
            v["length"] != "unknown"
            and float(v["length"]) > float(res[vclass]["length"])
        ):
            res[vclass] = v
    return res

Exercise 12: How many are there?

Goals

Batteries

Exercise 12: How many are there?

Template

def vehicles_per_class(data):
    """Count vehicles by their class.

    :param data: Starwars dataset

    :return: Vehicles counts per class.
    :rtype: dict[VehicleClass, int]
    """

⚠ Types are just to emphasize the exercise!

Exercise 12: How many are there?

Solution

def vehicles_per_class(data):
    return Counter(
        v["vehicle_class"]
        for v in data["vehicles"].values()
    )
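
`Counter` brings a few extras beyond counting. A short sketch on invented data:

```python
from collections import Counter


def vehicles_per_class(data):
    return Counter(v["vehicle_class"] for v in data["vehicles"].values())


# Placeholder records, not real SWAPI data.
data = {
    "vehicles": {
        "1": {"vehicle_class": "wheeled"},
        "2": {"vehicle_class": "repulsorcraft"},
        "3": {"vehicle_class": "wheeled"},
    }
}
counts = vehicles_per_class(data)
print(counts.most_common(1))  # → [('wheeled', 2)]
print(counts["airspeeder"])   # → 0
```

Unlike a `defaultdict`, looking up a missing key on a `Counter` returns 0 without adding the key.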

Exercise 13: Group vehicles by films

Goals

Batteries

Exercise 13: Group vehicles by films

Template

def vehicles_by_film(data):
    """Get all vehicles by film.

    :param data: Starwars dataset

    :return: Vehicles grouped by films.
    :rtype: dict[str, list[Vehicle]]
    """

⚠ Types are just to emphasize the exercise!

Exercise 13: Group vehicles by films

Solution

def vehicles_by_film(data):
    res = defaultdict(list)
    for film_id, film in data["films"].items():
        episode = (int(film_id) + 2) % 6 + 1
        key = f"Episode {episode}: {film['title']}"
        for vehicle_id in film["vehicles"]:
            res[key].append(data["vehicles"][vehicle_id])
    return res
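
The episode formula is easier to trust once it is spelled out. The sketch below assumes that film IDs 1 to 6 follow release order (original trilogy first), which is what the formula implies.

```python
# Assumption: film IDs 1-6 follow release order (original trilogy first).
episodes = {film_id: (film_id + 2) % 6 + 1 for film_id in range(1, 7)}
print(episodes)  # → {1: 4, 2: 5, 3: 6, 4: 1, 5: 2, 6: 3}
```

With more than six films the modulo would wrap around, so the formula only holds for this data set.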

Exercise 14: Group anything by films

Goals

Batteries

Exercise 14: Group anything by films

Template

def by_film(data, film_field, data_field):
    """Group things by film.

    :param data: Starwars dataset

    :param film_field: Name of "relation" field on a film.
    :type film_field: str

    :param data_field: Name of "data" field in dataset.
    :type data_field: str 

    :return: Things grouped by films.
    :rtype: dict[str, list[Thing]]
    """

⚠ Types are just to emphasize the exercise!

Exercise 14: Group anything by films

Solution

def by_film(data, film_field, data_field=None):
    data_field = data_field or film_field
    res = defaultdict(list)
    for film_id, film in data["films"].items():
        episode = (int(film_id) + 2) % 6 + 1
        key = f"Episode {episode}: {film['title']}"
        for thing_id in film[film_field]:
            res[key].append(data[data_field][thing_id])
    return res
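
The optional `data_field` exists because a film's relation field and the data set's key can differ, for example `characters` on a film versus `people` in the data set. A small sketch with a hand-made data set (a rough imitation of the cleaned data):

```python
from collections import defaultdict


def by_film(data, film_field, data_field=None):
    data_field = data_field or film_field
    res = defaultdict(list)
    for film_id, film in data["films"].items():
        episode = (int(film_id) + 2) % 6 + 1
        key = f"Episode {episode}: {film['title']}"
        for thing_id in film[film_field]:
            res[key].append(data[data_field][thing_id])
    return res


# Hand-made sample in the shape of the cleaned data set.
data = {
    "films": {
        "1": {"title": "A New Hope", "characters": ["1", "2"]},
        "4": {"title": "The Phantom Menace", "characters": ["2"]},
    },
    "people": {
        "1": {"name": "Luke Skywalker"},
        "2": {"name": "C-3PO"},
    },
}
grouped = by_film(data, "characters", "people")
for film, people in grouped.items():
    print(film, [p["name"] for p in people])
# Episode 4: A New Hope ['Luke Skywalker', 'C-3PO']
# Episode 1: The Phantom Menace ['C-3PO']
```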

Exercise 15: How unique is a category in a film?

Goals

Batteries

Exercise 15: How unique is a category in a film?

Template

def uniqueness_per_film(data, film_field, data_field):
    """Get the uniqueness of things per film.

    :param data: Starwars dataset

    :param film_field: Name of "relation" field on a film.
    :type film_field: str

    :param data_field: Name of "data" field in dataset.
    :type data_field: str

    :return: Uniqueness of things by film.
    :rtype: dict[str, Decimal]
    """

⚠ Types are just to emphasize the exercise!

Exercise 15: How unique is a category in a film?

Solution

from decimal import Decimal
from statistics import mean

def uniqueness_per_film(data, film_field, data_field=None):
    data_field = data_field or film_field
    return {
        film: round(mean(
            1 / Decimal(len(thing["films"]))
            for thing in things
        ), 2)
        for film, things in by_film(
            data,
            film_field,
            data_field
        ).items()
    }
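
The score itself is a plain average: a thing that appears in n films contributes 1/n. Worked through for three invented things:

```python
from decimal import Decimal
from statistics import mean

# Hypothetical things: each lists the films it appears in.
things = [
    {"name": "only here", "films": ["1"]},
    {"name": "in two films", "films": ["1", "2"]},
    {"name": "in four films", "films": ["1", "2", "3", "4"]},
]

# Contributions are 1, 0.5 and 0.25; their mean is 1.75 / 3.
score = round(mean(1 / Decimal(len(t["films"])) for t in things), 2)
print(score)  # → 0.58
```

A score of 1 would mean that everything in the film appears nowhere else.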

Exercise 16: 𝗙𝗜𝗡𝗔𝗟𝗟𝗬! Which film is the “uniquest”?

Goals

Batteries

Exercise 16: 𝗙𝗜𝗡𝗔𝗟𝗟𝗬! Which film is the “uniquest”?

Template

def uniquest_film(data):
    """Get the uniquest film of them all!

    :param data: Starwars dataset

    :return: The name of the uniquest starwars film.
    :rtype: str
    """

Exercise 16: 𝗙𝗜𝗡𝗔𝗟𝗟𝗬! Which film is the “uniquest”?

Solution

def uniquest_film(data):
    individual_uniquenesses = [
        uniqueness_per_film(data, "vehicles"),
        uniqueness_per_film(data, "starships"),
        uniqueness_per_film(data, "planets"),
        uniqueness_per_film(data, "species"),
        uniqueness_per_film(data, "characters", "people"),
    ]
    res = {}
    for uniquenesses in individual_uniquenesses:
        for key, value in uniquenesses.items():
            # A film may be missing from a category, so default to 0.
            res[key] = res.get(key, 0) + value
    return max(
        res.keys(),
        key=res.__getitem__
    )
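
The summing step can be checked by hand on a few made-up scores (the film names and numbers below are placeholders, not results from the real data set):

```python
from decimal import Decimal

# Placeholder per-category scores; one film is missing from a category.
per_category = [
    {"Film A": Decimal("0.50"), "Film B": Decimal("0.25")},
    {"Film A": Decimal("0.10"), "Film B": Decimal("0.60")},
    {"Film B": Decimal("0.05")},
]

total = {}
for scores in per_category:
    for film, value in scores.items():
        total[film] = total.get(film, 0) + value

best = max(total, key=total.get)
print(best, total[best])  # → Film B 0.90
```

Summing treats every category as equally important; weighting them differently would be a design decision of its own.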