Written 24 July 2023 ~ 12 min read

Recursion and Pagination

Written by

Benjamin Clark

Introduction

An XKCD comic on recursion, using illustrative characters in a tabletop roleplaying game — “Tabletop roleplaying“ (2012). *XKCD*. Available at: https://xkcd.com/244/ (Accessed: 24th July 2023)

This article aims to show how you can use a general-purpose, recursive, algorithm to paginate pretty much any RESTAPI out there. It gives a brief explanation of what both recursion and pagination are, before moving on to the ‘meat and potatoes’ of how you can combine the two to elegantly traverse a RESTAPI.

So, if recursion and pagination are old friends just skip right to the end. If not, then read on.

Recursion

What is recursion, well it’s recursion. What it recursion, well it’s recursion? What is recursion? Etc ad infinitum. Well not quite.

The premise of recursion is simple. Break the problem down into smaller and smaller chunks until you can solve the simplest version of itself, then build back up to calculate the more complex solution knowing what the simpler solutions are. For example, trying to calculate a Fibonacci number without recursion might be done similar to the below:

def fibonacci(number) -> int:
    """
    Calculate fibonacci sequence number for given number
    :param number: What fibonacci number to fibonacci
    :return: The numberth fibonacci sequence number
    """
    if number < 2:
        return number
    else:
        first = 0
        second = 1
        for i in range (3, number):
            third = first + second
            first = second
            second = third
        return third

As you may immediately see, it’s a bit ugly to read and it’s not immediate obvious what’s happening.

Now, to break this down recursively we need to answer two questions:

What can we calculate with absolute finality without knowledge of the rest of the problem. i.e. at what point do we stop recursion? This is known as our “base case”.
Knowing the result for our base case, how do we then move back up the stack?

For Fibonacci, we can answer these questions thus:

Our base case is when we want to calculate a sequence number < 2, as in this instance we can simply return the sequence number itself
Knowing our base case, we can thus just add the result of sequence number -1 and sequence number -2

This results is a more elegant algorithm as per below:

def fibonacci(number) -> int:
    """
    Calculate fibonacci sequence number for given number
    :param number: What fibonacci number to fibonacci
    :return: The numberth fibonacci sequence number
    """
    if number < 2:
        return number
    return fibonacci(number - 1) + fibonacci(number - 2)

This works as, rather than iteratively travelling up the stack to gradually compute the answer we instead traverse down the stack to our base case, then back up whilst gradually computing the answer. Perhaps that’s easier to understand in a visual context:

You can see here what happens:

We work our way down the recursion stack to our base case
Our base case provides a concrete value
We work our way back up the recursion stack, calculating intermediate values
Eventually, we work our way back to the top of the stack at which point whatever value we’ve calculated is our expected result

It’s a space-time tradeoff. Memory/space (in the form of large recursion stacks) being traded for elegant algorithms. For languages with tail recursion elimination the stack usage is minimal; instead we just set pointers in order to jump from subroutine to subroutine instead of popping function/method calls from a stack.

Thus, having answered (albeit at a high level) what recursion is let us move on to pagination.

A Brief note on tail recursion elimination in Python

I don’t want TRE (tail recursion elimination) in the language

Guido Van Rossum

Python natively supports a recursion depth of 1000 and - as per the above - this is a conscious decision by the creator of the language.

So the Fibonnaci example above as just that; an example to explain recursion in an extremely basic way to those who don’t know what recursion is. Don’t try using it to calculate the 100000th sequence number, and if you do it’s not going to work and it just may hurt your laptop.

Pagination

What is pagination?

Pagination divides information into discrete pages, with usually some kind of next/previous symbol to allow navigation. Pagination is particularly prevalent when interacting with RESTAPIs, let us take a look at why.

Firstly, a concrete example is needed. A book. Let us think of a book. You wouldn’t want Terry Prachett’s masterpiece of “The Colour of Magic” to be printed on a single scroll. Imagine trying to read the equivalent of 241 B4 pages printed on a giant scroll! You wouldn’t find a bookshelf that could hold it.

So instead the novel is paginated; the content is split into discrete pages (B4), a font is picked and each page is allocated a maximum number of characters in that font. Then, the user is provided with ways to navigate between the discrete pages as each page has a page number at the bottom. Thus is the basis of pagination:

Content is split into discrete pages
A finite upper size is allocated to each page, allowing each page - in and of itself - to be digestible for interested users
Pages are given symbols to indicate their uniqueness
Users are provided a means to navigate between pages

It follows then that reading is a simple algorithm:

Open the book.

Then, read a page (maybe with a cup of tea to hand).

If there are pages left to read, keep reading.

Once no pages are left place the book on your shiny new IKEA bookshelf.

RESTAPIs and pagination

Given our definition of pagination it’s easy to see how this can be applied to RESTAPIs. If we’re providing basic CRUD functionality via our API then it follows that the database behind it may have many more records available than we can reasonably expect a user to deal with in one go.

Hence, pagination has become the norm for RESTAPIs with sufficiently complex/large datasets.

Lets see an example. I’ve picked OpenBreweryDB to demonstrate as:

It doesn’t need an API key
It was the first interesting, free, RESTAPI with pagination I could find in a short space of time

Earlier we stated that a finite upper limit is given to each page if content is paginated. By convention most RESTAPIs use ?per_page as a query parameter but it may vary… check the appropriate RESTAPI documentation.

So, if I want to get a page of breweries - with 5 breweries per page - I can do so:

curl --location --request GET 'https://api.openbrewerydb.org/v1/breweries?per_page=5'

But this only returns 5, how about the other pages? Some APIs - like Bitbucket - include a next symbol in their return headers. Some do not. Again, read your documentation. The breweries API does not so instead we can iterate over pages until there’s no more content left.

So, for example, the below will query the RESTAPI for pages of closed breweries - with 20 breweries a page - until no more breweries are returned:

# Standard modules
from typing import List
# Third-party modules
from requests import Session
base_url = "https://api.openbrewerydb.org/v1/breweries?by_type=closed&per_page=20"
query_session = Session()
def query_api(url: str, session: Session) -> List[dict]:
    """
    Basic function to query a given RESTAPI based on assumptions that:
    - API returns a list of dictionaries
    - We must explicitly paginate via page numbers
    :param url: Base API url
    :param session: Session with appropriate headers etc to use for the query
    :return: List of dictionaries containing all paginated content
    """
    return_list = []
    page = 1
    no_more_pages = False
    while not no_more_pages:
        query_url = f"{url}&page={page}"
        base_request = session.get(query_url)
        if not base_request.ok or len(base_request.json()) == 0:
            no_more_pages = True
        else:
            for result in base_request.json():
                return_list.append(result)
        page += 1
    return return_list
print(len(query_api(url=base_url, session=query_session)))

Which shows that there are 163 closed breweries which the API knows about. Very sad, but - given we’re only returning 20 a page - shows that our pagination works. We can split this down into our pagination criteria of earlier as well:

Breweries are split into pages.
Each page has a finite limit of 20 breweries.
Each page has a number to symbolise uniqueness.
Our means of navigation, in this instance, is to keep turning the page until no content remains.

A recursive refactor

Bringing together pagination and recursion, which is the point of this article, we again need to refactor the algorithm itself to be recursive by defining our base case and how to gradually calculate the full answer. This can be done as follows:

Our base case is that we’ve queried the API and not recieved an OK response, or the response we’ve got is empty.
To find the total, add the content of the current page to the next page

This results in a simple algorithm and call as follows:

def query_api(url: str, page: int, session: Session) -> List[dict]:
    """
    Basic recursive function to query a given RESTAPI based on assumptions that:
    - API returns a list of dictionaries
    - We must explicitly paginate via page numbers
    :param url: Base API url
    :param page: Page we are querying
    :param session: Session with appropriate headers etc to use for the query
    :return: List of dictionaries containing all paginated content
    """
    base_request = session.get(url=f"{url}&page={page}")
    if not base_request.ok or len(base_request.json()) == 0:
        return []
    return base_request.json() + query_api(url=url, page=page + 1, session=session)
print(len(query_api(url=base_url, page=1, session=query_session)))

Which work for any RESTAPI provided:

We just require the base list of dictionaries usually returned by response.json()
Pagination is explicitly via the page parameter and not otherwise available

Bringing it together

The brewery RESTAPI used in our example is basic, and not representative of a majority of enterprise RESTAPIs you’ll come across. Most RESTAPIs that this author has come across follow Google’s best practices document, which specify that pagination should be via a links header with next and previous keys which fulfils our pagination definitions of:

Each page has a number to symbolise uniqueness.
Our means of navigation, in this instance, is to keep turning the page until no content remains.

Furthermore, most RESTAPIs you use in your day-to-day will probably have horrible nested dictionaries, or lists, or lists of dictionaries (I’m looking at you AWS) for which the above algorithm is not suitable.

Thankfully, we just need a few tweaks to change our algorithm to work in this manner:

# Standard modules
from typing import List
from typing import Union
# Third party modules
import requests
def paginate_response(response: requests.models.Response, next_page: str, base_field: Union[str, None],
                      client: requests.Session) -> List[dict]:
    """
    Recursive method to traverse a paginated response from a RESTAPI and returns a list of all responses we're
    interested in.
    :param client: Session object originally used to query RESTAPI, needed for persistent headers etc.
    :param response: A Requests response from a RESTAPI.
    :param next_page: The key for the next page in the response.
    :param base_field: The key which corresponds to the list containing the pertinent values.
    :return: A list of dictionaries.
    """
    response_dict = response.json()
    current_page = response_dict if base_field is None else response_dict[base_field]
    if next_page not in response_dict:
        return current_page
    else:
        return current_page + paginate_response(
            response=client.get(url=response_dict[next_page]), next_page=next_page, base_field=base_field,
            client=client)

Which will work for many things, for which I’ll give examples below. Of course it goes without saying that such instructional material should have logging and error handling added before you use it in anger yourselves.

Example usages

Bitbucket RESTAPI

Using our recursive algorithm, querying a list of all repositories against a Bitbucket organisation becomes a breeze:

# Bitbucket workspace
workspace = "<ORG>"
base_url = "https://api.bitbucket.org/2.0"
client = requests.Session()
# Bitbucket RESTAPI Token
headers = {
    "Authorization": "Bearer <TOKEN>",
    "Accept": "application/json"
}
client.headers.update(headers)
raw_repositories = paginate_response(
    response=client.get(url=f"{base_url}/repositories/{workspace}"),
    next_page="next",
    base_field="value",
    client=client
)

GitHub RESTAPI

Using our recursive algorithm, querying all repositories for the current user in GitHub is also a breeze:

base_url = "https://api.github.com/user/repos"
client = requests.Session()
# GitHub RESTAPI Token
headers = {
    "Authorization": "Bearer <TOKEN>>",
    "Accept": "application/vnd.github+json",
    "X-GitHub-Api-Version": "2022-11-28"
}
client.headers.update(headers)
raw_repositories = paginate_response(
    response=client.get(url=base_url),
    next_page="next",
    base_field=None,
    client=client
)

Dungeons and Dragons 5E

As the base case is that next page is not in the header, the function is also usable for APIs which don’t use pagination at all. For example, we can get all spells in 5th edition Dungeons and Dragons easily as well:

base_url = "https://www.dnd5eapi.co/api/spells"
client = requests.Session()
headers = {
    "Accept": "application/json"
}
client.headers.update(headers)
raw_spells = paginate_response(
    response=client.get(url=base_url),
    next_page="next",
    base_field=None,
    client=client
)

Conclusion

I hope you’ve found this an enjoyable little read, and learnt something new. Remember though, you should aways:

Lint your code
Make unit tests
Add error handling
Absolute, under no circumstances, download code verbatim from the internet and just throw it in production
- Doubly so if you don’t understand what it does

With that said, I’ll hopefully see you soon!

Part 1 of 3 in Software Odds and Ends