A simple script like the one you can write in the basic Python tutorial, Making requests to the Zendesk API, is fine for getting up to two dozen or so records from your Zendesk product. However, to retrieve several hundred or several thousand records, a script has to perform the following tasks:

- Paginate through all the results
- Guard against the API rate limit
- Sideload related records to reduce the number of requests

This article shows you how to write a Python script that can retrieve large data sets with the Zendesk API. To run the examples, you'll need Python 3 and the Requests library.

After getting a large data set from the API, you might want to move it to a Microsoft Excel worksheet to more easily view and analyze the data. To learn how, see Writing large data sets to Excel with Python and pandas.

For all the possible data you can retrieve from your Zendesk product, see the "JSON Format" tables of the Support and the Help Center API docs. Most APIs have a "List" endpoint for getting multiple records.

Disclaimer: Zendesk provides this article for instructional purposes only. Zendesk does not support or guarantee the code. Zendesk also can't provide support for third-party technologies such as Python. Please post any issue in the comments section or search for a solution online.

Make the basic request

Suppose you want to download the four thousand posts in a community topic in your Help Center. Start with the basic request. Create a file named list_posts.py and paste the following code in it:

import requests
credentials = 'your_zendesk_email', 'your_zendesk_password'
session = requests.Session()
session.auth = credentials
zendesk = 'your_zendesk_url'

topic_id = 123456
url = zendesk + '/api/v2/community/topics/' + str(topic_id) + '/posts.json'
response = session.get(url)
if response.status_code != 200:
    print('Error with status code {}'.format(response.status_code))
    exit()
data = response.json()
topic_posts = data['posts']

for post in topic_posts:
    print(post['title'])

The general logic of the script is explained in Getting data from your Zendesk product in the basic Python tutorial.

Replace the placeholders your_zendesk_email, your_zendesk_password, and your_zendesk_url with your own values. The Zendesk Support URL should look like 'https://obscura.zendesk.com'. Also replace the value of topic_id with the id of a community topic in your Help Center.
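
If your account uses an API token instead of a password (an alternative authentication method enabled in your Zendesk admin settings, and not covered by the snippet above), append /token to the email and use the token as the password:

credentials = 'your_zendesk_email/token', 'your_api_token'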

Save the file. In your command line tool, navigate to the folder with the script and run the following command:

$ python3 list_posts.py

The script should print the titles of the first 30 posts in the community topic you specified.

Paginate through all the results

For bandwidth reasons, the API doesn't return large record sets all at once. It breaks up the results into smaller subsets and returns them in pages. The posts API returns 30 records per page.

To capture all the records, create a while loop that stores each page of results in a list and then requests the response's next_page url until no pages are left:

import requests

credentials = 'your_zendesk_email', 'your_zendesk_password'
session = requests.Session()
session.auth = credentials
zendesk = 'your_zendesk_url'

topic_id = 123456
topic_posts = []
url = zendesk + '/api/v2/community/topics/' + str(topic_id) + '/posts.json'
while url:
    response = session.get(url)
    if response.status_code != 200:
        print('Error with status code {}'.format(response.status_code))
        exit()
    data = response.json()
    topic_posts.extend(data['posts'])
    url = data['next_page']

for post in topic_posts:
    print(post['title'])

For an explanation of the logic, see Adding pagination to your code in the article Paginating through lists.

Guard against the rate limit

If you make a lot of API requests in a short time, such as when paginating through a large data set, you might bump into the Zendesk API rate limit. The API stops processing any more requests until a certain amount of time has passed. For more information, see Rate Limiting in the API docs.

When you reach the rate limit, the API responds with an HTTP 429 Too Many Requests status code. The response includes a Retry-After header that tells you how many seconds to wait before retrying.

Update the script as follows to check for a 429 status code and wait if it's detected:

import time
import requests

credentials = 'your_zendesk_email', 'your_zendesk_password'
session = requests.Session()
session.auth = credentials
zendesk = 'your_zendesk_url'

topic_id = 123456
topic_posts = []
url = zendesk + '/api/v2/community/topics/' + str(topic_id) + '/posts.json'
while url:
    response = session.get(url)
    if response.status_code == 429:
        print('Rate limited! Please wait.')
        time.sleep(int(response.headers['retry-after']))
        continue
    if response.status_code != 200:
        print('Error with status code {}'.format(response.status_code))
        exit()
    data = response.json()
    topic_posts.extend(data['posts'])
    url = data['next_page']

for post in topic_posts:
    print(post['title'])

For more information, see Best practices for avoiding rate limiting.
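
If you'd rather not repeat the 429 check around every request, you could factor it into a small helper function. This is just a sketch of one approach; the name get_with_retry and the max_retries parameter are introduced here for illustration and aren't part of the tutorial's script:

import time

def get_with_retry(session, url, max_retries=5):
    # Hypothetical helper: retry a GET request when the rate limit is hit,
    # waiting the number of seconds the Retry-After header asks for
    for _ in range(max_retries):
        response = session.get(url)
        if response.status_code != 429:
            return response
        print('Rate limited! Please wait.')
        time.sleep(int(response.headers['retry-after']))
    return response

The while loop then shrinks to response = get_with_retry(session, url) followed by the usual check for a 200 status code.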

Sideload related records

Suppose you want to display the author of each community post. The records returned by the posts API identify authors only by their Zendesk Support user id, not by their actual names. Example: "author_id": 21436587.

You could call the users API to get the name associated with each user id. However, this means calling the API for each post in your data set, potentially amounting to thousands of API calls.

A more efficient solution is to sideload the user records with the post records. Sideloading gets both record sets in a single request. For more information, see Sideloading related records.

Update the script as follows to sideload the users who authored the posts. Note the include=users parameter appended to the url variable:

import time
import requests

credentials = 'your_zendesk_email', 'your_zendesk_password'
session = requests.Session()
session.auth = credentials
zendesk = 'your_zendesk_url'

topic_id = 123456
topic_posts = []
user_list = []
url = zendesk + '/api/v2/community/topics/' + str(topic_id) + '/posts.json?include=users'
while url:
    response = session.get(url)
    if response.status_code == 429:
        print('Rate limited! Please wait.')
        time.sleep(int(response.headers['retry-after']))
        continue
    if response.status_code != 200:
        print('Error with status code {}'.format(response.status_code))
        exit()
    data = response.json()
    topic_posts.extend(data['posts'])
    user_list.extend(data['users'])
    url = data['next_page']

for post in topic_posts:
    author = 'anonymous'
    for user in user_list:
        if user['id'] == post['author_id']:
            author = user['name']
            break
    print('"{}" by {}'.format(post['title'], author))

For each post, the script loops through the list of user records looking for a matching author_id value. When it finds a match, the script assigns the associated user name to the author variable and breaks out of the loop. The author's name is then printed with the post title.
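
The nested loop is fine at this scale, but if both lists were much larger, you could swap it for a dictionary lookup. Here's a minimal variation on the loop above (the users_by_id name is introduced here for illustration):

# Build an id-to-name map once, then look up each author in constant time
users_by_id = {user['id']: user['name'] for user in user_list}

for post in topic_posts:
    author = users_by_id.get(post['author_id'], 'anonymous')
    print('"{}" by {}'.format(post['title'], author))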

Serialize the data to reuse it

Suppose you're developing the script and you need to make repeated API requests to test and debug it. This is wasteful when you're dealing with a large data set requiring hundreds if not thousands of requests to get all the data. Instead, you could make just one call, serialize the results, and then reuse the serialized data as many times as you want.

Serializing a data structure means translating it into a format that can be stored and then reconstructed later in the same environment. In Python, you can use the built-in pickle module to serialize and deserialize a data structure.
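
Before applying pickle to the script, here it is in isolation: a round trip with a toy dictionary. The file name example.p is arbitrary:

import pickle

record = {'id': 1, 'title': 'Hello world'}

# Serialize the dictionary to a file on disk...
with open('example.p', mode='wb') as f:
    pickle.dump(record, f)

# ...then reconstruct it later
with open('example.p', mode='rb') as f:
    restored = pickle.load(f)

print(restored == record)  # prints True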

Update the script as follows to serialize all the post and user data:

import pickle
import time
import requests

credentials = 'your_zendesk_email', 'your_zendesk_password'
session = requests.Session()
session.auth = credentials
zendesk = 'your_zendesk_url'

topic_id = 123456
topic_posts = []
user_list = []
url = zendesk + '/api/v2/community/topics/' + str(topic_id) + '/posts.json?include=users'
while url:
    response = session.get(url)
    if response.status_code == 429:
        print('Rate limited! Please wait.')
        time.sleep(int(response.headers['retry-after']))
        continue
    if response.status_code != 200:
        print('Error with status code {}'.format(response.status_code))
        exit()
    data = response.json()
    topic_posts.extend(data['posts'])
    user_list.extend(data['users'])
    url = data['next_page']

topic_data = {'posts': topic_posts, 'users': user_list}
with open('my_serialized_data_file.p', mode='wb') as f:
    pickle.dump(topic_data, f)

The script assigns the user and post data to a new dictionary named topic_data, which it then serializes into a file named my_serialized_data_file.p in the current folder.

You can then comment out the rest of the code and deserialize the dictionary as many times as you want to test and format the output:

import pickle

# comment out everything else

with open('my_serialized_data_file.p', mode='rb') as f:
    topic = pickle.load(f)

for post in topic['posts']:
    author = 'anonymous'
    for user in topic['users']:
        if user['id'] == post['author_id']:
            author = user['name']
            break
    print('"{}" by {}'.format(post['title'], author))

You can use the same code snippet to develop other scripts without calling the API.

You now have the tools to update your Python scripts to retrieve large data sets with the API. If you want to move your data to Microsoft Excel to view and analyze it, see Writing large data sets to Excel with Python and pandas.
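
As a quick preview of that article, the deserialized posts could be written to a worksheet in a few lines. This is a sketch that assumes pandas and openpyxl are installed; the file name topic_posts.xlsx is arbitrary:

import pickle

import pandas as pd

# Load the pickled data from the earlier step and write the posts
# to an Excel worksheet
with open('my_serialized_data_file.p', mode='rb') as f:
    topic = pickle.load(f)

pd.DataFrame(topic['posts']).to_excel('topic_posts.xlsx', index=False)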

Code complete
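
For reference, here's the final script assembled from the snippets above, with the author lookup running on the in-memory lists after the data is serialized:

import pickle
import time
import requests

credentials = 'your_zendesk_email', 'your_zendesk_password'
session = requests.Session()
session.auth = credentials
zendesk = 'your_zendesk_url'

topic_id = 123456
topic_posts = []
user_list = []
url = zendesk + '/api/v2/community/topics/' + str(topic_id) + '/posts.json?include=users'
while url:
    response = session.get(url)
    if response.status_code == 429:
        print('Rate limited! Please wait.')
        time.sleep(int(response.headers['retry-after']))
        continue
    if response.status_code != 200:
        print('Error with status code {}'.format(response.status_code))
        exit()
    data = response.json()
    topic_posts.extend(data['posts'])
    user_list.extend(data['users'])
    url = data['next_page']

topic_data = {'posts': topic_posts, 'users': user_list}
with open('my_serialized_data_file.p', mode='wb') as f:
    pickle.dump(topic_data, f)

for post in topic_posts:
    author = 'anonymous'
    for user in user_list:
        if user['id'] == post['author_id']:
            author = user['name']
            break
    print('"{}" by {}'.format(post['title'], author))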

Join the discussion about this article in the community.