Backing up your knowledge base with the Help Center API

You can use the Zendesk REST API to make backup copies of all the articles in your knowledge base. The backups can be useful in case you need to check or revert to a previous version of an article.

You can back up a help center with only 34 lines of Python code. You can then restore any number of articles with a second, 27-line script.

Disclaimer: Zendesk provides this article for instructional purposes only. Zendesk does not support or guarantee the code. Zendesk also can't provide support for third-party technologies such as Python.

What you need

You need a text editor and a command-line interface like the command prompt in Windows or the Terminal on the Mac. You'll also need Python 3 and a special library to make HTTP requests.

To set up your development environment:

If you don't already have Python 3, download and install it from http://www.python.org/download/. Python is a powerful but beginner-friendly scripting and programming language with a clear and readable syntax. Visit the Python website to learn more.
If you have Python 3.3 or earlier, download and install pip, a simple tool for installing and managing Python packages. See these instructions.

Note: If you have Python 3.4 or later, you already have pip. Skip ahead.
Use the following pip command in your command-line interface to download and install the Requests library, a library that makes HTTP requests in Python easy:
```
pip3 install requests
```
If you have Python 3.3 or earlier, use pip instead of pip3 on the command line.
Finally, when copying the examples in this tutorial, make sure to indent lines exactly as shown. Indentation matters in Python.

If you're interested in taking a deeper dive into Python after finishing this tutorial, see the following free resources:

Think Python by Allen B. Downey
Dive into Python 3 by Mark Pilgrim

The plan

The goal is to back up all the articles in a specified language in your knowledge base. You want to be able to run the script as many times as you need to back up each language in your knowledge base at different times.

Here are the basic tasks the script must carry out to create the backups:

Download the HTML of the articles from the knowledge base.
Create an HTML file for each article in a folder on your hard drive.
Create a backup log for easy reference later.

Backing up the images in the articles is outside the scope of this article. It might be covered in a future tutorial.

Create the Python file

Create a folder named backups where you want to download the backups.
In a text editor, create a file named make_backup.py and save it in your new backups folder.

In the editor, add the following lines to the file.

import os
import datetime
import csv

import requests

ZENDESK_API_TOKEN = os.getenv('ZENDESK_API_TOKEN')
ZENDESK_USER_EMAIL = '{YOUR_ZENDESK_EMAIL_ADDRESS}'
ZENDESK_SUBDOMAIN = '{YOUR_ZENDESK_SUBDOMAIN}'
language = 'en'

You start by importing the requests library, a third-party Python library for making HTTP requests. You should have installed it earlier. See What you need.

Before running the script, replace the placeholders {YOUR_ZENDESK_EMAIL_ADDRESS} and {YOUR_ZENDESK_SUBDOMAIN} with actual values. Create a ZENDESK_API_TOKEN environment variable for your API token, ensuring it's kept safe.
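If the ZENDESK_API_TOKEN environment variable isn't set, os.getenv() returns None and the problem typically only surfaces later as a failed request. The following is a minimal sketch of an early check, assuming you'd rather stop before any request is made. The require_env helper and the EXAMPLE_TOKEN_FOR_DEMO variable name are hypothetical and not part of the tutorial script.

```python
import os


def require_env(name):
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration; the tutorial script itself
    reads the token directly with os.getenv().
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f'Set the {name} environment variable before running the script.')
    return value


# Demonstration with a throwaway variable name so no real token is touched.
os.environ['EXAMPLE_TOKEN_FOR_DEMO'] = 'abc123'
token = require_env('EXAMPLE_TOKEN_FOR_DEMO')
print(token)
```

In the backup script, the equivalent call would be require_env('ZENDESK_API_TOKEN').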

The language variable specifies the language of the articles you want to back up. Change the default value to the locale you want. Example:

language = 'en-US'

See Language codes for supported languages for valid values for language.

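Also, make sure the value of ZENDESK_SUBDOMAIN is your full Zendesk Support url, including 'https://'. The script uses the value as-is as the start of the request url. Example: 'https://obscura.zendesk.com'.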

Create folders for the backups

In this section, you tell the script to automatically create a folder in your backups folder to store the backup. The folder will have the following structure to easily organize multiple backups in multiple languages:

/backups
  /2015-01-24
    /en-US

Import the native os and datetime libraries at the top of the script:

import os
import datetime

Add the following lines after the last line in the script:

date = datetime.date.today()
backup_path = os.path.join(str(date), language)
if not os.path.exists(backup_path):
    os.makedirs(backup_path)

The script gets today's date and uses it along with your language variable to build the new path. When the script runs, the backup_path might be something like 2015-01-24/en-US.

The script then checks to make sure the directory doesn't already exist (in case you ran the script earlier on the same day). If it doesn't exist, the script creates it.
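The same check-then-create step can also be written with the exist_ok argument of os.makedirs(), available in Python 3.2 and later. Here is a small sketch of that alternative, using a temporary directory in place of your backups folder so it can be run anywhere:

```python
import datetime
import os
import tempfile

language = 'en-US'  # example locale, as in the tutorial

with tempfile.TemporaryDirectory() as root:
    # root stands in for the backups folder
    backup_path = os.path.join(root, str(datetime.date.today()), language)
    os.makedirs(backup_path, exist_ok=True)
    # Running it a second time on the same day is harmless.
    os.makedirs(backup_path, exist_ok=True)
    created = os.path.isdir(backup_path)

print(created)
```

Either form works for the backup script. The explicit os.path.exists() check used in the tutorial makes the intent easier to see for beginners.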

Your script so far should look like this:

import os
import datetime
import csv

import requests

ZENDESK_API_TOKEN = os.getenv('ZENDESK_API_TOKEN')
ZENDESK_USER_EMAIL = '{YOUR_ZENDESK_EMAIL_ADDRESS}'
ZENDESK_SUBDOMAIN = '{YOUR_ZENDESK_SUBDOMAIN}'
language = 'en'

date = datetime.date.today()
backup_path = os.path.join(str(date), language)
if not os.path.exists(backup_path):
    os.makedirs(backup_path)

You can test this code. Make sure to specify a locale for the language variable (the credentials don't matter at this point), navigate to your backups folder with your command line, and run the script from the command line as follows:

python3 make_backup.py

A folder is created in the backups folder with the current date and the value of your language variable.

Get all the articles in a language

In this section, you send a request to the Help Center API to get all the articles in the language you specified. You'll use the following endpoint in the Articles API:

GET /api/v2/help_center/{locale}/articles.json

The endpoint is documented in this section of the API reference.

Important: Make sure to indent lines below as shown in the text. Use four spaces per indent.

In the script, create the final endpoint url by adding the following statement after the last line in the script (don't use any line breaks):
```
endpoint = f'{ZENDESK_SUBDOMAIN}/api/v2/help_center/{language.lower()}/articles.json'
```
Before you can use the endpoint in a request, you need to prepend your Zendesk Support url to the string and specify a value for the {locale} placeholder. The statement builds the final url from the Zendesk Support url you specified, the endpoint path in the docs, and the article language you specified. The value of your language variable is inserted (or interpolated) at the {locale} placeholder in the string.

Because some locales listed in the language codes article have uppercase letters while the API expects lowercase letters, the value of the language variable is converted to lowercase to be on the safe side.

Using the example in this tutorial, the final endpoint url would be as follows:

'https://obscura.zendesk.com/api/v2/help_center/en-us/articles.json'
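As a quick check of the string interpolation and the lowercase conversion, the following sketch builds the url from the example values used in this tutorial. The build_articles_endpoint function is a hypothetical helper for illustration, not part of the tutorial script.

```python
def build_articles_endpoint(base_url, locale):
    """Build the articles endpoint url for a locale (hypothetical helper)."""
    # The locale is lowercased because the API expects lowercase letters.
    return f'{base_url}/api/v2/help_center/{locale.lower()}/articles.json'


url = build_articles_endpoint('https://obscura.zendesk.com', 'en-US')
print(url)
```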
Use the endpoint url to make the HTTP request and save the response from the API.
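```
credentials = f'{ZENDESK_USER_EMAIL}/token', ZENDESK_API_TOKEN
response = requests.get(endpoint, auth=credentials)
```
The first statement builds your authentication credentials: a tuple consisting of your email address with /token appended and your API token. The second statement uses the requests library's get() method with the endpoint variable to make a GET request to the API. The method includes an argument named auth that specifies your authentication credentials.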

Check the request for errors and exit if any are found:

if response.status_code != 200:
    print('Failed to retrieve articles with error {}'.format(response.status_code))
    exit()

According to the API doc, the API returns a status code of 200 if the request is successful. In other words, if the status code is anything other than 200 (if response.status_code != 200), then something went wrong. The script prints an error message and exits.

If no errors are found, decode and assign the response to a variable (no indent):
```
data = response.json()
```
The Zendesk REST API returns data formatted as JSON. The json() method from the requests library decodes the data into a Python dictionary. A dictionary is simply a set of key/value pairs formatted almost identically to JSON. Example dictionary:

{'id': 35436, 'author_id': 88887, 'draft': True}

Consult the Zendesk API docs to figure out how the data dictionary is structured. For example, the articles API doc shows the structure of the JSON returned by the endpoint.

You can deduce from the doc that the data dictionary consists of one key named articles. Its value is a list of articles, as indicated by the square brackets. Each item in the list is a dictionary of article properties, as indicated by the curly braces.
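The structure described above can be explored with a small, made-up response. The sketch below uses json.loads() from the standard library on an abbreviated sample. The values are illustrative, not real API output, and real articles have many more properties.

```python
import json

# Abbreviated, made-up response body; real articles have many more properties.
sample_json = '''
{
  "articles": [
    {"id": 35436, "author_id": 88887, "draft": true, "title": "Example one"},
    {"id": 35437, "author_id": 88887, "draft": false, "title": "Example two"}
  ],
  "next_page": null
}
'''

data = json.loads(sample_json)  # the same kind of dictionary response.json() gives you
ids = [article['id'] for article in data['articles']]

print(ids)
print(data['articles'][0]['draft'])  # JSON true becomes Python True
print(data['next_page'])             # JSON null becomes Python None
```

The key named articles holds a list, and each item in the list is a dictionary, so you reach a property with data['articles'][index]['property'].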

Use your new knowledge of the data structure to check the results so far:

for article in data['articles']:
    print(article['id'])

The snippet iterates through all the articles in your data dictionary and prints the id of each article. This is only temporary code for testing. You could print the article body with article['body'], but scanning that much HTML in your console could be a pain. We'll delete the print statement after we're done testing.

Your script so far should look as follows:

import os
import datetime
import csv

import requests

ZENDESK_API_TOKEN = os.getenv('ZENDESK_API_TOKEN')
ZENDESK_USER_EMAIL = '{YOUR_ZENDESK_EMAIL_ADDRESS}'
ZENDESK_SUBDOMAIN = '{YOUR_ZENDESK_SUBDOMAIN}'
language = 'en'

date = datetime.date.today()
backup_path = os.path.join(str(date), language)
if not os.path.exists(backup_path):
    os.makedirs(backup_path)

credentials = f'{ZENDESK_USER_EMAIL}/token', ZENDESK_API_TOKEN

endpoint = f'{ZENDESK_SUBDOMAIN}/api/v2/help_center/{language.lower()}/articles.json'
response = requests.get(endpoint, auth=credentials)
if response.status_code != 200:
    print(f'Failed to retrieve articles with error {response.status_code}')
    exit()
data = response.json()

for article in data['articles']:
    print(article['id'])

Replace all the placeholders with actual values and run the script again from the command line:

python3 make_backup.py

You should get a list of up to 30 article ids confirming that the articles were retrieved successfully. You won't see more than 30 articles even if you have more because the API limits the number to prevent bandwidth and memory issues. In the next section, you change the script to paginate through all the results.

Paginate through the results

In this section, you paginate through the article results to see all the articles. The JSON returned by the endpoint may only contain a maximum of 30 records, but it also contains a next_page property with the endpoint URL of the next page of results, if any. Example:

"next_page": "https://example.zendesk.com/api/v2/en-US/articles.json?page=2",...

If there's no next page, the value is null:

"next_page": null,...

Your code will check the next_page property. If not null, it'll make another request using the specified URL. If null, it'll stop. To learn more, see Paginating through lists.

Insert the following while statement after the endpoint variable declaration:

endpoint = f'{ZENDESK_SUBDOMAIN}/api/v2/help_center/{language.lower()}/articles.json'
while endpoint:

Indent all the lines that follow the while statement.

while endpoint:
    response = requests.get(endpoint, auth=credentials)
    if response.status_code != 200:
        print(f'Failed to retrieve articles with error {response.status_code}')
        exit()
    data = response.json()

    for article in data['articles']:
        print(article['id'])

Add the following statement as the last line and indent it too:
```
endpoint = data.get('next_page', None)
```
This sets up a loop to paginate through the results. While the endpoint variable is true -- in other words, while it contains a url -- a request is made. After getting and displaying a page of results, the script assigns the value of the next_page property to the endpoint variable. If the value is still a url, the loop runs again. If the value is null, such as when the API returns the last page of results, the loop stops.
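The loop logic can be tried without calling the API. The following sketch replaces the HTTP request with a hypothetical fetch() function that returns made-up pages, so the only thing it demonstrates is how next_page drives the while loop:

```python
# Made-up pages keyed by url, standing in for real API responses.
FAKE_PAGES = {
    'page-1': {'articles': [{'id': 1}, {'id': 2}], 'next_page': 'page-2'},
    'page-2': {'articles': [{'id': 3}], 'next_page': None},
}


def fetch(url):
    """Hypothetical stand-in for requests.get(url, ...).json()."""
    return FAKE_PAGES[url]


seen = []
endpoint = 'page-1'
while endpoint:
    data = fetch(endpoint)
    for article in data['articles']:
        seen.append(article['id'])
    # None on the last page, which ends the loop
    endpoint = data.get('next_page', None)

print(seen)
```

If the last statement in the loop is left out or not indented, the endpoint variable never changes and the loop requests the first page forever.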

Run the script again from the command line:

python3 make_backup.py

You should get a list of all the articles in the language in your knowledge base.

The next step is to make copies of the articles on your computer.

Write the articles to files

In this section, you create HTML files of all the articles in your knowledge base.

The twist here is that the body attribute of an article only contains the HTML of the body, as its name suggests. The article's title isn't included. The title is specified by another attribute named title. You'll add the title to the article's HTML before writing the file.

Replace the following test line:

print(article['id'])

with the following lines:

if article['body'] is None:
    continue
title = '<h1>' + article['title'] + '</h1>'
filename = '{id}.html'.format(id=article['id'])
with open(os.path.join(backup_path, filename), mode='w', encoding='utf-8') as f:
    f.write(title + '\n' + article['body'])
print('{id} copied!'.format(id=article['id']))

Make sure to indent them at the same level as the print statement. The lines perform the following tasks:

Skips any blank articles
Creates an H1 tag with the article title
Creates a file name based on the article ID to guarantee unique names
Creates a file in the folder the script created earlier using the backup_path variable
Combines the title, a line break, and the article body in one string
Writes the string to the file
Prints a message to the console so you can track the progress of the backup operation.

Your modified code should look as follows:

for article in data['articles']:
    if article['body'] is None:
        continue
    title = '<h1>' + article['title'] + '</h1>'
    filename = '{id}.html'.format(id=article['id'])
    with open(os.path.join(backup_path, filename), mode='w', encoding='utf-8') as f:
        f.write(title + '\n' + article['body'])
    print('{id} copied!'.format(id=article['id']))

If the article body is blank, the continue statement on the third line skips the rest of the steps in the for loop and moves to the next article in the list. The logic prevents the script from writing out any empty drafts in your help center that might be acting as placeholders for future content. It also prevents the script from breaking when you try to concatenate a string type with a Python 'NoneType' in the snippet's next-to-last line (title + '\n' + article['body']).
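To see the skip logic and the file contents in isolation, here is a sketch that wraps the same steps in a hypothetical write_article() function and runs it on two made-up articles in a temporary folder. The function name and the sample data are assumptions for illustration only.

```python
import os
import tempfile


def write_article(article, backup_path):
    """Write one article to an HTML file; return the filename, or None if skipped.

    Hypothetical helper that mirrors the loop body in the tutorial.
    """
    if article['body'] is None:
        return None
    title = '<h1>' + article['title'] + '</h1>'
    filename = f'{article["id"]}.html'
    with open(os.path.join(backup_path, filename), mode='w', encoding='utf-8') as f:
        f.write(title + '\n' + article['body'])
    return filename


# Made-up articles: one with content, one empty draft.
articles = [
    {'id': 101, 'title': 'Hello', 'body': '<p>World</p>'},
    {'id': 102, 'title': 'Empty draft', 'body': None},
]

with tempfile.TemporaryDirectory() as backup_path:
    results = [write_article(a, backup_path) for a in articles]
    with open(os.path.join(backup_path, '101.html'), encoding='utf-8') as f:
        saved = f.read()

print(results)
print(saved)
```

The empty draft produces no file, and the saved file starts with the H1 title followed by the article body.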

Run the script again from the command line:

python3 make_backup.py

The script writes all the articles in your knowledge base to your language folder. Open a few files in a text editor to check the HTML.

Create a backup log

In this section, you create a backup log for easier reference later. The log will consist of a csv file with File, Title, and Author ID columns and a row for each article that's backed up.

Import the native csv library at the top of the script:
```
import csv
```
Create the following log variable just before the first endpoint variable declaration:
```
log = []
endpoint = f'{ZENDESK_SUBDOMAIN}/api/v2/help_center/ ......
```
The variable declares an empty list. After writing each article to file, the script will update the list with information about the article.
Add the following log.append() statement immediately following and at the same indent level as the print statement:
```
print('{id} copied!'.format(id=article['id']))
log.append((filename, article['title'], article['author_id']))
```
After writing an article, the script appends a data item to the log list. The double parentheses are intended. You're appending a Python tuple, a kind of list that uses parentheses. The csv library uses tuples to add rows to a spreadsheet. Each row consists of a filename, title, and author id.

Add the following lines at the bottom of the script. The first line should be flush to the margin (no indent and no wrap):

with open(os.path.join(backup_path, '_log.csv'), mode='wt', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(('File', 'Title', 'Author ID'))
    for article in log:
        writer.writerow(article)

After writing all the articles, the script creates a file called _log.csv. The underscore ensures the file appears first in any file browser. The script adds a header row and then a row for each article in the log list.
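The log is also easy to read back in Python, for example to look up which file holds a given article before restoring it. The sketch below writes a made-up log to a temporary folder and reads it with csv.DictReader. It passes newline='' when opening the file, which is what the csv module documentation recommends; the tutorial script doesn't do this, so treat it as an optional variation.

```python
import csv
import os
import tempfile

# Made-up log rows in the same (filename, title, author id) shape as the tutorial.
log = [('101.html', 'Hello', 88887), ('103.html', 'Title, with comma', 88888)]

with tempfile.TemporaryDirectory() as backup_path:
    log_path = os.path.join(backup_path, '_log.csv')
    with open(log_path, mode='w', encoding='utf-8', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(('File', 'Title', 'Author ID'))
        writer.writerows(log)

    # DictReader uses the header row as the keys of each row dictionary.
    with open(log_path, encoding='utf-8', newline='') as f:
        rows = list(csv.DictReader(f))

titles_by_file = {row['File']: row['Title'] for row in rows}
print(titles_by_file)
```

Note that values read back from a csv file are always strings, so the author id in each row is '88887' rather than the number 88887.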

Code complete

Your completed script should look as follows. You can also download a copy by clicking make_backup.py.

```
import os
import datetime
import csv

import requests

ZENDESK_API_TOKEN = os.getenv('ZENDESK_API_TOKEN')
ZENDESK_USER_EMAIL = '{YOUR_ZENDESK_EMAIL_ADDRESS}'
ZENDESK_SUBDOMAIN = '{YOUR_ZENDESK_SUBDOMAIN}'
language = 'en'

date = datetime.date.today()
backup_path = os.path.join(str(date), language)
if not os.path.exists(backup_path):
    os.makedirs(backup_path)

log = []

credentials = f'{ZENDESK_USER_EMAIL}/token', ZENDESK_API_TOKEN

endpoint = f'{ZENDESK_SUBDOMAIN}/api/v2/help_center/{language.lower()}/articles.json'
while endpoint:
    response = requests.get(endpoint, auth=credentials)
    if response.status_code != 200:
        print(f'Failed to retrieve articles with error {response.status_code}')
        exit()
    data = response.json()

    for article in data['articles']:
        if article['body'] is None:
            continue
        title = '<h1>' + article['title'] + '</h1>'
        filename = f'{article["id"]}.html'
        with open(os.path.join(backup_path, filename), mode='w', encoding='utf-8') as f:
            f.write(title + '\n' + article['body'])
        print(f'{article["id"]} copied!')

        log.append((filename, article['title'], article['author_id']))

    endpoint = data.get('next_page', None)

with open(os.path.join(backup_path, '_log.csv'), mode='wt', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(('File', 'Title', 'Author ID'))
    for article in log:
        writer.writerow(article)
```


Use the command line to navigate to your **backups** folder and run the script:
```sh
python3 make_backup.py
```

The script makes a backup of your knowledge base in a language folder. It also creates a log file that you can use in a spreadsheet application.

Restoring articles

You can restore any backed up article with a second script that reads the content of each file, parses it into an HTML tree to extract the title and body for the help center, and uses the API to update the article in your help center.

The script in this section updates existing articles; it doesn't create new ones. To create, it would need to be modified to use a different endpoint, as well as to specify a section and author for the article.

Disclaimer: Though customizing the script to your own use is encouraged, Zendesk strongly recommends not modifying the script to restore your entire help center from a backup. The script overwrites any existing content, including any updates made to articles since the last backup. The changes can't be reverted. The script is meant to be used to restore selected articles on a case-by-case basis.

You'll need version 2.4.2 or greater of the requests library. To check your version, run pip show requests at the command line. To upgrade, run pip install requests --upgrade.

If you don't already have Beautiful Soup, you'll need to install it. Beautiful Soup is a Python library for parsing, navigating, searching, and modifying HTML trees. To install Beautiful Soup:

At the command line, enter:

pip install beautifulsoup4 requests

The command downloads and installs the latest version of Beautiful Soup.

Install lxml, an HTML parser that works with Beautiful Soup:
```
pip install lxml
```
Beautiful Soup works with a number of parsers. The lxml parser is one of the fastest.

To restore selected articles:

Copy the following script in a new text file, name it restore_articles.py, and save it in your backups folder with your make_backup.py file.

import os

import requests
from bs4 import BeautifulSoup

ZENDESK_API_TOKEN = os.getenv('ZENDESK_API_TOKEN')
ZENDESK_USER_EMAIL = '{YOUR_ZENDESK_EMAIL_ADDRESS}'
ZENDESK_SUBDOMAIN = 'https://{YOUR_ZENDESK_SUBDOMAIN}.zendesk.com'
BACKUP_FOLDER = '20xx-xx-xx'
language = 'en-us'
RESTORE_LIST = [100000001, 100000002]

backup_path = os.path.join(BACKUP_FOLDER, language)
if not os.path.exists(backup_path):
    print('The specified backup path does not exist. Check the folder name and locale.')
    exit()

for article_id in RESTORE_LIST:
    file_path = os.path.join(backup_path, f'{article_id}.html')
    with open(file_path, mode='r', encoding='utf-8') as file:
        html_source = file.read()
    tree = BeautifulSoup(html_source, 'lxml')
    title = tree.h1.string.strip()
    tree.h1.decompose()
    payload = {'translation': {'title': title, 'body': str(tree.body)}}
    endpoint = f'/api/v2/help_center/articles/{article_id}/translations/{language.lower()}.json'
    url = ZENDESK_SUBDOMAIN + endpoint
    auth = f'{ZENDESK_USER_EMAIL}/token', ZENDESK_API_TOKEN

    response = requests.put(url, json=payload, auth=auth)
    if response.status_code == 200:
        print(f'Article {article_id} restored')
    else:
        print('Failed to update article {} with error {}, {}'.format(article_id, response.status_code, response.text))

Replace the placeholder values at the top of the script with your own:
- ZENDESK_API_TOKEN - Your Zendesk API token. This is stored as an environment variable for security reasons.
- YOUR_ZENDESK_EMAIL_ADDRESS - Your Zendesk Support sign-in email address
- YOUR_ZENDESK_SUBDOMAIN - Your Zendesk subdomain name
- BACKUP_FOLDER - A folder name created by the backup script. Example: BACKUP_FOLDER = '2017-01-04'
- language - A locale corresponding to a subfolder in your backup folder. Example: language = 'en-us'
- RESTORE_LIST - A list of article ids. Example: RESTORE_LIST = [200459576, 201995096]
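The restore script opens each backup file directly, so an id with no matching file raises a FileNotFoundError partway through the run, after earlier articles in the list have already been updated. Here is a sketch of a check you could run first, assuming you want to confirm every id has a backup file before anything is sent to the API. The missing_backups function is a hypothetical helper, and the demonstration uses a temporary folder and made-up ids.

```python
import os
import tempfile


def missing_backups(backup_path, restore_list):
    """Return the article ids that have no HTML file in backup_path (hypothetical helper)."""
    return [
        article_id for article_id in restore_list
        if not os.path.isfile(os.path.join(backup_path, f'{article_id}.html'))
    ]


# Demonstration with a temporary folder and made-up ids.
with tempfile.TemporaryDirectory() as backup_path:
    with open(os.path.join(backup_path, '100000001.html'), mode='w', encoding='utf-8') as f:
        f.write('<h1>Title</h1>\n<p>Body</p>')
    missing = missing_backups(backup_path, [100000001, 100000002])

print(missing)
```

An empty list means every article in the restore list has a backup file in the folder.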

Use the command line to navigate to your backups folder and run the script:

python3 restore_articles.py