Understanding the limitations of offset pagination
Paginated data may be inaccurate when using offset pagination because of the real-time nature of the data. This article describes how these inaccuracies are introduced and ways to reduce them.
How inaccuracies are introduced
Inaccuracies are introduced when one or more items are added to or removed from your database instance between page requests while you iterate over all the items.
In the stateless, offset pagination method used by the Zendesk REST API and others like it, each next page request causes the server to query the database and return the specified subset. The full record set is not retrieved and stored statically in memory for follow-up requests.
The server uses the requested page number and the maximum number of records per page to determine which subset of records to return with each next page request. If the total record count changes between requests, the position of each record in the list can shift, so the subset selected for subsequent requests may change too. If records are added ahead of the current page, some records may be selected again. If records are removed from ahead of the current page, some records may be skipped. To better understand this phenomenon, see Paginating Real-Time Data with Cursor Based Pagination on sitepoint.com.
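The duplication and skipping described above can be reproduced with a small simulation. This is only a sketch: the fetch_page function and the in-memory list are hypothetical stand-ins for a server-side query, not part of any Zendesk API.

```python
def fetch_page(records, page, per_page):
    """Return one page of records, the way a stateless offset query would."""
    start = (page - 1) * per_page
    return records[start:start + per_page]


# Ten records listed newest first, three per page (illustrative values).
records = list(range(10, 0, -1))  # [10, 9, 8, ..., 1]
per_page = 3

page1 = fetch_page(records, 1, per_page)  # [10, 9, 8]

# Case 1: a new record is added at the top before page 2 is requested.
records_added = [11] + records
page2_after_add = fetch_page(records_added, 2, per_page)
# Record 8 was already returned on page 1 and is returned again.
duplicates = set(page1) & set(page2_after_add)

# Case 2: a record from page 1 is removed before page 2 is requested.
records_removed = [r for r in records if r != 9]
page2_after_remove = fetch_page(records_removed, 2, per_page)
# Record 7 moved up into page 1's range and is never returned.
skipped = 7 not in page1 + page2_after_remove

print(page2_after_add, duplicates)
print(page2_after_remove, skipped)
```

In both cases the client made well-formed requests; the mismatch comes only from the list changing between them.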
Reducing inaccuracies
One way to reduce pagination inaccuracies is to choose cursor-based pagination instead of offset pagination. See Adding pagination with the cursor-based method. Cursor-based pagination is currently available only for certain list endpoints in the Support API, including List Tickets and List Users. If the API documentation for a specific list endpoint doesn't list pagination methods, then the resource only supports offset pagination.
If cursor-based pagination is not available, one way to reduce pagination inaccuracies, though not eliminate them altogether, is to sort the results from oldest to newest so that new records added during pagination affect only the last pages, if at all. All else being equal, the early pages keep returning the same sorted records, up to 100 per page, even if the total record count changes.
Note: Some API resources such as Users can't be ordered. See the API documentation for the specific resource.
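To illustrate why oldest-to-newest sorting reduces but does not eliminate inaccuracies, here is a minimal sketch. The fetch_page function and the list of ids are hypothetical; they only model how an offset query slices a sorted list.

```python
def fetch_page(records, page, per_page):
    """Return one page of records from a list sorted oldest first."""
    start = (page - 1) * per_page
    return records[start:start + per_page]


per_page = 3
oldest_first = list(range(1, 8))  # ids 1..7, oldest first

# New records land at the end of the list, so the early pages are unchanged.
early_before = fetch_page(oldest_first, 1, per_page) + fetch_page(oldest_first, 2, per_page)
grown = oldest_first + [8, 9]
early_after = fetch_page(grown, 1, per_page) + fetch_page(grown, 2, per_page)
pages_stable = early_before == early_after

# A deletion on an earlier page still shifts later records forward.
page1 = fetch_page(oldest_first, 1, per_page)
shrunk = [r for r in oldest_first if r != 2]
page2 = fetch_page(shrunk, 2, per_page)
missed = sorted(set(oldest_first) - set(page1) - set(page2))

print(pages_stable, page2, missed)
```

Additions no longer disturb pages already read, but a record removed from an earlier page still causes a later record to be skipped.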
To sort records, append the sort_by=created_at parameter to the initial endpoint URL as well as to each next_page URL:
url = 'https://example.zendesk.com/api/v2/tickets.json?sort_by=created_at'
...
url = 'https://example.zendesk.com/api/v2/tickets.json?page=2&sort_by=created_at'
Some endpoints have an additional sort_order parameter that should be set to asc. See Articles.
Depending on the endpoint, you can also sort by ids, which are assigned sequentially when items are created:
https://example.zendesk.com/api/v2/tickets.json?page=2&sort_by=id
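One way to make sure the sort parameters are present on every request is to pass each URL through a small helper before fetching it. The following is a sketch using only the Python standard library; with_sort is a hypothetical name, and whether a given endpoint accepts sort_by or sort_order must be checked in its API documentation.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_sort(url, sort_by="created_at", sort_order=None):
    """Return url with sort_by (and optionally sort_order) set in its query string.

    Existing query parameters such as page are preserved. Calling the
    function on a URL that already has the parameters leaves it unchanged.
    """
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params["sort_by"] = sort_by
    if sort_order is not None:
        params["sort_order"] = sort_order
    return urlunsplit(parts._replace(query=urlencode(params)))


first = with_sort("https://example.zendesk.com/api/v2/tickets.json")
second = with_sort("https://example.zendesk.com/api/v2/tickets.json?page=2")
by_id = with_sort("https://example.zendesk.com/api/v2/tickets.json?page=2", sort_by="id")
print(first)
print(second)
print(by_id)
```

In a pagination loop, the helper would be applied to the initial URL and to each next_page value before the request is sent.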
For tickets, another option is to use the incremental ticket export endpoint. The results are time-based rather than page-based. To avoid duplicates in subsequent requests, the endpoint returns the update time of the latest ticket on the page as the start_time value in a next_page URL:
{
"results": [
...
],
...
"end_time":1405469030,
"next_page":"https://example.zendesk.com/api/v2/incremental/tickets.json?start_time=1405469030"
}
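A client can follow the next_page URLs in a loop until no new results come back. The sketch below assumes the response shape shown above (a results list and a next_page URL) and assumes each record carries an id field. The fetch_json argument is a hypothetical caller-supplied function that performs the authenticated request and returns the parsed JSON. The stop condition is also an assumption, so check the endpoint's documentation for its actual end-of-data signal and rate limits.

```python
def export_all(fetch_json, start_url):
    """Collect records from a time-based export by following next_page URLs.

    fetch_json(url) -> dict is supplied by the caller (hypothetical).
    Records are keyed by id so that an item returned on two consecutive
    pages is stored only once.
    """
    seen = {}
    url = start_url
    while url:
        page = fetch_json(url)
        results = page.get("results", [])
        for item in results:
            seen[item["id"]] = item  # assumes every record has an "id"
        next_url = page.get("next_page")
        # Stop on an empty page or when the URL no longer advances.
        if not results or next_url == url:
            break
        url = next_url
    return list(seen.values())
```

For testing, fetch_json can be replaced by a function that reads canned responses from a dictionary, which keeps the loop logic independent of any HTTP library.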
Another option is to use the reporting feature in Zendesk Support to export data to a CSV or XML file. For details, see Exporting data to a CSV or XML file.