Large data volumes

SOQL Best Practices for Large Data Volumes

Published 11 June 2026 · 13 min read · Advanced

Large data volumes — typically defined as objects with more than one million records — require a different approach to SOQL query design. The patterns that work in a development org with a few hundred test records will fail in production with millions. This guide covers the specific techniques that keep queries fast and transactions stable at scale.

What happens to SOQL at scale

The Salesforce database is a multi-tenant shared environment. Long-running queries consume database resources that affect performance for other orgs on the same instance. For this reason, Salesforce enforces selectivity requirements that become stricter as object record counts grow.

A non-selective query on an object with 10,000 records might succeed with a warning. The same query on an object with 5 million records will likely timeout, return a QUERY_TIMEOUT error, or throw a SOQL_SELECTIVITY_ERROR.

Understanding indexes

The database uses indexes to find records without scanning every row. Salesforce automatically indexes record IDs, Name (on most objects), OwnerId, CreatedDate, SystemModstamp, RecordTypeId and all lookup/master- detail foreign key fields.

For a query to be selective, the WHERE clause must include a filter on an indexed field, and that filter must be sufficiently specific. The threshold varies by object size — Salesforce uses a rough guideline that the filtered result should represent less than 10% of the total records on smaller objects, and as little as 5% on very large ones.

Query patterns for large data volumes

Always filter on indexed fields


// Non-selective on large objects — full table scan
List<Opportunity> opps = [
SELECT Id, Name
FROM Opportunity
WHERE CustomTextField__c = 'Target Value'
];// Selective — uses index on StageName (if custom indexed) + OwnerId
List<Opportunity> opps = [
SELECT Id, Name
FROM Opportunity
WHERE StageName = 'Closed Won'
AND OwnerId = :userId
AND CloseDate >= :startDate
];

### Use LIMIT for UI and reporting queries

Any query that powers a UI component should include a LIMIT clause.
Users cannot meaningfully interact with thousands of records in a single view.

### Avoid OFFSET for pagination at scale

OFFSET works by retrieving all rows up to the offset position and discarding
them — at offset 100,000 on a million-record object, the database scans
100,000 rows to return the next page. Use Apex Cursors or server-side
keyset pagination instead.

## Batch Apex and Apex Cursors for bulk processing

When you need to process all records on a large object, Batch Apex and
Apex Cursors are the correct approaches.

Batch Apex uses a `QueryLocator` in the start method, which supports up to
50 million records and processes them in configurable batches of up to 2,000.
The database cursor is maintained server-side.

Apex Cursors, generally available since Spring '26, provide a similar
capability using Queueable Apex. Each Queueable job fetches the next slice
from the cursor and chains to the next job. This is preferable to Batch Apex
for processing pipelines where you need more control over execution flow.

## Data skew prevention

Ownership skew — one user owning millions of records — is the most common
large data volume problem in integration scenarios. When an integration user
inserts all records, every record has the same OwnerId. This creates lock
contention when multiple transactions update records owned by the same user
simultaneously, because Salesforce locks the user's ownership table row.

The solution is to distribute record ownership: assign records to queues,
or distribute ownership across multiple integration user profiles, or
implement a custom ownership assignment pattern that spreads records
across a pool of users.

Test your knowledge — Large data volumes

10 questions · Basic to Advanced

0 / 10 correct

Frequently asked questions

What is a selective SOQL query?

A selective query filters on indexed fields so the database can use an index to narrow results without scanning the entire table. On objects with millions of records, non-selective queries trigger full table scans that timeout or throw SOQL_SELECTIVITY_ERROR.

Which fields are automatically indexed in Salesforce?

Salesforce automatically indexes the record ID, Name (on most objects), OwnerId, CreatedDate, SystemModstamp, RecordTypeId, and all lookup and master-detail foreign key fields. All other fields require a custom index requested from Salesforce Support.

What is a Skinny Table in Salesforce and when should I request one?

A Skinny Table is a custom database table maintained by Salesforce alongside the standard object table, containing only a subset of frequently queried fields to avoid joins to the wide base table. Request one from Salesforce Support for large objects with high-frequency queries that cannot be made selective through standard indexing.

How do I process more than 50,000 records in Apex?

Use Batch Apex with a QueryLocator start method (supports up to 50 million records) or Apex Cursors with Queueable Apex (GA since Spring '26). Both maintain a server-side cursor across transactions, avoiding the 50,000-row-per-transaction limit. The OFFSET clause is not suitable at scale — it has a ceiling of 2,000 records and degrades severely as offset grows.

What is data skew in Salesforce and why is it a problem?

Data skew occurs when a disproportionate number of records share the same owner or parent — for example, an integration user owning millions of records. It causes database lock contention during concurrent updates and triggers expensive sharing recalculations. The fix is to distribute ownership across queues or multiple users.

How do I make a SOQL query on a non-indexed field more selective?

Combine the non-indexed field filter with a filter on an indexed field such as OwnerId or CreatedDate. This gives the query optimiser an indexed field to narrow the result set first, after which the non-indexed filter is applied to a much smaller dataset.