Large data volumes — typically defined as objects with more than one million records — require a different approach to SOQL query design. The patterns that work in a development org with a few hundred test records will fail in production with millions. This guide covers the specific techniques that keep queries fast and transactions stable at scale.
What happens to SOQL at scale
The Salesforce database is a multi-tenant shared environment. Long-running queries consume database resources that affect performance for other orgs on the same instance. For this reason, Salesforce enforces selectivity requirements that become stricter as object record counts grow.
A non-selective query on an object with 10,000 records might succeed with a warning. The same query on an object with 5 million records will likely timeout, return a QUERY_TIMEOUT error, or throw a SOQL_SELECTIVITY_ERROR.
Understanding indexes
The database uses indexes to find records without scanning every row. Salesforce automatically indexes record IDs, Name (on most objects), OwnerId, CreatedDate, SystemModstamp, RecordTypeId and all lookup/master- detail foreign key fields.
For a query to be selective, the WHERE clause must include a filter on an indexed field, and that filter must be sufficiently specific. The threshold varies by object size — Salesforce uses a rough guideline that the filtered result should represent less than 10% of the total records on smaller objects, and as little as 5% on very large ones.
Query patterns for large data volumes
Always filter on indexed fields
// Non-selective on large objects — full table scan
List<Opportunity> opps = [
SELECT Id, Name
FROM Opportunity
WHERE CustomTextField__c = 'Target Value'
];// Selective — uses index on StageName (if custom indexed) + OwnerId
List<Opportunity> opps = [
SELECT Id, Name
FROM Opportunity
WHERE StageName = 'Closed Won'
AND OwnerId = :userId
AND CloseDate >= :startDate
];
### Use LIMIT for UI and reporting queries
Any query that powers a UI component should include a LIMIT clause.
Users cannot meaningfully interact with thousands of records in a single view.
### Avoid OFFSET for pagination at scale
OFFSET works by retrieving all rows up to the offset position and discarding
them — at offset 100,000 on a million-record object, the database scans
100,000 rows to return the next page. Use Apex Cursors or server-side
keyset pagination instead.
## Batch Apex and Apex Cursors for bulk processing
When you need to process all records on a large object, Batch Apex and
Apex Cursors are the correct approaches.
Batch Apex uses a `QueryLocator` in the start method, which supports up to
50 million records and processes them in configurable batches of up to 2,000.
The database cursor is maintained server-side.
Apex Cursors, generally available since Spring '26, provide a similar
capability using Queueable Apex. Each Queueable job fetches the next slice
from the cursor and chains to the next job. This is preferable to Batch Apex
for processing pipelines where you need more control over execution flow.
## Data skew prevention
Ownership skew — one user owning millions of records — is the most common
large data volume problem in integration scenarios. When an integration user
inserts all records, every record has the same OwnerId. This creates lock
contention when multiple transactions update records owned by the same user
simultaneously, because Salesforce locks the user's ownership table row.
The solution is to distribute record ownership: assign records to queues,
or distribute ownership across multiple integration user profiles, or
implement a custom ownership assignment pattern that spreads records
across a pool of users. Test your knowledge — Large data volumes
10 questions · Basic to Advanced