Sunday, September 14, 2025

Autovacuum does NOT support parallel index vacuuming

 

Overview

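Autovacuum does NOT support parallel index vacuuming, unlike manual VACUUM, which has offered a PARALLEL option since PostgreSQL 13. Autovacuum workers disable parallelism when they build their vacuum parameters, so index cleanup during background maintenance is always done by a single process. This affects how long autovacuum takes on tables with several large indexes.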

Key Limitation

Autovacuum workers always process indexes sequentially, regardless of:

  • Number of indexes on the table
  • Index size or complexity
  • Available system resources
  • max_parallel_maintenance_workers, max_parallel_workers, or related settings

Source Code Evidence

Autovacuum Parameter Setup

In src/backend/postmaster/autovacuum.c, table_recheck_autovac() fills in the vacuum parameters for each table and explicitly disables parallel workers. autovacuum_do_vac_analyze() then passes those parameters to vacuum(). The excerpt below is simplified, and details vary between PostgreSQL versions:

/* In table_recheck_autovac() */
tab->at_params.options = VACOPT_SKIPTOAST |
                         (dovacuum ? VACOPT_VACUUM : 0) |
                         (doanalyze ? VACOPT_ANALYZE : 0);

/* CRITICAL: autovacuum never uses parallel workers (-1 disables them) */
tab->at_params.nworkers = -1;

/* Set other parameters... */
tab->at_params.freeze_min_age = freeze_min_age;
tab->at_params.freeze_table_age = freeze_table_age;
tab->at_params.multixact_freeze_min_age = multixact_freeze_min_age;
tab->at_params.multixact_freeze_table_age = multixact_freeze_table_age;

/* In autovacuum_do_vac_analyze() */
static void
autovacuum_do_vac_analyze(autovac_table *tab, BufferAccessStrategy bstrategy)
{
    /* rel_list (construction omitted) names the one table being processed */
    vacuum(rel_list, &tab->at_params, bstrategy, vac_context, true);
}

Manual VACUUM vs Autovacuum Comparison

Manual VACUUM (Supports Parallel)

/* In ExecVacuum() - src/backend/commands/vacuum.c (simplified) */
void
ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
{
    VacuumParams params;
    ListCell   *lc;

    /* Default: parallel vacuum is allowed; 0 lets the server pick the degree */
    params.nworkers = 0;

    /* Parse PARALLEL option from user command */
    foreach(lc, vacstmt->options)
    {
        DefElem *opt = (DefElem *) lfirst(lc);

        if (strcmp(opt->defname, "parallel") == 0)
        {
            /* User can specify: VACUUM (PARALLEL 4) table_name; */
            int nworkers = defGetInt32(opt);

            /* PARALLEL 0 disables parallel vacuum, stored as -1 */
            params.nworkers = (nworkers == 0) ? -1 : nworkers;
        }
    }

    /* Manual vacuum can use parallel workers */
    vacuum(vacstmt->rels, &params, bstrategy, vac_context, isTopLevel);
}

Autovacuum (Always Sequential)

/* In table_recheck_autovac() - src/backend/postmaster/autovacuum.c (simplified) */

/* Autovacuum NEVER supports parallel workers */
tab->at_params.nworkers = -1;  /* Hardcoded to -1 (disabled) - no user control */

/* In autovacuum_do_vac_analyze(): the parameters are passed through unchanged */
/* No setting or storage parameter overrides nworkers for autovacuum */
vacuum(rel_list, &tab->at_params, bstrategy, vac_context, true);

Index Vacuuming Process

Sequential Index Processing in Autovacuum

When autovacuum processes indexes, it uses the sequential path in lazy_vacuum_all_indexes(). Whether a parallel context exists at all is decided earlier, when the dead-item storage is allocated:

/* In src/backend/access/heap/vacuumlazy.c (simplified; varies by version) */

/* dead_items_alloc(): parallel state is only created if nworkers >= 0 */
if (nworkers >= 0 && vacrel->nindexes > 1 && vacrel->do_index_vacuuming)
    vacrel->pvs = parallel_vacuum_init(/* arguments omitted */);

/* lazy_vacuum_all_indexes(): choose the path */
if (!ParallelVacuumIsActive(vacrel))
{
    /* SEQUENTIAL PATH - Always used by autovacuum */
    for (int idx = 0; idx < vacrel->nindexes; idx++)
    {
        vacrel->indstats[idx] =
            lazy_vacuum_one_index(vacrel->indrels[idx], vacrel->indstats[idx],
                                  old_live_tuples, vacrel);
    }
}
else
{
    /* PARALLEL PATH - Only reachable from manual VACUUM */
    parallel_vacuum_bulkdel_all_indexes(vacrel->pvs, old_live_tuples,
                                        vacrel->num_index_scans);
}

Since nworkers is always -1 for autovacuum, the parallel state is never created and autovacuum always takes the sequential path.

Individual Index Vacuum Function

/* Sequential index vacuum - used by autovacuum (simplified; PostgreSQL 17 naming) */
static IndexBulkDeleteResult *
lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
                      double reltuples, LVRelState *vacrel)
{
    IndexVacuumInfo ivinfo;

    /* Describe the index and the heap it belongs to */
    ivinfo.index = indrel;
    ivinfo.heaprel = vacrel->rel;
    ivinfo.analyze_only = false;
    ivinfo.estimated_count = true;
    ivinfo.num_heap_tuples = reltuples;
    ivinfo.strategy = vacrel->bstrategy;

    /* Single-process index cleanup: remove entries pointing at dead TIDs */
    istat = vac_bulkdel_one_index(&ivinfo, istat, vacrel->dead_items,
                                  vacrel->dead_items_info);

    /* Per-index statistics are returned to the caller */
    return istat;
}

Performance Implications

Tables with Many Indexes

For tables with several large indexes, autovacuum spends longer in the index vacuuming phase than a parallel manual VACUUM would, because each index is scanned one after another:

-- Example: Table with 8 indexes
CREATE TABLE large_table (
    id BIGINT PRIMARY KEY,
    col1 INTEGER,
    col2 TEXT,
    col3 TIMESTAMP,
    col4 JSONB,
    col5 NUMERIC,
    col6 UUID,
    col7 INET
);

CREATE INDEX idx1 ON large_table (col1);
CREATE INDEX idx2 ON large_table (col2);
CREATE INDEX idx3 ON large_table (col3);
CREATE INDEX idx4 ON large_table USING GIN (col4);
CREATE INDEX idx5 ON large_table (col5);
CREATE INDEX idx6 ON large_table (col6);
CREATE INDEX idx7 ON large_table (col7);

Autovacuum behavior:

  • Processes all 8 indexes sequentially
  • Total time = sum of individual index vacuum times
  • Cannot utilize multiple CPU cores for index cleanup

Manual VACUUM behavior:

-- Can process indexes in parallel
VACUUM (PARALLEL 4) large_table;
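  • Up to 4 workers plus the leader can each vacuum a different index at the same time, subject to max_parallel_maintenance_workers and index eligibility
  • Total time approaches max(individual index vacuum times) when there are enough processes; otherwise it falls between that and the sequential sum
  • Utilizes multiple CPU cores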

Resource Utilization Differences

Autovacuum Resource Usage

  • Single worker process per table
  • Sequential index processing
  • Lower CPU utilization
  • Longer vacuum duration
  • Designed for minimal impact on workload

Manual Parallel VACUUM Resource Usage

  • Leader process + multiple worker processes
  • Parallel index processing
  • Higher CPU utilization
  • Shorter vacuum duration
  • Can impact concurrent workload more significantly

Why Autovacuum Doesn't Support Parallel Processing

1. Background Process Design Philosophy

/*
 * Autovacuum is designed to be minimally intrusive:
 * - Runs in the background as ordinary server processes (no special OS priority)
 * - Uses cost-based delay to throttle I/O
 * - Avoids competing for resources with user queries
 * - Parallel workers would increase resource contention
 */

2. Complexity Management

/*
 * Parallel worker management adds complexity:
 * - Dynamic shared memory allocation
 * - Inter-process communication
 * - Error handling across multiple processes
 * - Resource cleanup on worker failure
 */

3. Cost-Based Delay Coordination

In src/backend/postmaster/autovacuum.c:

/*
 * Cost-based balancing across workers:
 * - Autovacuum divides the cost limit among all active workers
 *   (the delay itself is not divided)
 * - Parallel workers within a single vacuum would complicate this
 * - Current design: one worker per table, simple cost accounting
 */
void
AutoVacuumUpdateCostLimit(void)   /* PostgreSQL 16 and later; simplified */
{
    /* Number of autovacuum workers taking part in balancing */
    int nworkers_for_balance = pg_atomic_read_u32(&AutoVacuumShmem->av_nworkersForBalance);

    /* Give each worker an equal share of the limit, but never less than 1 */
    vacuum_cost_limit = Max(vacuum_cost_limit / nworkers_for_balance, 1);
}

4. Historical Design

/*
 * Timeline of features:
 * - Autovacuum: PostgreSQL 8.1 (2005)
 * - Parallel vacuum: PostgreSQL 13 (2020)
 * 
 * Autovacuum predates parallel vacuum by 15 years
 * Retrofitting parallel support would require significant changes
 */

Workarounds and Alternatives

Manual Parallel VACUUM for Large Tables

-- Identify tables that would benefit from parallel vacuum
-- (pg_stat_user_tables exposes the table name as relname)
SELECT 
    schemaname,
    relname,
    n_dead_tup,
    last_autovacuum,
    pg_size_pretty(pg_total_relation_size(relid)) AS size
FROM pg_stat_user_tables 
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;

-- Run manual parallel vacuum during maintenance windows
VACUUM (PARALLEL 4, VERBOSE) large_table;
