AMD Opteron
===========
(Groupings are taken from CrayPat and include native as well as PAPI events.)

/* Group 0: Summary with instructions metrics */
PAPI_TOT_INS,
PAPI_FP_OPS,
PAPI_L1_DCA,
PAPI_L1_DCM,

/* Group 1: Summary with TLB metrics */
PAPI_FP_OPS,
PAPI_L1_DCA,
PAPI_L1_DCM,
PAPI_TLB_DM,

/* Group 2: L1 and L2 metrics */
PAPI_L1_DCA,
DATA_CACHE_REFILLS:L2_MODIFIED:L2_OWNED:L2_EXCLUSIVE:L2_SHARED,
DATA_CACHE_REFILLS_FROM_SYSTEM:ALL,
REQUESTS_TO_L2:DATA,

/* Group 3: Bandwidth information */
DATA_CACHE_REFILLS:L2_MODIFIED:L2_OWNED:L2_EXCLUSIVE:L2_SHARED,
DATA_CACHE_REFILLS_FROM_SYSTEM:ALL,
DATA_CACHE_LINES_EVICTED:ALL,
QUADWORDS_WRITTEN_TO_SYSTEM:ALL,

/* Group 4: */

/* Group 5: Floating point mix */
PAPI_FAD_INS,
PAPI_FML_INS,
PAPI_FDV_INS,
RETIRED_MMX_AND_FP_INSTRUCTIONS:PACKED_SSE_AND_SSE2,

/* Group 6: Cycles stalled, resources idle */
PAPI_RES_STL,
PAPI_FPU_IDL,
PAPI_STL_ICY,
INSTRUCTION_FETCH_STALL,

/* Group 7: Cycles stalled, resources full */
DISPATCH_STALLS,
DISPATCH_STALL_FOR_FPU_FULL,
DISPATCH_STALL_FOR_LS_FULL,
DECODER_EMPTY,

/* Group 8: Instructions and branches */
PAPI_TOT_INS,
INSTRUCTION_CACHE_MISSES,
PAPI_BR_TKN,
PAPI_BR_MSP,

/* Group 9: Instruction cache */
PAPI_L1_ICA,
INSTRUCTION_CACHE_MISSES,
PAPI_L2_ICM,
INSTRUCTION_CACHE_REFILLS_FROM_L2,

/* Group 10: Cache hierarchy */
PAPI_L1_DCA,
PAPI_L2_DCA,
PAPI_L2_DCM,
PAPI_L3_TCM,

/* Group 11: Floating point operations mix (2) */
RETIRED_SSE_OPERATIONS:SINGLE_ADD_SUB_OPS:SINGLE_MUL_OPS:OP_TYPE,
RETIRED_SSE_OPERATIONS:DOUBLE_ADD_SUB_OPS:DOUBLE_MUL_OPS:OP_TYPE,
RETIRED_SSE_OPERATIONS:SINGLE_DIV_OPS:OP_TYPE,
RETIRED_SSE_OPERATIONS:DOUBLE_DIV_OPS:OP_TYPE,

/* Group 12: Floating point operations mix (vectorization) */
RETIRED_SSE_OPERATIONS:SINGLE_ADD_SUB_OPS:SINGLE_MUL_OPS,
RETIRED_SSE_OPERATIONS:DOUBLE_ADD_SUB_OPS:DOUBLE_MUL_OPS,
RETIRED_SSE_OPERATIONS:SINGLE_ADD_SUB_OPS:SINGLE_MUL_OPS:OP_TYPE,
RETIRED_SSE_OPERATIONS:DOUBLE_ADD_SUB_OPS:DOUBLE_MUL_OPS:OP_TYPE,

/* Group 13: Floating point operations mix (SP) */
RETIRED_SSE_OPERATIONS:SINGLE_ADD_SUB_OPS,
RETIRED_SSE_OPERATIONS:SINGLE_MUL_OPS,
RETIRED_SSE_OPERATIONS:SINGLE_ADD_SUB_OPS:OP_TYPE,
RETIRED_SSE_OPERATIONS:SINGLE_MUL_OPS:OP_TYPE,

/* Group 14: Floating point operations mix (DP) */
RETIRED_SSE_OPERATIONS:DOUBLE_ADD_SUB_OPS,
RETIRED_SSE_OPERATIONS:DOUBLE_MUL_OPS,
RETIRED_SSE_OPERATIONS:DOUBLE_ADD_SUB_OPS:OP_TYPE,
RETIRED_SSE_OPERATIONS:DOUBLE_MUL_OPS:OP_TYPE,

/* Group 15: L3 (socket level) */
READ_REQUEST_TO_L3_CACHE:ALL,
L3_CACHE_MISSES:ALL,
L3_FILLS_CAUSED_BY_L2_EVICTIONS:ALL,
L3_EVICTIONS:ALL,

/* Group 16: L3 (core level reads) */
READ_REQUEST_TO_L3_CACHE:READ_BLOCK_MODIFY:READ_BLOCK_EXCLUSIVE:READ_BLOCK_SHARED:CORE_0_SELECT,
READ_REQUEST_TO_L3_CACHE:READ_BLOCK_MODIFY:READ_BLOCK_EXCLUSIVE:READ_BLOCK_SHARED:CORE_1_SELECT,
READ_REQUEST_TO_L3_CACHE:READ_BLOCK_MODIFY:READ_BLOCK_EXCLUSIVE:READ_BLOCK_SHARED:CORE_2_SELECT,
READ_REQUEST_TO_L3_CACHE:READ_BLOCK_MODIFY:READ_BLOCK_EXCLUSIVE:READ_BLOCK_SHARED:CORE_3_SELECT,

/* Group 17: L3 (core level misses) */
L3_CACHE_MISSES:READ_BLOCK_MODIFY:READ_BLOCK_EXCLUSIVE:READ_BLOCK_SHARED:CORE_0_SELECT,
L3_CACHE_MISSES:READ_BLOCK_MODIFY:READ_BLOCK_EXCLUSIVE:READ_BLOCK_SHARED:CORE_1_SELECT,
L3_CACHE_MISSES:READ_BLOCK_MODIFY:READ_BLOCK_EXCLUSIVE:READ_BLOCK_SHARED:CORE_2_SELECT,
L3_CACHE_MISSES:READ_BLOCK_MODIFY:READ_BLOCK_EXCLUSIVE:READ_BLOCK_SHARED:CORE_3_SELECT,

/* Group 18: L3 (core level fills caused by L2 evictions) */
L3_FILLS_CAUSED_BY_L2_EVICTIONS:MODIFIED:OWNED:EXCLUSIVE:SHARED:CORE_0_SELECT,
L3_FILLS_CAUSED_BY_L2_EVICTIONS:MODIFIED:OWNED:EXCLUSIVE:SHARED:CORE_1_SELECT,
L3_FILLS_CAUSED_BY_L2_EVICTIONS:MODIFIED:OWNED:EXCLUSIVE:SHARED:CORE_2_SELECT,
L3_FILLS_CAUSED_BY_L2_EVICTIONS:MODIFIED:OWNED:EXCLUSIVE:SHARED:CORE_3_SELECT,

/* Group 19: Prefetches */
DATA_PREFETCHES:CANCELLED,
DATA_PREFETCHES:ATTEMPTED,
PREFETCH_INSTRUCTIONS_DISPATCHED:LOAD,
PREFETCH_INSTRUCTIONS_DISPATCHED:STORE,

Intel Woodcrest/Harpertown (2 counters per core)
================================================
0
    PAPI_FP_OPS, PAPI_TOT_CYC
1
    PAPI_TOT_IIS, PAPI_TOT_INS
2
    PAPI_L1_TCM, PAPI_L2_TCM
3
    PAPI_FML_INS, PAPI_FDV_INS

Intel Core2Duo (4 counters per core)
====================================
0
    PAPI_FP_OPS, PAPI_TOT_CYC, PAPI_VEC_INS, PAPI_TOT_INS

Power6
======
All IBM-defined hpm* groups are included by default.

0 pm_hpm1
    FPU executed one flop instruction
    FPU executed multiply-add instruction
    FPU executed FSQRT or FDIV instruction
    Processor cycles
    Run instructions completed
    Run cycles
1 pm_hpm2
    Instructions completed
    LSU executed Floating Point load instruction
    FPU executed store instruction
    Processor cycles
    Run instructions completed
    Run cycles
2 pm_hpm3
    Processor cycles
    L1 D cache load misses
    L1 D cache store misses
    Instructions completed
    Run instructions completed
    Run cycles
3 pm_hpm4
    Instructions completed
    Instructions dispatched
    L1 D cache load references
    L1 D cache store references
    Run instructions completed
    Run cycles
4 pm_hpm5
    FPU produced a result
    Processor cycles
    FXU0 produced a result
    FXU1 produced a result
    Run instructions completed
    Run cycles
    Note: The value collected from each individual execution unit may change
    between runs, but the sum of the values over the execution units should
    be similar between runs.
5 pm_hpm6
    Data loaded from L2
    Data loaded from private L2 other core
    Data loaded from L2.5 modified
    Data loaded from L2.5 shared
    Run instructions completed
    Run cycles
6 pm_hpm7
    Data loaded from L3.5 modified
    Data loaded from L3.5 shared
    Data loaded from L3
    Processor cycles
    Run instructions completed
    Run cycles
7 pm_hpm8
    FPU executed one flop instruction
    FPU executed multiply-add instruction
    FPU executed store instruction
    L1 D cache load misses
    Run instructions completed
    Run cycles
8 pm_hpm9
    L1 D cache load misses
    Processor cycles
    LSU executed Floating Point load instruction
    L1 D cache store misses
    Run instructions completed
    Run cycles
9 pm_hpm10
    Instructions completed
    L2 cache misses
    Instruction fetched missed L3
    Data loaded from private L3 miss
    Run instructions completed
    Run cycles
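The AMD Opteron groups place related events together so that ratios can be formed from a single run. As a minimal sketch, assuming the raw counts have already been read out (the numbers are placeholders, not measurements), Groups 0, 1 and 3 could be turned into an L1 miss ratio, a TLB-miss rate and a rough bandwidth estimate. Converting counts to bytes assumes one 64-byte cache line per DATA_CACHE_REFILLS_FROM_SYSTEM event and 8 bytes per quadword; all function names are hypothetical and not part of CrayPat.

```python
# Hypothetical helpers turning raw AMD Opteron counts into derived metrics.
# Assumptions: each DATA_CACHE_REFILLS_FROM_SYSTEM event moves one 64-byte
# cache line; each QUADWORDS_WRITTEN_TO_SYSTEM event moves 8 bytes.

CACHE_LINE_BYTES = 64
QUADWORD_BYTES = 8


def safe_ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 if the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def group0_metrics(counts: dict) -> dict:
    """Group 0: L1 data-cache miss ratio and FP operations per instruction."""
    return {
        "l1_dcache_miss_ratio": safe_ratio(counts["PAPI_L1_DCM"], counts["PAPI_L1_DCA"]),
        "fp_ops_per_instruction": safe_ratio(counts["PAPI_FP_OPS"], counts["PAPI_TOT_INS"]),
    }


def group1_metrics(counts: dict) -> dict:
    """Group 1: data-TLB misses per L1 data-cache access."""
    return {"dtlb_misses_per_access": safe_ratio(counts["PAPI_TLB_DM"], counts["PAPI_L1_DCA"])}


def group3_bandwidth(counts: dict, seconds: float) -> dict:
    """Group 3: approximate memory read and write bandwidth in bytes per second."""
    read_bytes = counts["DATA_CACHE_REFILLS_FROM_SYSTEM:ALL"] * CACHE_LINE_BYTES
    write_bytes = counts["QUADWORDS_WRITTEN_TO_SYSTEM:ALL"] * QUADWORD_BYTES
    return {
        "read_bytes_per_s": safe_ratio(read_bytes, seconds),
        "write_bytes_per_s": safe_ratio(write_bytes, seconds),
    }


if __name__ == "__main__":
    # Placeholder counts for illustration only.
    g0 = {"PAPI_TOT_INS": 1_000, "PAPI_FP_OPS": 250, "PAPI_L1_DCA": 400, "PAPI_L1_DCM": 20}
    print(group0_metrics(g0))
    g3 = {"DATA_CACHE_REFILLS_FROM_SYSTEM:ALL": 1_000_000,
          "QUADWORDS_WRITTEN_TO_SYSTEM:ALL": 2_000_000}
    print(group3_bandwidth(g3, seconds=0.5))
```

Write traffic is approximated from quadwords rather than from DATA_CACHE_LINES_EVICTED, since evicted lines are not necessarily written back to memory.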
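AMD Opteron Groups 12 to 14 count the same SSE operations twice, with and without the OP_TYPE qualifier. Assuming, as AMD's Family 10h documentation describes for the Retired SSE Operations event, that OP_TYPE switches the count from SSE operations to floating-point operations, the ratio of the two counts indicates how many FLOPs each SSE operation delivered: about 1 for scalar code, up to 2 for packed double precision and up to 4 for packed single precision with 128-bit SSE. The helper below is a hypothetical sketch and its counts are placeholders.

```python
# Hypothetical vectorization estimate from AMD Opteron Groups 12-14.
# Assumption: the ":OP_TYPE" variant of RETIRED_SSE_OPERATIONS counts FLOPs,
# while the plain variant counts SSE operations.

SSE_LANES = {"single": 4, "double": 2}  # lanes in a 128-bit SSE register


def vectorization(flops: int, sse_ops: int, precision: str) -> dict:
    """Return FLOPs per SSE operation and that value as a fraction of the lanes."""
    if precision not in SSE_LANES:
        raise ValueError(f"unknown precision: {precision!r}")
    per_op = flops / sse_ops if sse_ops else 0.0
    return {
        "flops_per_sse_op": per_op,
        "lane_utilization": per_op / SSE_LANES[precision],
    }


if __name__ == "__main__":
    # Placeholder counts for the double-precision add/sub + mul events of Group 12.
    ops = 1_000    # RETIRED_SSE_OPERATIONS:DOUBLE_ADD_SUB_OPS:DOUBLE_MUL_OPS
    flops = 1_800  # RETIRED_SSE_OPERATIONS:DOUBLE_ADD_SUB_OPS:DOUBLE_MUL_OPS:OP_TYPE
    print(vectorization(flops, ops, "double"))
```

A lane utilization near 1.0 suggests mostly packed arithmetic; values near 1/lanes suggest mostly scalar SSE code.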
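With only two counters per core, the Woodcrest/Harpertown groups each hold just two events, so related events such as PAPI_TOT_INS and PAPI_TOT_CYC end up in different groups; the single Core2Duo group collects four events at once. As a hedged sketch, that Core2Duo group is enough to form instructions per cycle, FP operations per cycle and the share of vector instructions. The function name and the counts are hypothetical.

```python
# Hypothetical derived metrics for the Intel Core2Duo group 0 listed above.


def core2_group0_metrics(counts: dict) -> dict:
    """Return IPC, FP operations per cycle and vector-instruction share."""
    cycles = counts["PAPI_TOT_CYC"]
    instructions = counts["PAPI_TOT_INS"]
    return {
        "ipc": instructions / cycles if cycles else 0.0,
        "fp_ops_per_cycle": counts["PAPI_FP_OPS"] / cycles if cycles else 0.0,
        "vector_instruction_share": (
            counts["PAPI_VEC_INS"] / instructions if instructions else 0.0
        ),
    }


if __name__ == "__main__":
    # Placeholder counts, not measurements.
    sample = {"PAPI_FP_OPS": 3_000, "PAPI_TOT_CYC": 10_000,
              "PAPI_VEC_INS": 1_200, "PAPI_TOT_INS": 12_000}
    print(core2_group0_metrics(sample))
```

On Woodcrest/Harpertown the same IPC would have to combine PAPI_TOT_INS from group 1 with PAPI_TOT_CYC from group 0, which is only meaningful if the runs behave the same.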
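The Power6 group pm_hpm1 combines the FPU instruction counts needed for a FLOP estimate with the cycle and run counters. A minimal sketch, assuming that a multiply-add contributes two floating-point operations and an FSQRT or FDIV contributes one (a common convention, not something this list states); the function and parameter names are hypothetical.

```python
# Hypothetical estimate from Power6 group pm_hpm1.
# Assumption: a multiply-add counts as 2 FLOPs and an FSQRT/FDIV as 1 FLOP.


def pm_hpm1_metrics(one_flop: int, fma: int, sqrt_div: int,
                    processor_cycles: int, run_instructions: int,
                    run_cycles: int) -> dict:
    """Estimate FLOPs, FLOPs per processor cycle and IPC over run cycles."""
    flops = one_flop + 2 * fma + sqrt_div
    return {
        "flops": flops,
        "flops_per_cycle": flops / processor_cycles if processor_cycles else 0.0,
        # Instructions per cycle from the two run-qualified counters.
        "run_ipc": run_instructions / run_cycles if run_cycles else 0.0,
    }


if __name__ == "__main__":
    # Placeholder counts, not measurements.
    print(pm_hpm1_metrics(one_flop=1_000, fma=500, sqrt_div=10,
                          processor_cycles=4_020, run_instructions=3_000,
                          run_cycles=6_000))
```

Dividing the FLOP count by wall-clock time instead of cycles would give a rate; that requires a timer outside this group.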
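The note on Power6 group pm_hpm5 suggests a simple consistency check: compare the per-run totals of the execution-unit counters rather than the individual values. The sketch below assumes "sum" means adding the FXU0 and FXU1 counts of each run (other units can be included the same way); the 5% tolerance and the function names are hypothetical choices.

```python
# Hypothetical consistency check for the execution-unit counts of pm_hpm5.


def relative_spread(values: list) -> float:
    """(max - min) / mean of the values, or 0.0 for an empty or all-zero list."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean if mean else 0.0


def unit_totals_consistent(runs: list, tolerance: float = 0.05) -> bool:
    """Each run is a list of per-unit counts, e.g. [FXU0, FXU1].

    Individual units may differ between runs; only the per-run totals
    are compared against the tolerance.
    """
    totals = [sum(run) for run in runs]
    return relative_spread(totals) <= tolerance


if __name__ == "__main__":
    # Placeholder FXU0/FXU1 counts from two runs: the split between the
    # units differs noticeably, but the totals stay close.
    runs = [[600, 400], [450, 555]]
    print(unit_totals_consistent(runs))
```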