tf_1.8_xla_doc
#include <cpu_compiler.h>
Public Member Functions
StatusOr<std::vector<std::unique_ptr<AotCompilationResult>>> CompileAheadOfTime(std::vector<std::unique_ptr<HloModule>> modules, const AotCompilationOptions &options) override
Private Member Functions
Status RunHloPasses(HloModule *module, bool is_aot_compile)
Google Doc:
CPU-targeting implementation of the XLA Compiler interface.
The compiler translates XLA HLO code into LLVM IR and uses LLVM's JIT infrastructure to create an executable "blob" that can then be returned wrapped in CpuExecutable and actually invoked.
CompileAheadOfTime (override, virtual)
Called by xla::CompileOnlyService::CompileAheadOfTime; not shown in the caller graph because the method is overridden.
llvm::Target (line 741 ~ 754)
llvm::Module using llvm::LLVMContext, because it is required to be thread safe
xla::cpu::CpuCompiler::RunHloPasses(HloModule* module, bool is_aot_compile)
xla::SequentialHloOrdering::HloModuleSequence using xla::CreateMemoryMinimizingSequence
llvm_module (via xla::cpu::IrEmitter::IrEmitter())
For each computation in the list (xla::HloComputation::MakeEmbeddedComputationsList()) made from the entry computation of the HLO module, except the entry computation itself, do xla::cpu::IrEmitter::EmitComputation()
entry_function of type llvm::Function
xla::cpu::anonymous_namespace{cpu_compiler.cc}::VerifyLlvmModule(), and fall back if verification failed
xla::cpu::Disassembler for xla::cpu::CompilerFunctor
xla::cpu::CompilerFunctor to compile llvm_module to an object file

Implements xla::Compiler.
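The embedded-computation step above can be illustrated with a small, self-contained sketch. It assumes that the embedded computations are visited in post order (callees before callers) and that the entry computation is emitted last, as the order of the notes suggests. `ToyComputation`, `PostOrder` and `EmissionOrder` are hypothetical names invented for this illustration; they are not part of XLA.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an HLO computation: a name plus the
// computations it calls. This is NOT the real xla::HloComputation.
struct ToyComputation {
  std::string name;
  std::vector<const ToyComputation*> callees;
};

// Appends every computation reachable from `node` in post order
// (callees before callers), visiting each computation only once.
void PostOrder(const ToyComputation* node,
               std::unordered_set<const ToyComputation*>* visited,
               std::vector<const ToyComputation*>* out) {
  if (!visited->insert(node).second) return;  // already handled
  for (const ToyComputation* callee : node->callees) {
    PostOrder(callee, visited, out);
  }
  out->push_back(node);
}

// Returns computation names in an emission order consistent with the
// notes: embedded computations first, the entry computation last.
std::vector<std::string> EmissionOrder(const ToyComputation& entry) {
  std::unordered_set<const ToyComputation*> visited;
  std::vector<const ToyComputation*> post_order;
  PostOrder(&entry, &visited, &post_order);
  // The root of a post-order walk is always appended last.
  assert(!post_order.empty() && post_order.back() == &entry);

  std::vector<std::string> names;
  // Everything except the last element is an embedded computation.
  for (std::size_t i = 0; i + 1 < post_order.size(); ++i) {
    names.push_back(post_order[i]->name);
  }
  names.push_back(entry.name);  // the entry function comes last
  return names;
}
```

Calling `EmissionOrder` on an entry computation that calls a loop condition and a loop body, where the body in turn calls a shared helper, yields the condition, the helper, the body, and finally the entry computation. Emitting callees first means every function a caller references already exists in the module when the caller is emitted.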
RunHloPasses (private)
Called by xla::cpu::CpuCompiler::CompileAheadOfTime
xla::HloPassPipeline
xla::HloVerifier
xla::CpuHloSupportChecker
Set xla::ReducePrecisionInsertion::PassTiming to BEFORE_OPTIMIZATION and add the pass
xla::CallInliner
xla::DotDecomposer
xla::cpu::ConvCanonicalization
xla::HloPassFix<xla::HloPassPipeline>("simplification")
xla::HloVerifier
xla::BatchNormExpander (rewrite_training_op = true, rewrite_inference_op = true, rewrite_grad_op = true, use_fusion = false)
xla::GatherExpander
xla::AlgebraicSimplifier (is_layout_sensitive = false, enable_dot_strength_reduction = false)
xla::ZeroSizedHloElimination
xla::WhileLoopInvariantCodeMotion
xla::TupleSimplifier
xla::WhileLoopSimplifier
xla::HloDCE
xla::ReshapeMover
xla::HloConstantFolding
xla::ConditionalSimplifier
xla::TransposeFolding
xla::HloCSE with is_layout_sensitive = true
xla::cpu::CpuInstructionFusion
Set xla::ReducePrecisionInsertion::PassTiming to AFTER_FUSION and add the pass
xla::cpu::CpuLayoutAssignment. Because xla::cpu::CpuLayoutAssignment may leave behind kCopy instructions that are duplicates or NOPs, remove them with xla::AlgebraicSimplifier and xla::HloCSE.
xla::HloPassFix<AlgebraicSimplifier>
xla::HloCSE with is_layout_sensitive = false
xla::HloElementTypeConverter to convert type BF16 to F32
max_parallelism, to outline ops in the entry computation into subcomputations
xla::cpu::ParallelizationPreparation
If is_aot_compile is false, add pass xla::cpu::ParallelTaskAssigner; however, is_aot_compile is always true in this case
xla::HloDCE
xla::FlattenCallGraph
xla::CpuCopyInsertion
xla::cpu::ParallelizationPreparation
xla::CpuCopyInsertion
xla::HloDCE
xla::HloPassPipeline::Run
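The pass list above relies on two ideas: a pipeline that runs passes in insertion order, and a fixed-point wrapper (the role of xla::HloPassFix) that repeats a pipeline until nothing changes. The sketch below models both with toy types, and also shows a pass that is added only when is_aot_compile is false. Everything here (`ToyModule`, `ToyPass`, `ToyPipeline`, `RunToFixedPoint`, `BuildToyPipeline`, and the pass names) is a hypothetical illustration under these assumptions, not the XLA API.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an HLO module: just a list of op names.
using ToyModule = std::vector<std::string>;

// A pass mutates the module and reports whether it changed anything.
struct ToyPass {
  std::string name;
  std::function<bool(ToyModule*)> run;
};

// Runs its passes once, in the order they were added.
class ToyPipeline {
 public:
  void AddPass(ToyPass pass) { passes_.push_back(std::move(pass)); }

  bool Run(ToyModule* module) const {
    assert(module != nullptr);
    bool changed = false;
    for (const ToyPass& pass : passes_) {
      changed |= pass.run(module);
    }
    return changed;
  }

  std::vector<std::string> PassNames() const {
    std::vector<std::string> names;
    for (const ToyPass& pass : passes_) names.push_back(pass.name);
    return names;
  }

 private:
  std::vector<ToyPass> passes_;
};

// Fixed-point driver: reruns the pipeline until a run changes nothing.
// Returns how many runs made a change.
int RunToFixedPoint(const ToyPipeline& pipeline, ToyModule* module) {
  int iterations = 0;
  while (pipeline.Run(module)) ++iterations;
  return iterations;
}

// Toy simplification: removes the first adjacent "neg","neg" pair.
bool FoldOneDoubleNegation(ToyModule* module) {
  auto it = std::adjacent_find(
      module->begin(), module->end(),
      [](const std::string& a, const std::string& b) {
        return a == "neg" && b == "neg";
      });
  if (it == module->end()) return false;
  module->erase(it, it + 2);
  return true;
}

// Toy dead-code elimination: removes every op named "dead".
bool RemoveDeadOps(ToyModule* module) {
  const auto before = module->size();
  module->erase(
      std::remove(module->begin(), module->end(), std::string("dead")),
      module->end());
  return module->size() != before;
}

// Builds the toy pipeline; the last pass is skipped for AOT compiles.
ToyPipeline BuildToyPipeline(bool is_aot_compile) {
  ToyPipeline pipeline;
  pipeline.AddPass({"fold-double-negation", FoldOneDoubleNegation});
  pipeline.AddPass({"dce", RemoveDeadOps});
  if (!is_aot_compile) {
    // Placeholder that never changes the module.
    pipeline.AddPass({"parallel-task-assigner",
                      [](ToyModule*) { return false; }});
  }
  return pipeline;
}
```

A single run of this toy pipeline folds only one double negation, so a module with two such pairs needs repeated runs before it stops changing. That is the situation a fixed-point wrapper is meant to handle.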