#include <cpu_compiler.h>

Public Member Functions
StatusOr< std::vector< std::unique_ptr< AotCompilationResult > > >	CompileAheadOfTime (std::vector< std::unique_ptr< HloModule >> modules, const AotCompilationOptions &options) override

Private Member Functions
Status	RunHloPasses (HloModule *module, bool is_aot_compile)

Detailed Description

CPU-targeting implementation of the XLA Compiler interface.

The compiler translates XLA HLO code into LLVM IR and uses LLVM's JIT infrastructure to create an executable "blob", which is returned wrapped in a CpuExecutable and can then be invoked.

Member Function Documentation

◆ CompileAheadOfTime()

StatusOr< std::vector< std::unique_ptr< AotCompilationResult > > > xla::cpu::CpuCompiler::CompileAheadOfTime	(	std::vector< std::unique_ptr< HloModule >>	modules,
		const AotCompilationOptions &	aot_options
	)

overridevirtual

Called by xla::CompileOnlyService::CompileAheadOfTime. The call does not appear in the caller graph because this method is an override.

Initialize LLVM command line options
Check if platform is compatible for AOT
Convert options to llvm::Target (lines 741 to 754)
Set LLVM output options (lines 756 to 785)
1. Set relocation model: Static or PIC
2. Set PIC level: No PIC, small PIC, big PIC
3. Set PIE level: Default, small, large
Set the target CPU, the supported instructions, and the optimization level.
Create the llvm::Module in its own llvm::LLVMContext, because compilation is required to be thread-safe.
Loop through HLO modules
1. Run the high-level optimization passes (call xla::cpu::CpuCompiler::RunHloPasses(HloModule* module, bool is_aot_compile))
2. Create a xla::SequentialHloOrdering::HloModuleSequence using xla::CreateMemoryMinimizingSequence
3. Run buffer analysis on the HLO graph to determine which temporary buffers are required to run the computation
4. Construct an IrEmitter (via xla::cpu::IrEmitter::IrEmitter()). It will later compile the HLO module to LLVM IR and save the result into llvm_module; nothing is compiled at this point
5. For every embedded computation of the HLO module's entry computation (obtained from xla::HloComputation::MakeEmbeddedComputationsList()), excluding the entry computation itself, do
  1. Skip fusion computations
  2. Emit LLVM IR for the computation via xla::cpu::IrEmitter::EmitComputation()
6. Compile the entry computation and assign the result to entry_function, of type llvm::Function
7. Set up hooks for the pre-optimization and post-optimization phases.
8. Run xla::cpu::anonymous_namespace{cpu_compiler.cc}::VerifyLlvmModule() and fall back if verification fails.
9. Create xla::cpu::Disassembler for xla::cpu::CompilerFunctor
10. Create xla::cpu::CompilerFunctor to compile llvm_module to object file
11. Save object file into a character vector
12. Create buffer for result, save to pointer.
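
The "Set LLVM output options" step above picks three related settings (relocation model, PIC level, PIE level). The sketch below shows one way a single user-facing choice could determine all three. It is a minimal illustration only: the enums, the AotRelocation option, and SelectOutputSettings are hypothetical names that stand in for the LLVM and XLA types, and the exact mapping used in cpu_compiler.cc may differ.

```cpp
#include <cassert>

// Hypothetical mirrors of the three LLVM settings named in the steps above.
enum class RelocModel { kStatic, kPic };
enum class PicLevel { kNotPic, kSmallPic, kBigPic };
enum class PieLevel { kDefault, kSmall, kLarge };

// Hypothetical user-facing option (assumption, not the real XLA option type).
enum class AotRelocation { kStatic, kSmallPic, kBigPic, kSmallPie, kBigPie };

struct OutputSettings {
  RelocModel reloc;
  PicLevel pic;
  PieLevel pie;
};

// Maps one relocation choice to the three output settings.
// Assumed mapping: PIE variants are PIC variants with a non-default PIE level.
inline OutputSettings SelectOutputSettings(AotRelocation choice) {
  switch (choice) {
    case AotRelocation::kStatic:
      return {RelocModel::kStatic, PicLevel::kNotPic, PieLevel::kDefault};
    case AotRelocation::kSmallPic:
      return {RelocModel::kPic, PicLevel::kSmallPic, PieLevel::kDefault};
    case AotRelocation::kBigPic:
      return {RelocModel::kPic, PicLevel::kBigPic, PieLevel::kDefault};
    case AotRelocation::kSmallPie:
      return {RelocModel::kPic, PicLevel::kSmallPic, PieLevel::kSmall};
    case AotRelocation::kBigPie:
      return {RelocModel::kPic, PicLevel::kBigPic, PieLevel::kLarge};
  }
  // Not reached for valid enum values; fall back to the static model.
  return {RelocModel::kStatic, PicLevel::kNotPic, PieLevel::kDefault};
}

// Usage example: a static build carries no PIC, a big-PIE build is PIC + large PIE.
inline void DemoSelectOutputSettings() {
  assert(SelectOutputSettings(AotRelocation::kStatic).pic == PicLevel::kNotPic);
  assert(SelectOutputSettings(AotRelocation::kBigPie).reloc == RelocModel::kPic);
  assert(SelectOutputSettings(AotRelocation::kBigPie).pie == PieLevel::kLarge);
}
```

Keeping the three settings in one struct makes it hard to end up with an inconsistent combination, such as a static relocation model with a non-default PIE level.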

Implements xla::Compiler.

◆ RunHloPasses()

Status xla::cpu::CpuCompiler::RunHloPasses	(	HloModule *	module,
		bool	is_aot_compile
	)

private

Called by xla::cpu::CpuCompiler::CompileAheadOfTime

Todo: See what these invariant checkers and passes do

Create a xla::HloPassPipeline
Add invariant checker xla::HloVerifier
Add pass xla::CpuHloSupportChecker
Set xla::ReducePrecisionInsertion::PassTiming to BEFORE_OPTIMIZATION and add pass
Add pass xla::CallInliner
Add pass xla::DotDecomposer
Add pass xla::cpu::ConvCanonicalization
Add pass xla::HloPassFix<xla::HloPassPipeline>("simplification")
1. Add invariant checker xla::HloVerifier
2. Add pass xla::BatchNormExpander
  - rewrite_training_op = true
  - rewrite_inference_op = true
  - rewrite_grad_op = true
  - use_fusion = false
3. Add pass xla::GatherExpander
4. Add pass xla::AlgebraicSimplifier
  - is_layout_sensitive = false
  - Todo:
    unknown argument
  - enable_dot_strength_reduction = false
5. Add pass xla::ZeroSizedHloElimination
6. Add pass xla::WhileLoopInvariantCodeMotion
7. Add pass xla::TupleSimplifier
8. Add pass xla::WhileLoopSimplifier
9. Add pass xla::HloDCE
10. Add pass xla::ReshapeMover
11. Add pass xla::HloConstantFolding
12. Add pass xla::ConditionalSimplifier
Add pass xla::TransposeFolding
Add pass xla::HloCSE with is_layout_sensitive = true
Add pass xla::cpu::CpuInstructionFusion
Set xla::ReducePrecisionInsertion::PassTiming to AFTER_FUSION and add pass
Add pass xla::cpu::CpuLayoutAssignment. This pass may leave behind kCopy instructions that are duplicates or no-ops, so the next two passes (xla::AlgebraicSimplifier and xla::HloCSE) remove them.
Add pass xla::HloPassFix<AlgebraicSimplifier>
Add pass xla::HloCSE with is_layout_sensitive = false
Add pass xla::HloElementTypeConverter to convert type BF16 to F32
Set max_parallelism to outline ops in the entry computation into subcomputations
If parallel backend is requested then add pass xla::cpu::ParallelizationPreparation
If is_aot_compile is false, add pass xla::cpu::ParallelTaskAssigner. On the AOT path described here is_aot_compile is always true, so this pass is skipped
Add pass xla::HloDCE
Add pass xla::FlattenCallGraph
Add pass xla::CpuCopyInsertion
Re-run outlining if parallel backend is requested, in case any copies were inserted into entry computation
1. Add pass xla::cpu::ParallelizationPreparation
2. Add pass xla::CpuCopyInsertion
Add pass xla::HloDCE
Start the process by calling xla::HloPassPipeline::Run
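
The steps above rely on three ideas: a pipeline that runs passes in order, invariant checkers that run around the passes, and a fixed-point wrapper (the role xla::HloPassFix plays for the "simplification" sub-pipeline) that repeats until nothing changes. The toy sketch below illustrates those ideas in isolation. ToyModule, ToyPipeline, RunToFixedPoint, and RemoveOneCopy are hypothetical names invented for this example and are not the XLA API. Running the checkers before the first pass and after every pass is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an HLO module: just a list of op names.
struct ToyModule {
  std::vector<std::string> ops;
};

// A pass mutates the module and reports whether it changed anything.
using ToyPass = std::function<bool(ToyModule&)>;
// An invariant checker returns true when the module is well-formed.
using ToyChecker = std::function<bool(const ToyModule&)>;

class ToyPipeline {
 public:
  void AddInvariantChecker(ToyChecker c) { checkers_.push_back(std::move(c)); }
  void AddPass(ToyPass p) { passes_.push_back(std::move(p)); }

  // Runs every pass once, in insertion order. Checkers run before the first
  // pass and after each pass. Returns true if any pass changed the module.
  bool Run(ToyModule& m) const {
    bool changed = false;
    CheckInvariants(m);
    for (const auto& pass : passes_) {
      if (pass(m)) changed = true;
      CheckInvariants(m);
    }
    return changed;
  }

 private:
  void CheckInvariants(const ToyModule& m) const {
    for (const auto& check : checkers_) {
      const bool ok = check(m);
      assert(ok && "invariant violated");
      (void)ok;  // Silences the unused-variable warning when NDEBUG is set.
    }
  }

  std::vector<ToyChecker> checkers_;
  std::vector<ToyPass> passes_;
};

// Fixed-point wrapper: re-runs the pipeline until a run reports no change.
// Returns the number of runs that changed the module.
inline int RunToFixedPoint(const ToyPipeline& pipeline, ToyModule& m) {
  int changing_runs = 0;
  while (pipeline.Run(m)) ++changing_runs;
  return changing_runs;
}

// Example pass: removes the first "copy" op, if there is one.
inline bool RemoveOneCopy(ToyModule& m) {
  auto it = std::find(m.ops.begin(), m.ops.end(), "copy");
  if (it == m.ops.end()) return false;
  m.ops.erase(it);
  return true;
}
```

A pass such as RemoveOneCopy makes only partial progress per run, which is why a fixed-point wrapper is useful: with two "copy" ops in the module, the pipeline has to run twice before a third run reports no change.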

The documentation for this class was generated from the following files:

tensorflow/tensorflow/compiler/xla/service/cpu/cpu_compiler.h
tensorflow/tensorflow/compiler/xla/service/cpu/cpu_compiler.cc
