tf_1.8_xla_doc
xla::CopyInsertion Class Reference

#include <copy_insertion.h>

Inherits HloPassInterface.

Public Member Functions

StatusOr< bool > Run (HloModule *module) override
 Run the pass on the given module.
 

Static Public Member Functions

static StatusOr< bool > AddCopiesForBufferAssignment (HloModule *module)
 

Detailed Description

Google docs:

Copy insertion is a legalization HLO pass which inserts copies (kCopy instructions) to eliminate several kinds of problems in the HLO module.

(1) An entry parameter or a constant is live out of the entry computation. Entry computation arguments and constants have different lifetimes than the computation result and cannot share the same allocation. Parameters and constants that are live out of non-entry computations do not need copies.

(2) Different values which are simultaneously live and which must be held in the same buffer. This can occur in while bodies. Specifically, the while loop state (the arguments to the while instruction) is updated in-place and the update may clobber the value from the previous iteration before the previous value is dead. Computations called from kCall instructions do not need such copies because kCall has no update in-place semantics.

(3) The buffer set of the root instruction of the entry computation must be unambiguous and distinct. That is, InstructionAliasSet::IsAmbiguous returns false and InstructionAliasSet::IsDistinct returns true.
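
Problem (2) can be illustrated with a small self-contained sketch. This is not XLA code: `LoopState`, `BodyInPlaceNoCopy`, `BodyInPlaceWithCopy`, and `RunLoop` are hypothetical names invented for the illustration. The loop body computes (a, b) -> (b, a + b) directly in the state buffer; without a copy, the first write clobbers the previous iteration's `a` while it is still needed.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for a two-element while-loop state held in one buffer.
using LoopState = std::array<int, 2>;

// Intended body: (a, b) -> (b, a + b), written straight into the state buffer.
// The first write destroys the old `a` before the second statement reads it.
inline void BodyInPlaceNoCopy(LoopState& state) {
  state[0] = state[1];             // old state[0] is clobbered here
  state[1] = state[0] + state[1];  // reads the NEW state[0], not the old one
}

// Same body, but the previous value is copied before it is overwritten.
// The local `old_a` plays the role of an inserted kCopy.
inline void BodyInPlaceWithCopy(LoopState& state) {
  const int old_a = state[0];
  state[0] = state[1];
  state[1] = old_a + state[1];
}

// Runs `body` `trip_count` times on a private copy of the initial state.
inline LoopState RunLoop(void (*body)(LoopState&), LoopState state,
                         int trip_count) {
  assert(trip_count >= 0);
  for (int i = 0; i < trip_count; ++i) body(state);
  return state;
}
```

Starting from (1, 1) and running three iterations, the version with the copy yields (3, 5), while the version without it yields (4, 8), because each iteration adds the already-overwritten element to itself.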

Member Function Documentation

◆ AddCopiesForBufferAssignment()

StatusOr< bool > xla::CopyInsertion::AddCopiesForBufferAssignment ( HloModule * module)
static

Google docs:

The CPU and GPU backends need additional copies because of deficiencies in buffer assignment. Specifically, copies are needed for constants that are live out of computations, and for values that are both live-in and live-out of the same computation. These copies are needed because buffer assignment uses a computation-scoped analysis (TuplePointsToAnalysis) and has limited visibility across computation boundaries. This method adds these necessary copies. Returns whether the module was modified.

TODO(b/62548313): Remove this when buffer assignment is module-scoped.


◆ Run()

StatusOr< bool > xla::CopyInsertion::Run ( HloModule * module)
override

Run the pass on the given module.

Returns whether the module was changed (copies were inserted).


Google docs:

Copy insertion is performed in three steps:

(1) Add copies conservatively to guarantee that there is no live-range interference. This is done simplistically and usually results in more copies than is strictly necessary.

(2) Using a more fine-grained analysis, remove as many copies that were added in (1) as possible while ensuring no live-range interference.

(3) Add copies to resolve issues not related to live range interference such as parameters and constants live out of the entry computation.

We add copies then remove them (step (1) then (2)) rather than simply adding only the copies that are necessary because, in general, it is difficult to figure out the minimal set of copies to add once there is interference. On the other hand, it is easy to determine if removing a copy will introduce interference.

The final copy insertion in (3) is done separately to simplify the implementation of copy removal in (2) which is the most complicated part of the pass. As is, copy removal only has to reason about live range interference. If all copies were added in step (1) then copy removal would also have to reason about things like constants and parameters live out of the computation.
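
The add-then-remove strategy of steps (1) and (2) can be sketched with a toy model. Everything here is an assumption made for illustration and not the XLA implementation: `Value`, `Overlaps`, and `AddThenRemoveCopies` are hypothetical names, live ranges are half-open intervals [def, end), and "having a copy" means the value gets a private buffer instead of its shared one.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model: a value is live over [def, end) and would ideally be
// placed in the shared buffer identified by `buffer`.
struct Value {
  int def;
  int end;
  int buffer;
};

// Two half-open live ranges interfere when they intersect.
inline bool Overlaps(const Value& a, const Value& b) {
  return a.def < b.end && b.def < a.end;
}

// Returns, for each value, whether it still needs a copy afterwards.
inline std::vector<bool> AddThenRemoveCopies(const std::vector<Value>& values) {
  // Step (1): conservatively copy every value. Each copy has a private
  // buffer, so no two live ranges can interfere.
  std::vector<bool> has_copy(values.size(), true);

  // Step (2): try to remove each copy in turn. Removing a copy is safe when
  // the value does not overlap any value that already sits in the same
  // shared buffer (that is, any value whose copy was already removed).
  for (std::size_t i = 0; i < values.size(); ++i) {
    bool interferes = false;
    for (std::size_t j = 0; j < values.size(); ++j) {
      if (j == i || has_copy[j]) continue;
      if (values[j].buffer == values[i].buffer &&
          Overlaps(values[i], values[j])) {
        interferes = true;
        break;
      }
    }
    if (!interferes) has_copy[i] = false;
  }
  return has_copy;
}
```

The sketch mirrors the reasoning above: checking whether one removal introduces interference is a local test, whereas choosing a minimal set of copies up front is not. This greedy removal depends on visiting order and does not promise a minimal result.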

  1. Save the original kCopy instructions; they have a special use and will not be removed later.
  2. Call xla::anonymous_namespace{copy_insertion.cc}::AddCopiesToResolveInterference().
  3. Run xla::TupleSimplifier to simplify the tuple structures introduced by the deep copies.
  4. Run DCE to remove the dead Tuple/GTE instructions left behind by tuple simplification.
  5. Run xla::anonymous_namespace{copy_insertion.cc}::RemoveUnnecessaryCopies().
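
The ordering of the numbered steps can be sketched on a toy module. This is a hedged illustration only: `Instr`, `Module`, and `RunToyCopyInsertion` are hypothetical, the `interferes` flag stands in for the real live-range analysis, and steps 3 and 4 (tuple simplification and DCE) are omitted because the toy has no tuples.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical instruction: "copy" stands in for kCopy. For a copy,
// `interferes` says whether removing it would cause interference.
struct Instr {
  int id;
  std::string opcode;
  bool interferes;
};
using Module = std::vector<Instr>;

// Returns whether the module changed, like the bool inside StatusOr<bool>.
inline bool RunToyCopyInsertion(Module& module) {
  const std::size_t original_size = module.size();

  // 1. Remember the copies that were already present; they are never removed.
  std::set<int> preserved;
  int next_id = 0;
  for (const Instr& instr : module) {
    if (instr.opcode == "copy") preserved.insert(instr.id);
    next_id = std::max(next_id, instr.id + 1);
  }

  // 2. Conservatively add one copy per original non-copy instruction.
  for (std::size_t i = 0; i < original_size; ++i) {
    if (module[i].opcode == "copy") continue;
    const bool required = module[i].interferes;  // read before push_back
    module.push_back({next_id++, "copy", required});
  }

  // 3./4. Tuple simplification and DCE are omitted in this toy.

  // 5. Remove the added copies that are not required, skipping preserved ones.
  module.erase(
      std::remove_if(module.begin(), module.end(),
                     [&](const Instr& instr) {
                       return instr.opcode == "copy" && !instr.interferes &&
                              preserved.count(instr.id) == 0;
                     }),
      module.end());

  return module.size() != original_size;
}
```

For a module holding a parameter, a pre-existing copy, and an interfering while instruction, the toy keeps the original copy, drops the redundant copy of the parameter, keeps the copy of the while instruction, and reports that the module changed.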

The documentation for this class was generated from the following files: