Creates an attention layer
Dot-product attention layer, a.k.a. Luong-style attention.
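In the two-tensor form the key defaults to the value tensor, so the layer conceptually scores the query against the value and returns a softmax-weighted sum of the value rows. A rough sketch of that computation with plain TensorFlow ops (the shapes below are illustrative assumptions, not part of this layer's signature):

library(tensorflow)

query <- tf$random$normal(shape = c(2L, 10L, 64L))      # [batch, Tq, dim]
value <- tf$random$normal(shape = c(2L, 20L, 64L))      # [batch, Tv, dim]

scores  <- tf$matmul(query, value, transpose_b = TRUE)  # [batch, Tq, Tv]
weights <- tf$nn$softmax(scores)                        # distribution over value positions
output  <- tf$matmul(weights, value)                    # [batch, Tq, dim]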
layer_attention(
  inputs,
  use_scale = FALSE,
  causal = FALSE,
  batch_size = NULL,
  dtype = NULL,
  name = NULL,
  trainable = NULL,
  weights = NULL
)
inputs: A list of inputs; the first should be the query tensor and the second the value tensor.
use_scale: If TRUE, creates a scalar variable to scale the attention scores.
causal: Boolean. Set to TRUE for decoder self-attention. Adds a mask such that position i cannot attend to positions j > i, which prevents the flow of information from the future towards the past.
batch_size: Fixed batch size for the layer.
dtype: The data type expected by the input, as a string (e.g. "float32").
name: An optional name string for the layer. It should be unique within a model (do not reuse the same name twice) and is autogenerated if not provided.
trainable: Whether the layer weights will be updated during training.
weights: Initial weights for the layer.
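A minimal usage sketch showing how the arguments fit together (the input shapes and variable names below are assumptions chosen for illustration):

library(keras)

# Query and value sequences share the feature dimension (64 here).
query_input <- layer_input(shape = c(10, 64), name = "query")
value_input <- layer_input(shape = c(20, 64), name = "value")

# The first element of `inputs` is the query, the second the value.
attended <- layer_attention(
  inputs = list(query_input, value_input),
  use_scale = TRUE   # learn a scalar that scales the attention scores
)

model <- keras_model(list(query_input, value_input), attended)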
Other core layers: layer_activation(), layer_activity_regularization(), layer_dense_features(), layer_dense(), layer_dropout(), layer_flatten(), layer_input(), layer_lambda(), layer_masking(), layer_permute(), layer_repeat_vector(), layer_reshape()