Is the Keras Embedding layer dependent on the target label?


I have learned how to 'use' the Keras Embedding layer, but I have not been able to find more specific information about the actual behavior and training process of this layer. So far, I understand that the Keras Embedding layer maps distinct categorical features to n-dimensional vectors, which allows us to find, for example, how similar two features are.

What I do not understand is how these vectors in the embedding layer are trained. Here is an explanation stating that these vectors are not computed by any operation and work only as a lookup table, but I always thought that they are somehow "trained" to capture similarities between distinct features.

If they are trained, are they trained from the target labels, from the order in which the features appear (similar to GloVe, word2vec, etc.), or from both?

I have the following example of two pairs of rows in a dataset. y is the model's target label and X are the features, encoded as integers for use in the embedding layer:

# pair 1
dataset_y_row1 = [1]
dataset_y_row2 = [0]
dataset_X_row1 = [3, 5, 8, 45, 2]
dataset_X_row2 = [3, 5, 8, 45, 2]

# pair 2
dataset_y_row3 = [1]
dataset_y_row4 = [1]
dataset_X_row3 = [3, 5, 8, 45, 2]
dataset_X_row4 = [3, 5, 45, 8, 2]

My questions are the following:

Will the embedding layer see any difference between rows 1 and 2 (i.e. is it 'target-label-sensitive')?

Will the embedding layer see any difference between rows 3 and 4 (i.e. is it sensitive to order of features like word2vec, GloVe, etc.)?

edited 5 hours ago

Mihai Chelaru

1837

asked 9 hours ago

Jan Musil

354

neural-networks keras word-embeddings embeddings


1 Answer

An Embedding layer for a vocabulary of size $m$ that encodes each word as an embedding vector of size $k$ is shorthand for one-hot encoding the words into $m$ features and then putting a dense layer with $k$ units (no bias, no activation) on top of them. Word2vec and GloVe are specialized algorithms for learning the embeddings, but the end product is the same: an $m \times k$ matrix of weights that is multiplied by the one-hot encoded words.
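
As a minimal sketch of that equivalence, in plain Python rather than Keras (the vocabulary size `m = 50`, embedding size `k = 4` and the random weights are arbitrary illustrative choices), looking up row $i$ of the weight matrix gives exactly the same vector as multiplying a one-hot vector for word $i$ by that matrix:

```python
import random


def one_hot(index, m):
    """Return a length-m list with 1.0 at `index` and 0.0 elsewhere."""
    vec = [0.0] * m
    vec[index] = 1.0
    return vec


def dense_no_bias(x, W):
    """Linear dense layer without bias or activation: y = x @ W, with W of shape (m, k)."""
    k = len(W[0])
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(k)]


def embedding_lookup(index, W):
    """What an embedding layer does: return row `index` of W."""
    return list(W[index])


m, k = 50, 4  # vocabulary size and embedding size (illustrative values)
random.seed(0)
W = [[random.uniform(-0.05, 0.05) for _ in range(k)] for _ in range(m)]

sentence = [3, 5, 8, 45, 2]  # integer-encoded features from the question
via_lookup = [embedding_lookup(t, W) for t in sentence]
via_dense = [dense_no_bias(one_hot(t, m), W) for t in sentence]
print(via_lookup == via_dense)  # True
```

The lookup is simply a cheaper way to compute the same product, which is why embedding layers are usually implemented as indexing into the weight matrix rather than as a matrix multiplication.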

If you are interested in a detailed yet accessible introduction to word embeddings, check the series of blog posts by Sebastian Ruder.

To answer your question, one would need to consider your network architecture and your data. Algorithms like word2vec and GloVe are trained on language data alone: word2vec learns to predict words from their neighbours in a context window, and GloVe fits word co-occurrence statistics. On the other hand, if you use an embedding layer that is trained from scratch as part of a larger network with some utilitarian purpose (e.g. spam detection, sentiment classification), then the layer works like any other dense layer: its weights are updated by backpropagation from the network's loss, and therefore from the target labels, so it serves the purpose of automatic feature engineering. In the latter case, you would expect to see more specialised embeddings that learn features related to the objective of your network.
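
To make this concrete for the two pairs of rows in the question, here is a hypothetical toy model in plain Python (not Keras code; the sizes, the random initialization and the two alternative heads are assumptions for illustration). The embedding matrix `E` is trained by backpropagating a binary cross-entropy loss, so the gradient it receives contains the factor `(p - y)`: rows 1 and 2 have identical inputs but push the same embedding rows in opposite directions. Whether rows 3 and 4 differ is decided by the layers after the lookup: averaging the vectors (roughly what Keras's `GlobalAveragePooling1D` does) ignores order, while concatenating them (like `Flatten` followed by `Dense`) does not.

```python
import math
import random

random.seed(1)
m, k, seq_len = 50, 3, 5  # hypothetical vocabulary size, embedding size, sequence length

# Trainable parameters: embedding matrix E (m x k) and two alternative output heads.
E = [[random.uniform(-0.05, 0.05) for _ in range(k)] for _ in range(m)]
w_mean = [random.uniform(-0.5, 0.5) for _ in range(k)]            # head on averaged embeddings
w_flat = [random.uniform(-0.5, 0.5) for _ in range(k * seq_len)]  # head on concatenated embeddings


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_pool_logit(tokens):
    """Average the looked-up vectors, then apply one linear unit (order-insensitive).

    math.fsum is exactly rounded, so the average cannot depend on token order.
    """
    avg = [math.fsum(E[t][j] for t in tokens) / len(tokens) for j in range(k)]
    return sum(a * w for a, w in zip(avg, w_mean))


def flatten_logit(tokens):
    """Concatenate the looked-up vectors in order, then apply one linear unit (order-sensitive)."""
    flat = [E[t][j] for t in tokens for j in range(k)]
    return sum(f * w for f, w in zip(flat, w_flat))


def embedding_grad(tokens, y):
    """Gradient of binary cross-entropy w.r.t. E for the mean-pooling model.

    Only rows of tokens present in the input get a gradient, and its sign
    depends on the target label y through the factor (p - y).
    """
    p = sigmoid(mean_pool_logit(tokens))
    grad = {}
    for t in tokens:
        row = grad.setdefault(t, [0.0] * k)
        for j in range(k):
            row[j] += (p - y) * w_mean[j] / len(tokens)
    return grad


row1, row2 = [3, 5, 8, 45, 2], [3, 5, 8, 45, 2]  # same X, labels 1 and 0
row3, row4 = [3, 5, 8, 45, 2], [3, 5, 45, 8, 2]  # same features, different order

g1 = embedding_grad(row1, 1)
g2 = embedding_grad(row2, 0)
print(g1[3][0] * g2[3][0] < 0)  # True: same input, opposite update direction
print(mean_pool_logit(row3) == mean_pool_logit(row4))  # True: averaging ignores order
print(math.isclose(flatten_logit(row3), flatten_logit(row4)))  # concatenation keeps positions
```

An LSTM or a 1D convolution on top of the embeddings would likewise make the model order-sensitive; the embedding lookup itself never sees positions.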

edited 5 hours ago

answered 8 hours ago

Tim♦

61.9k9136234

Okay, thanks. Just to ask about "but the end product is a matrix of weights that is multiplied by the one-hot encoded words": does this apply only to word2vec and GloVe, or also to the first part of the paragraph (the Keras Embedding layer)? Does it mean that an Embedding vector of size m can be simulated by using a one-hot encoded input layer and a dense layer with m neurons? So the vector for each one-hot encoded feature would just be its m weights going from this input feature to the dense layer neurons?
– Jan Musil
6 hours ago

@JanMusil As I said, embeddings are dense layers, so they are matrices of weights to be multiplied by the features; this applies to all embeddings.
– Tim♦
5 hours ago