The Math Forum


Topic: Weighting Cosine Similarity
Replies: 3   Last Post: Apr 16, 2007 12:40 PM

Ray Koopman

Posts: 3,383
Registered: 12/7/04
Re: Weighting Cosine Similarity
Posted: Apr 16, 2007 12:40 PM

Laurent Haan wrote:
> On Apr 16, 10:45 am, Jussi Piitulainen wrote:

>> Laurent Haan writes:
>>> I'm having problems modifying the formula for the cosine similarity
>>> to take into account weights given to the components of the

>> ...

>>> I'll illustrate the problem using the euclidean distance: I have a
>>> certain number of vectors and a query vector and I want to return
>>> the vector that minimizes the euclidean distance to the query
>>> vector.

>>> Vector 1: [0.5, 0.5, 1]
>>> Vector 2: [1, 0, 0.5]

>>> Query Vector : [1, 0.5, 0.5]
>>> Distance 1: 0.5 + 0 + 0.5 = 1
>>> Distance 2: 0 + 0.5 + 0 = 0.5

>>> Output: Vector 2

>> That's not Euclidean distance. That's block distance. Euclidean
>> distance is the square root of the sum of squared differences.
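To make the distinction concrete, here is a minimal sketch that computes both distances for the example vectors quoted above. The function names are illustrative only and do not come from the thread.

```python
import math

def block_distance(x, y):
    # Sum of absolute componentwise differences (also called city-block or L1 distance).
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean_distance(x, y):
    # Square root of the sum of squared componentwise differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

query = [1, 0.5, 0.5]
vector_1 = [0.5, 0.5, 1]
vector_2 = [1, 0, 0.5]

for name, vec in (("Vector 1", vector_1), ("Vector 2", vector_2)):
    print(name, block_distance(vec, query), euclidean_distance(vec, query))
```

The block distances reproduce the 1 and 0.5 given in the original post. For these particular vectors, Vector 2 is the nearest under either measure, but the two distances need not agree on the ranking in general.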

>>> I want to give each component an importance/weight. I've chosen
>>> values between [1, 10] since that allows me to immediately modify the
>>> euclidean distance formula to take into account the weight:

>>> dist = sum(weight(i) * abs(x(i) - y(i)))
>> ...

>>> What I can't figure out is to how to express the exact same thing
>>> with the cosine similarity. I tried modifying the formula in several
>>> ways, but each try failed.

>> I wonder why you want to do this. If cosine does not work for you and
>> some other formula does, you could just use the other formula.
>> However, here's a couple of thoughts, don't know how valuable.
>> You were able to do your weighting with block distance because you had
>> access to something like individual components of the total distance.
>> Cosine is the dot product of normalized vectors. Normalize first: the
>> component x_k of vector x becomes x_k/length(x), where length(x) is
>> Euclidean, that is, square root of sum of squares. Then the cosine is
>> the sum of componentwise products, which you could weight, just like
>> block distance was the sum of componentwise differences.
>> Alternatively, how about separate cosines for important and
>> unimportant components, and then weighted average of those?

> Thank you for your answer, it already brought me closer to the goal.
> There is only one problem left that I need to solve to get the correct
> result:
> In the block distance (thanks for the correction), it was logical to
> me that increasing the difference between two components would
> increase its importance, which means that the higher the importance,
> the bigger the number I would multiply the difference with.
> This doesn't work with the cosine similarity. This is also my last
> question, which probably is also the hardest:
> What should the components look like in the importance vector? Does a
> bigger number automatically mean that this term has a higher
> importance than another? At the moment, I construct a vector with
> values between [1, 10] with 10 being the highest importance and I
> normalize that vector. Then I multiply each componentwise product by
> the corresponding component of that vector, as you explained. Unfortunately, the
> result is not convincing. I never achieve a perfect similarity of 1,
> even if the two vectors are the same.
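The symptom described here can be reproduced with a short sketch. It assumes that the data vectors and the importance vector are each scaled to Euclidean length 1 (the post does not say which normalization was used), and the weights `[10, 1, 1]` are hypothetical values chosen only for illustration.

```python
import math

def unit(x):
    # Scale x to Euclidean length 1 (assumed normalization).
    norm = math.sqrt(sum(a * a for a in x))
    return [a / norm for a in x]

def naive_weighted_cosine(u, v, w):
    # Weight the componentwise products of the unit vectors,
    # but leave the normalization itself unweighted.
    u_hat, v_hat, w_hat = unit(u), unit(v), unit(w)
    return sum(wi * ui * vi for wi, ui, vi in zip(w_hat, u_hat, v_hat))

query = [1, 0.5, 0.5]
weights = [10, 1, 1]  # hypothetical importance values in [1, 10]

self_similarity = naive_weighted_cosine(query, query, weights)
print(self_similarity)  # below 1 even though both vectors are identical
```

Because the squared components of a unit vector sum to 1 and every normalized weight is below 1 here, the weighted sum for identical vectors cannot reach 1.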

The normalization must also be weighted. For vectors u and v,
with weight vector w, the weighted cosine is

(sum w[i]*u[i]*v[i]) / sqrt[(sum w[i]*u[i]^2)*(sum w[i]*v[i]^2)].
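A minimal Python sketch of this formula follows. It assumes non-negative weights and vectors whose weighted length is nonzero; the function name and the weights `[10, 1, 1]` are hypothetical, chosen only for illustration.

```python
import math

def weighted_cosine(u, v, w):
    # Numerator: weighted dot product of u and v.
    numerator = sum(wi * ui * vi for wi, ui, vi in zip(w, u, v))
    # Denominator: product of the weighted lengths of u and v.
    u_sq = sum(wi * ui * ui for wi, ui in zip(w, u))
    v_sq = sum(wi * vi * vi for wi, vi in zip(w, v))
    return numerator / math.sqrt(u_sq * v_sq)

query = [1, 0.5, 0.5]
vector_1 = [0.5, 0.5, 1]
vector_2 = [1, 0, 0.5]
weights = [10, 1, 1]  # hypothetical: first component matters most

print(weighted_cosine(query, query, weights))     # identical vectors
print(weighted_cosine(query, vector_1, weights))
print(weighted_cosine(query, vector_2, weights))
```

With the normalization weighted in the same way as the numerator, identical vectors score 1 up to floating-point rounding, and setting every weight to 1 recovers the ordinary cosine. Only the ratios between the weights matter, since a common factor cancels between numerator and denominator, so the weight vector does not need to be normalized.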


© The Math Forum at NCTM 1994-2018. All Rights Reserved.