Drexel dragonThe Math ForumDonate to the Math Forum

Ask Dr. Math - Questions and Answers from our Archives
_____________________________________________
Associated Topics || Dr. Math Home || Search Dr. Math
_____________________________________________

The Hungarian Job Assignment

Date: 03/10/2011 at 04:17:34
From: Agan
Subject: assignment problem.

I have n people and n machines for them to operate. Each person operates a
single machine and commands a different wage for operating that machine.

How can I assign the machines among my employees so that I have to pay
them the least money in total?

I have a solution from crossing different rows and arranging them in a
specific order, but I need to understand the process. I think I have to
consider all the possible combinations, which is difficult for even a
modest value of n.

This is a practical problem, but it looks like it has no easy answer.
Please help!



Date: 03/11/2011 at 03:21:18
From: Agan
Subject: assignment problem.

Sorry I didn't explain that properly. Let me try again with an example.

I have 3 engineers (A, B, and C) and 3 projects (P, Q, and R) to do. 

Here is the money each wants to undertake each project:

        P   Q   R
     +-----------
   A | 15  18  18
   B | 17  23  17
   C | 20  18  17 

As the manager of the engineers, what project should be given to whom to
minimize my total payroll?

How do I solve this scenario for larger and generalised cases?



Date: 03/13/2011 at 06:48:23
From: Doctor Jacques
Subject: Re: assignment problem.

Hi Agan,

This problem can be solved by linear programming, but there is a much more
efficient method, called the Hungarian algorithm.

We start with an n*n matrix with integer entries representing the costs of
the assignments (you can use negative costs to represent profits). The
goal is to select one entry from each row and column to make the total
cost minimal.

As an example, we will use the following cost matrix:

       1 2 3 4 5
     +----------
   A | 8 3 5 4 3
   B | 2 6 9 4 7
   C | 6 1 8 4 3
   D | 5 7 9 8 8
   E | 5 7 9 4 3

A first observation is that, since you must use one entry in each row, the
optimal assignment is not changed if you subtract the same number from all
the costs in a row. The total cost will not be the same, but once you have
found the optimal assignment, you can go back to the original table to
find the corresponding cost. Just keep track of the adjustments you make.

In this case, we subtract from each row the smallest cost of that row.
This gives the table:

       1 2 3 4 5
     +----------
   A | 5 0 2 1 0
   B | 0 4 7 2 5
   C | 5 0 7 3 2
   D | 0 2 4 3 3
   E | 2 4 6 1 0

We can now do the same thing with the columns:

       1 2 3 4 5
     +----------
   A | 5 0 0 0 0
   B | 0 4 5 1 5
   C | 5 0 5 2 2
   D | 0 2 2 2 3
   E | 2 4 4 0 0

By construction, this table only contains non-negative entries, and it
contains at least one 0 in each row and column. As stated before, the
optimal assignment for this table is the same as for the original table.
In particular, since any assignment in this table will have a non-negative
cost, an assignment that uses only cells containing 0 will be optimal, if
such an assignment exists.

The algorithm will consist in the repeated execution of the following
steps:

   1. Try to assign as many rows as possible using only the 0 cells.
   2. If we have assigned all the rows (5 in this case), we are done.
   3. Try to modify the assignments to assign more rows, without
      modifying the table. If this is successful, go back to step 2.
   4. Modify the table (in such a way that the optimal assignment is
      not changed) to create more 0 cells, possibly removing
      unassigned 0 cells (but not the assigned cells), and go back to
      step 2.

Before going into the details, I would like to outline the ideas involved.

In step 3, we try to assign more 0 cells without changing the table. Let
us assume that a row is currently not assigned. We know that this row
contains at least one 0, because of the initial steps. We could not assign
that 0, because there was already another 0 assigned in the same column.
We can try to displace that 0 to another column of the same row; this may
now create a conflict in the new column. We try to repeat the process
until we reach a currently unassigned column. If we end up in such a
column, we can move all the assignments in the path, which will allow us
to assign another 0.

In step 4, we will try to cover all the 0s with the smallest possible
number of rows and columns. As a result, the cells that are not covered
will all be strictly positive, and we can modify the table to create at
least one additional 0 in these cells.

It is a consequence of a theorem in Graph Theory that the smallest number
of rows and columns needed to cover all the 0s is equal to the largest
number of 0s that can be assigned together. The method I will describe is
actually based on some ideas used in the proof of that theorem, and the
nice thing is that the same computation will produce the data needed for
step 3 and step 4.

We start with step 1. In this case, we will simply look at each row in
turn, and assign the first 0 cell in that row that is not in a currently
assigned column, if there is such a cell. Marking them with asterisks,
this gives the following assignments:

       1  2  3  4  5
     +--------------
   A | 5  0* 0  0  0
   B | 0* 4  5  1  5
   C | 5  0  5  2  2
   D | 0  2  2  2  3
   E | 2  4  4  0* 0

If you solve a small problem by hand, you will probably use various
heuristics to get a better assignment; for example, you could start with
the rows that only contain one 0. I did not do that in this case, in order
to show you the general method while keeping the example small enough.

Currently, we have only assigned 3 cells, and we are not done yet. To
execute step 3, we will need to keep track of the cells we have visited;
this will allow us to retrace our steps if we find a way to change the
assignments and assign a new cell. In order to do that, we need to include
additional information in our table.

We attach to row 'i' a row tag rt(i). This field can contain the following
information:

     i) 0 (or blank) to represent the initial state.
    ii) -1 if the row is not assigned.
   iii) >0 if rt(i) contains the number of the column where a
           conflict was detected and caused the assignment of 
           this row to be changed.

We also attach to column 'j' a column tag ct(j) that contains the
following information:

     i) 0 (or blank) to represent the initial state.
    ii) >0 if ct(j) contains the number (I will use letters
           instead) of the row in which we tried to assign a cell 
           in this column.

We can also add to each row and column a flag to indicate that the
corresponding tag was modified during the last pass; this will save some
work (I do not show these flags here).

We have some general rules:

   * We make passes alternately on the row tags and the column tags
     (starting with the rows); in each pass, we look only at the tags
     that have been modified since the last pass.
   * We only change tags that are in their initial blank state; once
     a tag has been modified, it is never modified again in the same
     run of step 3.

Now, let us look at the details.

In a row pass, we look at all the row tags that have been modified since
the last pass. If the tag of row 'i' has been modified:

   * We look at all the unassigned 0s in row 'i' (there may be none).
   * If (i,j) is such a 0, and if ct(j) is blank, we let ct(j) = i
     and we mark ct(j) as modified.

When all rows have been processed, we turn to a column pass. In a column
pass, we look at the column tags that have been modified since the last
pass; if there is none, step 3 terminates unsuccessfully (this cannot
happen for row tags). If the tag of column 'j' has been modified:

   * We look at the assigned 0 in column 'j.' If there is none, we
     have found a way to assign one more cell. We do this by
     retracing all the operations that led to this column (using the
     row and column tags), and moving the assignments in the path.
     This will be explained in more detail using the example.
   * If, on the other hand, cell (i,j) contains an assigned 0, and if
     rt(i) is blank, we let rt(i) = j, and we mark rt(i) as modified.

When all columns have been processed, and if step 3 is not yet finished,
we go back to another row pass.

To allow you to follow the explanations, I will show here the table after
the execution of step 3:

       1  2  3  4  5  rt
     +---------------+--
   A | 5  0* 0  0  0 | 2
   B | 0* 4  5  1  5 | 1
   C | 5  0  5  2  2 |-1
   D | 0  2  2  2  3 |-1
   E | 2  4  4  0* 0 |
     +---------------+--
   ct| D  C  A  A  A |

We start by clearing all the row and column tags; we then let rt(i) = -1
for the currently unassigned rows (C and D).

In the first row pass, we look at the modified rows (C and D). Row C
contains an unassigned 0 in column 2: we let ct(2) = C (if ct(2) had not
been blank, we would have left it alone). In a similar way, row D contains
an unassigned 0 in column 1, and we let ct(1) = D.

We now start a column pass, by looking at the modified columns 1 and 2. In
column 1, we have an assigned 0 in row B; we therefore let rt(B) = 1. In
the same way, we let rt(A) = 2 because of the assigned 0 in A2.

We start another row pass, looking at the modified rows A and B. In row A,
we have unassigned 0s in A3, A4, and A5; accordingly, we let 
ct(3) = ct(4) = ct(5) = A. In row B, we have no unassigned 0, and we do
nothing.

We start a new column pass, on the modified columns 3, 4, and 5. In column
3, we have no assigned 0. This means that we can use that column for a new
assignment. We can now retrace our steps:

   * We are in column 3. rt(3) = A, which means that we can assign
     the cell A3, and we must look at row A.
   * We are now in row A. Since we have assigned a new cell in A3, we
     have a conflict. The position of the conflict is given by
     rt(A) = 2: we must remove the assignment in A2 and move to
     column 2.
   * Since we have removed the assignment in A2, we must assign a new
     cell in column 2. The position of that cell is given by
     ct(2) = C. We assign cell C2 and move to row C.
   * Since rt(C) = -1, we are done.

We have changed assignments on the path A3 -> (A2) -> C2, where the
parentheses are used to identify removed assignments. The new table is as
follows (I have also included the results of the next execution of step 2):

       1  2  3  4  5  rt
     +---------------+--
   A | 5  0  0* 0  0 |
   B | 0* 4  5  1  5 | 1
   C | 5  0* 5  2  2 |
   D | 0  2  2  2  3 |-1
   E | 2  4  4  0* 0 |
     +---------------+--
   ct| D             |

We now have 4 assignments, and the job is not yet finished. We execute
step 3 again:

   * Row D is unassigned, and has a 0 in D1. We let ct(1) = D.
   * Column 1 has an assigned 0 in B1. We let rt(B) = 1.
   * Row B has no unassigned 0.

Since no column tags have been modified in the last row pass, step 3
terminates unsuccessfully. We now define a set of rows R and a set of
columns C in the following way:

   * R contains the rows where rt(i) is blank; in this case,
     R = {A, C, E}
   * C contains the columns where ct(j) is not blank; in this case,
     C = {1}

We write r and c for the number of rows in R and columns in C,
respectively. We can now make a few observations, and it is possible to
prove them by thinking (hard) about what happened in step 3:

   * Each 0 (assigned or not) is in R or C (or both): R and C cover
     all 0s.
   * Each assigned 0 is in R or C, but not both. This confirms that
     r + c is equal to the number of assigned 0s (4 in this case), as
     stated before.

A consequence of this is that the uncovered cells (the intersections of
rows {B,C} and columns {2,3,4,5}) only contain strictly positive entries.
Let d be the smallest of these entries (in this case, d = 1, corresponding
to cell B4). We now modify the table as follows:

   * We subtract d from each uncovered cell. By the definition of d,
     these cells will remain non-negative, and at least one of them
     will become 0.
   * We add d to each cell that is covered twice (by a row in R and
     by a column in C). As the unmarked 0s are only covered once,
     this will not disturb them. Some of the unassigned (and
     therefore unused) 0s may disappear, however.

This modification does not change the optimal assignment, because it is
equivalent to subtracting d from the uncovered rows and adding d to the
covered columns. This results in a new table (in which I already included
the results of the next application of step 3):

       1  2  3  4  5  rt
     +---------------+--
   A | 6  0  0* 0  0 |
   B | 0* 3  4  0  4 | 1
   C | 6  0* 5  2  2 |
   D | 0  1  1  1  2 |-1
   E | 3  4  4  0* 0 | 4
     +---------------+--
   ct| D        B  E |

As the assigned cells have not been modified, we can keep them assigned;
there is no need to execute step 1 again.

We see that we have created a new 0 in B4; however, we cannot use it
directly, because of the cells B1 and E4. We execute step 3 again. I will
let you do that; the results are in the table.

Step 3 ends in column 5, where there is no assigned 0. This means that we
can improve our assignment, using this path:

  E5 -> (E4) -> B4 -> (B1) -> D1

Though it is not complete, this is the final assignment:

       1  2  3  4  5
     +---------------
   A | 6  0  0* 0  0
   B | 0  3  4  0* 4
   C | 6  0* 5  2  2
   D | 0* 1  1  1  2
   E | 3  4  4  0  0*

Going back to the original table, we find that the total cost of this
assignment is:

   5 + 4 + 1 + 5 + 3 = 18

We could also have kept track of the adjustments in the course of the
algorithm:

   * The row minima were 3, 2, 1, 5, 3, for a total of 14
   * The column minima (in the second step) were 0, 0, 2, 1, 0, for a
     total of 3
   * In step 4, we subtracted 1 from two rows (B and D) and added 1
     to one column (1), for a total decrease of 1.

This confirms that the total cost was decreased by 14 + 3 + 1 = 18; as the
total cost is 0 in the final table, the original cost was 18.

Note that, after step 4, step 3 allowed us to assign one more cell. This
is not always the case. It may happen that you have to execute step 4 more
than once before you can assign a new cell; the results will be different,
because the table is modified each time you execute step 4. However, the
algorithm will always terminate after a finite number of steps.

To see this, notice that each time we execute step 2, either we assign a
new cell (which can only happen a finite number of times), or we execute
step 4. In step 4, we subtract d from (n - r)(n - c) cells, and we add d
to r*c cells; the total of the table is thus increased by:

   d(rc - (n - r)(n - c)) = dn(r + c - n)

As noted before, r + c is equal to the total number of assigned cells, and
this is less than n (otherwise we would have finished). This shows that
the expression above is negative; the total decreases each time you
execute step 4; as the total must be non-negative, this also can only
happen a finite number of times.

Please feel free to write back if you require further assistance.

- Doctor Jacques, The Math Forum
  http://mathforum.org/dr.math/ 



Date: 03/15/2011 at 01:03:17
From: Agan
Subject: Thank you (assignment problem.)

Thank you for your detailed answer! Thank you, again!
Associated Topics:
College Algorithms
College Discrete Math
High School Discrete Mathematics

Search the Dr. Math Library:


Find items containing (put spaces between keywords):
 
Click only once for faster results:

[ Choose "whole words" when searching for a word like age.]

all keywords, in any order at least one, that exact phrase
parts of words whole words

Submit your own question to Dr. Math

[Privacy Policy] [Terms of Use]

_____________________________________
Math Forum Home || Math Library || Quick Reference || Math Forum Search
_____________________________________

Ask Dr. MathTM
© 1994-2013 The Math Forum
http://mathforum.org/dr.math/