Topic: reverse engineering principal components...
Replies: 0

 Richard Palmer Posts: 46 Registered: 7/31/06
Posted: Oct 12, 2012 12:09 AM

I would like to take a large dataset, compute the principal
components on a sufficient subset, and then use those results to compute
principal components for the remaining observations. So far, I haven't been
able to figure out how this is done. Here is sample code (given as a notebook
expression). Can anyone tell me where I am going wrong?

Notebook[{

Cell[CellGroupData[{
Cell["Reverse Engineering Principal Components", "Section",
CellChangeTimes->{{3.558966707926651*^9, 3.5589667244925985`*^9}}],

Cell["\<\
Make a table of data and a table of its principal components using \
the Correlation method. Check that they have the requisite \
properties.\
\>", "Text",
CellChangeTimes->{{3.5589667304749403`*^9, 3.558966775500516*^9}, {
3.558966830268648*^9, 3.5589668439964333`*^9}}],

Cell[CellGroupData[{

Cell[BoxData[{
RowBox[{
RowBox[{"t", "=",
RowBox[{"Table", "[",
RowBox[{
RowBox[{"RandomReal", "[", "]"}], ",",
RowBox[{"{", "5", "}"}], ",",
RowBox[{"{", "3", "}"}]}], "]"}]}], ";"}], "\n",
RowBox[{
RowBox[{
RowBox[{"princomponentst", "=",
RowBox[{"PrincipalComponents", "[",
RowBox[{"t", ",",
RowBox[{"Method", "\[Rule]", "\"\<Correlation\>\""}]}], "]"}]}],
";"}], " "}], "\n",
RowBox[{"Print", "[",
RowBox[{"\"\<The mean of the set is \>\"", ",",
RowBox[{
RowBox[{"Mean", "[", "princomponentst", "]"}], "//", "Chop"}]}],
"]"}], "\n",
RowBox[{"Print", "[",
RowBox[{"\"\<The variance of the set is \>\"", ",",
RowBox[{"Variance", "[", "princomponentst", "]"}]}], "]"}]}], "Input",
CellChangeTimes->{{3.558943696585477*^9, 3.5589437436731706`*^9}, {
3.5589448723167253`*^9, 3.558944889147688*^9},
3.5589452714125524`*^9, {3.558965740525318*^9,
3.558965743957515*^9}, 3.558966338325511*^9, {
3.5589667822939043`*^9, 3.558966817373911*^9}, {
3.558966862013464*^9, 3.5589669321494756`*^9}, {
3.558967927711418*^9, 3.558967942303253*^9}}],

Cell[CellGroupData[{

Cell[BoxData[
InterpretationBox[
RowBox[{"\<\"The mean of the set is \"\>", "\[InvisibleSpace]",
RowBox[{"{",
RowBox[{"0", ",", "0", ",", "0"}], "}"}]}],
SequenceForm["The mean of the set is ", {0, 0, 0}],
Editable->False]], "Print",
CellChangeTimes->{{3.558966925069071*^9, 3.558966932823514*^9}, {
3.558967933532751*^9, 3.5589679478705716`*^9}}],

Cell[BoxData[
InterpretationBox[
RowBox[{"\<\"The variance of the set is \"\>", "\[InvisibleSpace]",
RowBox[{"{",
RowBox[{
"1.4974734615741159`", ",", "0.9657686960146733`", ",",
"0.5367578424112112`"}], "}"}]}],
SequenceForm[
"The variance of the set is ", {1.4974734615741159`,
0.9657686960146733, 0.5367578424112112}],
Editable->False]], "Print",
CellChangeTimes->{{3.558966925069071*^9, 3.558966932823514*^9}, {
3.558967933532751*^9, 3.558967947872572*^9}}]
}, Open ]]
}, Open ]],

Cell["\<\
Standardize the observations and compute a correlation matrix. \
Compute the eigenvectors.\
\>", "Text",
CellChangeTimes->{{3.558966974924922*^9, 3.558967006484727*^9}}],

Cell[BoxData[{
RowBox[{
RowBox[{"standardizet", "=",
RowBox[{"Standardize", "[", "t", "]"}]}], ";"}], "\n",
RowBox[{
RowBox[{
RowBox[{"corrt", "=",
RowBox[{"Correlation", "[", "standardizet", "]"}]}], ";"}],
" "}], "\n",
RowBox[{
RowBox[{
RowBox[{"eigenvectors", "=",
RowBox[{"Eigenvectors", "[", "corrt", "]"}]}], ";"}],
" "}]}], "Input",
CellChangeTimes->{{3.5589449260758*^9, 3.5589449510202265`*^9}, {
3.5589454403162127`*^9, 3.5589454525239115`*^9},
3.5589670144291816`*^9, 3.5589670498292065`*^9, {
3.5589711280694685`*^9, 3.5589711551900196`*^9}}],

Cell["\<\
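I think the components should be this product of the standardized data and \
the eigenvectors. However, the resulting variances cannot be correct: they \
are not in decreasing order and do not match the variances printed above.\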
\>", "Text",
CellChangeTimes->{{3.5589677050376825`*^9, 3.5589677222526665`*^9}, {
3.558971033573064*^9, 3.558971044228673*^9}, {3.558971199412549*^9,
3.558971208188051*^9}}],

Cell[CellGroupData[{

Cell[BoxData[{
RowBox[{
RowBox[{"mypc2", "=",
RowBox[{"standardizet", ".", "eigenvectors"}]}], ";"}], "\n",
RowBox[{"Mean", "[", "mypc2", "]"}], "\n",
RowBox[{"Variance", "[", "mypc2", "]"}]}], "Input",
CellChangeTimes->{{3.5589661581992083`*^9, 3.5589662056929245`*^9},
3.5589670948777833`*^9, 3.5589671263245816`*^9, {
3.558967191557313*^9, 3.558967192101344*^9}, {
3.5589677375095396`*^9, 3.558967776636778*^9},
3.5589710607416177`*^9}],

Cell[BoxData[
RowBox[{"{",
RowBox[{
RowBox[{"-", "2.4424906541753446`*^-16"}], ",",
"3.108624468950438`*^-16", ",",
RowBox[{"-", "3.7192471324942745`*^-16"}]}], "}"}]], "Output",
CellChangeTimes->{{3.55897113613293*^9, 3.5589711629244623`*^9}}],

Cell[BoxData[
RowBox[{"{",
RowBox[{
"1.1977733239835728`", ",", "0.7727628961600694`", ",",
"1.0294637798563568`"}], "}"}]], "Output",
CellChangeTimes->{{3.55897113613293*^9, 3.558971162927462*^9}}]
}, Open ]]
}, Open ]]
},
WindowSize->{707, 787},
WindowMargins->{{Automatic, 228}, {49, Automatic}},
ShowSelection->True,
FrontEndVersion->"8.0 for Microsoft Windows (64-bit) (October 6, \
2011)",
StyleDefinitions->"Default.nb"
]
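
A possible explanation for the mismatch, offered as a sketch rather than a verified fix: Eigenvectors returns the eigenvectors as the rows of its result, so standardizet.eigenvectors multiplies by the matrix in the wrong orientation, and the projection would need Transpose[eigenvectors] instead. The snippet below illustrates this together with the fit-on-a-subset workflow described in the question. The names train, rest, mu, sd, vals, vecs and project are made up for illustration, and the data sizes are arbitrary. Individual components may also differ in sign from what PrincipalComponents returns, since eigenvectors are only determined up to sign.

```mathematica
(* Hypothetical data: fit on `train`, then apply the result to `rest` *)
train = RandomReal[1, {50, 3}];
rest = RandomReal[1, {20, 3}];

(* Column means and standard deviations of the fitting subset *)
mu = Mean[train];
sd = StandardDeviation[train];

(* Eigenvalues and eigenvectors of the correlation matrix;
   the eigenvectors come back as the rows of vecs *)
{vals, vecs} = Eigensystem[Correlation[train]];

(* Standardize with the subset's mu and sd, then project onto the
   eigenvectors; Transpose puts the eigenvectors in columns *)
project[x_] := ((# - mu)/sd & /@ x) . Transpose[vecs];

(* On the fitting subset these variances should agree with vals *)
Variance[project[train]]
vals

(* Scores for the remaining observations, reusing mu, sd and vecs *)
project[rest]
```

The variances of project[rest] will generally not equal vals exactly, because the means, standard deviations and eigenvectors were estimated on train only.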

--
Richard Palmer
