Topic: Extracting Information from XBRL Files
Replies: 3   Last Post: Jun 29, 2012 4:52 AM

hmichel@cox.net

Posts: 122
Registered: 1/29/05
Re: Extracting Information from XBRL Files
Posted: Jun 29, 2012 4:52 AM

Greg:

Try

processSECDEF14A[ticker_] :=
 Module[{cik, paddedCIK, urlfullpath, searchResults, textOnlylinks,
   top1linkfromList, formDEF14A},
  (* Look up the SEC Central Index Key (CIK) for the ticker *)
  cik = FinancialData[ticker, "CIK"];
  (* Zero-pad the CIK to 10 digits for the EDGAR search *)
  paddedCIK = IntegerString[ToExpression[cik], 10, 10];
  (* Search for DEF filings from 1994 through 2012 *)
  urlfullpath =
   "http://www.sec.gov/cgi-bin/srch-edgar?text=CIK%3D" <> paddedCIK <>
    "+TYPE%3DDEF&first=1994&last=2012";
  searchResults = Import[urlfullpath, "Hyperlinks"];
  (* Keep only the links to plain-text filings *)
  textOnlylinks = Select[searchResults, StringMatchQ[#, "*.txt"] &];
  (* Take the first link in the search results *)
  top1linkfromList = First[textOnlylinks];
  formDEF14A = Import[top1linkfromList, "Plaintext"];
  formDEF14A
  ]

The Module above returns the raw text file from the SEC website for the
most recent available Form DEF 14A.

This file contains SGML, HTML, and uuencoded JPGs, GIFs, or PDFs.

I tried
processSECDEF14A["GE"]
and
processSECDEF14A["MSFT"]
with success.

The code needs to be tighter for cases that fail, and parsing the HTML
between the <TEXT> </TEXT> tags is left for review.
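
A minimal sketch of both points, assuming the search page may return no
.txt links and that each document body in the filing sits between <TEXT>
and </TEXT>. The names firstTextLink and textSections are hypothetical
helpers, not part of the Module above.

```mathematica
(* Hypothetical helper: pick the first link ending in ".txt", or return
   $Failed instead of letting First complain about an empty list. *)
firstTextLink[links_List] :=
 With[{txt = Select[links, StringQ[#] && StringMatchQ[#, "*.txt"] &]},
  If[txt === {}, $Failed, First[txt]]]

(* Hypothetical helper: return the contents of every <TEXT>...</TEXT>
   section of a filing as a list of strings. Shortest keeps each match
   from running on into the next section. *)
textSections[filing_String] :=
 StringCases[filing,
  "<TEXT>" ~~ Shortest[body___] ~~ "</TEXT>" :> body,
  IgnoreCase -> True]
```

Inside processSECDEF14A, firstTextLink[searchResults] could stand in for
the Select and First steps, with a check for $Failed before the second
Import.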

The bulk of the data mining is to search for key terms such as "Summary
Compensation" and look for the HTML tables near those terms.
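
One way this could look, as a rough sketch only: find every occurrence
of the term, then keep the table blocks that start shortly after one.
The name tablesNear and the default window of 2000 characters are
assumptions for illustration, and nested tables are not handled.

```mathematica
(* Hypothetical helper: return each <table>...</table> block that starts
   within `window` characters after an occurrence of `term`. *)
tablesNear[html_String, term_String, window_Integer : 2000] :=
 Module[{termEnds, tables},
  (* End position of every occurrence of the term *)
  termEnds = Last /@ StringPosition[html, term, IgnoreCase -> True];
  (* {start, end} position of every table block *)
  tables = StringPosition[html,
    "<table" ~~ Shortest[___] ~~ "</table>",
    IgnoreCase -> True, Overlaps -> False];
  (* Keep tables whose start follows some term occurrence closely *)
  StringTake[html, #] & /@
   Select[tables,
    Function[t, Or @@ (0 < First[t] - # <= window & /@ termEnds)]]]
```

For example, tablesNear[formDEF14A, "Summary Compensation"] would return
the candidate tables as strings, which could then be passed to
ImportString[..., "HTML"] or parsed further.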

Hans
-----Original Message-----
From: Gregory Lypny [mailto:gregory.lypny@videotron.ca]
Sent: Wednesday, June 27, 2012 10:13 AM
To: MathGroup; Hans Michel
Subject: Re: Extracting Information from XBRL Files

Thanks Hans,

I'm just flying by the seat of my pants. I will try your suggestion. I
need compensation tables from the DEF 14A. I spoke to an SEC representative
yesterday, and she told me that DEF 14A is not yet available in XBRL format.
I like your download-to-notebook-format idea.

Thanks once again,

Gregory



On Wed, Jun 27, 2012, at 10:19 AM, Hans Michel wrote:

> Gregory:
>
> I have used Mathematica to extract data from the SEC. (Mostly the
> older EDGAR format).
>
> Not all of the data on the SEC website is available in XBRL format.
>
> For some forms I prefer the EDGAR SGML-XML-HTML-Text hybrid fixed
> schema and taxonomy without the PDF.
>
> The XBRL structure brings so much framing with it that parsing the
> core XML file in Mathematica should be straightforward. But attaching
> the associated schemas and definitions is not so easy.
>
> The SEC provides an RSS feed of interactive data. Mathematica can take
> an RSS feed and change it to Notebook format.
>
> Import["http://www.sec.gov/Archives/edgar/xbrlrss.all.xml", "RSS"]
>
> http://xbrl.sec.gov/
>
> With that notebook format you can write code to download and extract
> the source zip files.
>
> Parsing the main XML data file in an XBRL file is straightforward. It
> is attaching the schema and the meaning that is harder. That could be
> done in Mathematica, but to do so one would have to have a compelling
> reason not to use other tools that are specifically made for such a task.
>
> I am familiar with SGML data. I would consider the XBRL format a hybrid
> of SGML-XML, even with its use of schemas (DTD), entity files, etc.
>
> What are you trying to do?
>
> Hans
>
> -----Original Message-----
> From: Gregory Lypny [mailto:gregory.lypny@videotron.ca]
> Sent: Wednesday, June 27, 2012 3:09 AM
> To: mathgroup@smc.vnet.net
> Subject: Extracting Information from XBRL Files
>
> Hello everyone,
>
> This is a long shot, but has anyone used Mathematica to parse XBRL
> files, such as those accessible from the SEC's (US Securities and
> Exchange Commission) EDGAR system? XBRL is a tagged format, an
> offshoot of XML.
>
> Gregory Lypny
>
>





