Thread: Help with Grabbing data (stock prices) and having them update on a regular basis

  1. #1

    Hi, I currently have an assignment that requires me to track stock prices. I was wondering if anyone knows how to set up a live Google document that automatically captures the prices and updates them every few minutes?

    I would appreciate it if someone could give me a hint or pointer. I am currently looking it up on Google, but I may be searching for the wrong keywords. I tried using Excel, but it does not update by the minute.

    Many thanks in advance.

  2. #2

    Use Simba.
    Call GetPage to get the HTML source of the web page, then use string functions like Between, Pos, Copy, etc. to pull out the price.
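
For readers who do not use Simba, the same idea (fetch the page source, then slice out the text between two known markers) can be sketched in standard C++. This is only an illustration: `between` is a hypothetical helper that mimics what a Between-style function does, not Simba's actual implementation, and the HTML fragment in the usage note is made up.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the text found between the first occurrence
// of `left` and the next occurrence of `right` after it.
// Returns an empty string when either marker is missing.
std::string between(const std::string& source,
                    const std::string& left,
                    const std::string& right)
{
    const std::size_t start = source.find(left);
    if (start == std::string::npos)
        return "";

    // Skip past the left marker, then look for the right one.
    const std::size_t from = start + left.size();
    const std::size_t end = source.find(right, from);
    if (end == std::string::npos)
        return "";

    return source.substr(from, end - from);
}
```

Calling `between("<b>Price:</b> 9.02 MYR<br>", "</b> ", " MYR")` gives back `9.02`. The markers have to be chosen by looking at the page source, and they stop working as soon as the site changes its markup.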

  3. #3

    Sorry, I am not really good at writing the commands. Do you have an example I can try to work from?

    Basically, I found out that Google Docs can scrape websites using =importxml. However, I can't seem to get the function right. I think Bloomberg would be a better source, so is anyone able to help me grab a few things from this site?

    http://www.bloomberg.com/quote/GENT:MK

    I would need the current price, which is the big price right next to GENT:MK, as well as:

    Open:
    Previous Close:
    Day's range.

    It would be really helpful if someone could help me code it properly. I have tried following YouTube tutorials, but somehow it is still not right :S
    Last edited by newb cheater; 09-19-2012 at 03:14 PM.

  4. #4

    =ImportHtml("http://www.bloomberg.com/quote/GENT:MK";"table";4)
    Add that to a cell in Google Docs.

  5. #5

    You, my friend, ARE THE MAN! Btw, is it possible to grab the price as well, the one at the top next to the name of the stock?

    Also, may I ask how you knew that it is "table";4? I used Inspect Element, but how do I know what to type into the command?

    And what about this link?

    https://charttb.asiaebroker.com/ebcS...y=%33%31%38%32

    Do you think it is possible to do the same? Sorry if this is taking too much of your time, but you are a lifesaver.
    Last edited by newb cheater; 09-19-2012 at 03:41 PM.

  6. #6

    =ImportXml("http://www.bloomberg.com/quote/GENT:MK";"//span[@class=' price']")
    I have never used Google Docs before.

  7. #7

    My function for the one above was similar, but wrong, lol. How do you know what to type? Is there a way to learn, and is it possible to grab the whole table from this as well?

    https://charttb.asiaebroker.com/ebcS...y=%33%31%38%32

    Sorry, I am a business student and have little to no programming experience.

  8. #8

    =ImportHtml("https://charttb.asiaebroker.com/ebcServlet/stkFastQuote?bhcode=058&key=3182";"table";0)

  9. #9

    Thank you so much! You have made my tracking much easier. However, if I may ask, how do you know how to put the function together? Do you use Inspect Element, or how do you know?

  10. #10

    First I searched for how that can be entered in Google Docs.
    After googling, I found importxml and figured out that it uses something called XPath, so it was back to googling.
    Then I opened the source code of the web page:
    Code:
          <span class=" price">
            9.020
            
                <span> MYR</span>
            
          </span>
    The price was inside a span, and because span is a very common building block in HTML, I needed to narrow the search a little. So I fired up Google and searched for "xpath span class", copied the example I found into Google Docs, and changed the class name.
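
In Google Docs the XPath expression does this matching. As a rough illustration of what it picks out, here is a C++ sketch that does something similar with plain string searching on the snippet above: find the opening tag whose class is exactly " price", take the text up to the next tag, and trim the whitespace. `extractPrice` and `trim` are hypothetical names, and this is a simplification, since string searching is far more fragile than a real HTML parser.

```cpp
#include <cstddef>
#include <string>

// Strip leading and trailing whitespace (spaces, tabs, newlines).
std::string trim(const std::string& text)
{
    const char* whitespace = " \t\r\n";
    const std::size_t first = text.find_first_not_of(whitespace);
    if (first == std::string::npos)
        return "";
    const std::size_t last = text.find_last_not_of(whitespace);
    return text.substr(first, last - first + 1);
}

// Hypothetical helper: returns the text that directly follows
// <span class=" price"> up to the next tag, or "" if that span is absent.
std::string extractPrice(const std::string& html)
{
    const std::string openTag = "<span class=\" price\">";
    const std::size_t start = html.find(openTag);
    if (start == std::string::npos)
        return "";

    const std::size_t from = start + openTag.size();
    const std::size_t end = html.find('<', from);

    // If no further tag exists, take everything up to the end of the string.
    const std::size_t length =
        (end == std::string::npos) ? std::string::npos : end - from;
    return trim(html.substr(from, length));
}
```

Fed the snippet above, this keeps only the number and drops the nested currency span, whereas ImportXml may return the nested text as well.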

  11. #11

    I see, will try to read up when I have time, but thank you so much for your time and help. Appreciate it loads!

    Cheers!!

  12. #12

    If you want it straight to your desktop or saved to a file, grab libcurl, Boost, and a C++ compiler, and compile this. A friend and I wrote it a couple of years ago. It still works but is a bit buggy. Currently it outputs like this:



    C++ Code:
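    #define NOMINMAX                //Stop windows.h from defining min/max macros..
    #include <curl/curl.h>
    #include <windows.h>            //Sleep, SetConsoleTextAttribute (Windows-only as written)..
    #include <boost/algorithm/string.hpp>
    #include <boost/regex.hpp>
    #include <algorithm>            //std::sort, std::unique..
    #include <cstdio>               //printf..
    #include <cstdlib>              //malloc, realloc, free, exit, std::system..
    #include <cstring>              //memcpy..
    #include <fstream>
    #include <iostream>
    #include <limits>               //std::numeric_limits..
    #include <map>
    #include <string>
    #include <vector>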

    using namespace std;

    //Variables..
    string DataHolding;

    //Function-Prototypes..
    string HtmlDecode(string str);
    void removeBadURLs(vector<string> &vec);
    void GrabInfo(string url, short &result);
    void removeDuplicates(vector<string> &vec);
    void CheckURLs(vector<string> &urllist, size_t arraysize);
    void preg_match_all(string Source, boost::regex &expression, string &ID);
    static size_t WriteBuffer(void *contents, size_t size, size_t nmemb, void *userp);
    static size_t strpos(string Data, string Regex, int pos, int SizeOf_Regex, int additional);

    //Create a struct to hold the data..
    struct MemoryStruct {
      char *memory;
      size_t size;
    };

    int main()
    {
       ifstream file;
       string line, hsmatch, scmatch, mmatch;
       file.open("Stock.ini");

       boost::regex hashcomment("((^|(\\s*))|(^(\\s*)))#(.*)$");           //Comment using #..
       boost::regex scomment("((^|(\\s*))|(^(\\s*)))//(.*)$");             //Comment using //..
       boost::regex mcomment("((^|(\\s*))|(^(\\s*)))(/\\*)(.*)(\\*/)$");

       vector<string>urls;                      //Create Vector to hold all Links..
       while(getline(file, line))
       {
           preg_match_all(line, hashcomment, hsmatch);
           preg_match_all(line, scomment, scmatch);
           preg_match_all(line, mcomment, mmatch);

           if((line != hsmatch) && (line != scmatch) && (line != mmatch))
                urls.push_back(line);               //Add Url To Vector..
       }
       file.close();

       if(urls.size() == 0)
       {
           cout<<"The File Is Empty! Please populate it with valid URLs.\n\n";
           cout<<"This program will now Terminate in 5 seconds..\n\n\n";
           Sleep(5000);
           return 0;
       }
       removeDuplicates(urls);

        size_t urlsize = urls.size();
        //CheckURLs(urls, urlsize);               //Validate URLs..

      while(1)
      {
        Sleep(2000);
        #ifdef _WIN32
          std::system ("CLS");
        #else
          std::system ("clear");
        #endif

        for(unsigned short i = 0; i < urls.size(); i++)
        {
            short result;
              GrabInfo(urls[i], result);
              DataHolding = HtmlDecode(DataHolding);                                                    //Strip HTML Special Chars..

              try
              {
                  size_t Start, End;
                                                /** Get Stock Names **/

                  boost::regex SnExpression("<[a-z]+ class=\"wsod_smallSubHeading\"", boost::regex::icase);
                  boost::regex SxExpression("<h1 class=\"wsod_fLeft(.*)\" style=\"margin-top:6px;\">", boost::regex::icase);
                  string StockID, StockX;
                  preg_match_all(DataHolding, SnExpression, StockID);
                  preg_match_all(DataHolding, SxExpression, StockX);

                  Start = strpos(DataHolding, StockX, 0, StockX.size(), StockX.size());
                  End = strpos(DataHolding, StockID, Start, StockID.size(), -1);
                  string Final = DataHolding.substr(Start, End-Start);     //From the Start Pos, Copy Everything Until the End Pos to a string..
                  vector<string> urlnames(urls.size());     //Variable-length arrays are not standard C++, so use a vector..
                  urlnames[i] = Final + ":  ";

                                                /** Get Stock Values **/

                  boost::regex SvExpression("<span stream=\"last_[0-9]+\" streamFormat=\"ToHundredth\" streamFeed=\"[A-Z]+\">", boost::regex::icase);
                  preg_match_all(DataHolding, SvExpression, StockID);

                  Start = strpos(DataHolding, StockID, 0, StockID.size(), StockID.size());
                  End = strpos(DataHolding, "</span>", Start, 7, 0);
                  Final = DataHolding.substr(Start, End-Start);

                  SetConsoleTextAttribute(GetStdHandle(STD_OUTPUT_HANDLE), 3);          //Console Colours..
                  cout<<urlnames[i];
                  SetConsoleTextAttribute(GetStdHandle(STD_OUTPUT_HANDLE), 7);
                  cout<<Final<<"\n\n";
              }
              catch(exception &e)
              {
                  SetConsoleTextAttribute(GetStdHandle(STD_OUTPUT_HANDLE), 15);
                  cout<<"\n\n\n--------------------------------------------------------------------------------";
                  SetConsoleTextAttribute(GetStdHandle(STD_OUTPUT_HANDLE), 12);
                  cout<<"\n\nException.. Html File is empty -- substring Out of Range! Details:\n\n"<<e.what();
                  SetConsoleTextAttribute(GetStdHandle(STD_OUTPUT_HANDLE), 15);
                  cout<<"\n\n\n";
              }

              SetConsoleTextAttribute(GetStdHandle(STD_OUTPUT_HANDLE), 8);
        }
      }
      return 0;
    }

    string HtmlDecode(string str)
    {
        string subs[] = {"&#34;", "&quot;", "&#39;", "&apos;", "&#38;", "&amp;",
        "&#60;", "&lt;", "&#62;", "&gt;", "&34;", "&39;", "&38;", "&60;", "&62;"};

        string reps[] = {"\"", "\"", "'", "'", "&", "&", "<", "<", ">", ">", "\"",
        "'", "&", "<", ">"};

        size_t found;
        for(int i = 0; i < 15; i++)
        {
            do
            {
                found = str.find(subs[i]);
                if (found != string::npos)
                   str.replace (found,subs[i].length(),reps[i]);
            } while (found != string::npos);
        }

        return str;
    }

    static size_t WriteBuffer(void *contents, size_t size, size_t nmemb, void *userp)
    {
      size_t realsize = size *nmemb;
      struct MemoryStruct *mem = (struct MemoryStruct*)userp;

      mem->memory = (char*) realloc(mem->memory, mem->size + realsize + 1);
      if (mem->memory == NULL) {
        printf("Cannot Allocate Enough Memory (ReAlloc is NULL).\n");
        exit(EXIT_FAILURE);
      }

      memcpy(&(mem->memory[mem->size]), contents, realsize);
      mem->size += realsize;
      mem->memory[mem->size] = 0;

      return realsize;
    }

    void GrabInfo(string url, short &result)
    {
         CURL *curl_handle;
         CURLcode res;
         result = 0;

         struct MemoryStruct data;
           data.memory = (char*) malloc(1);
           data.size = 0;

           curl_global_init(CURL_GLOBAL_ALL);
           curl_handle = curl_easy_init();

          if(curl_handle)
          {
              curl_easy_setopt(curl_handle, CURLOPT_URL, url.c_str());                        //URL To Grab..
              curl_easy_setopt(curl_handle, CURLOPT_FAILONERROR, 1L);                         //Incase of 400+ error, Don't return the page (option takes a long)..
              curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, WriteBuffer);              //Send Data to the Function..
              curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, (void*)&data);                 //Pass Struct Chunk to the Function..
              curl_easy_setopt(curl_handle, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0");          //Use a UserAgent..
              res = curl_easy_perform(curl_handle);                                             //Perform-Execute..
              if(res)
              {
                  result = res;
              }
              curl_easy_cleanup(curl_handle);                                                   //Release the Handle..
          }

          //cout<<"Size Of WebPage: "<< ((float)data.size/1000) <<" kb.\n\n";        //Print the SizeOf webpage in bytes..

          if(data.memory)
          {
            DataHolding = data.memory;                                                         //Write Data to a String..
            free(data.memory);
            data.memory = NULL;
          }

          curl_global_cleanup();
    }

    static size_t strpos(string Data, string Regex, int pos, int SizeOf_Regex, int additional)
    {
        size_t Found = 0;
        try
        {
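            Found = Data.find(Regex.c_str(), pos, SizeOf_Regex);
            if(Found != string::npos)
                Found += additional;                //Offset real matches only, so a failed search is not turned into a bogus position..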
        }
        catch(exception &e)
        {
            SetConsoleTextAttribute(GetStdHandle(STD_OUTPUT_HANDLE), 15);
            cout<<"\n\n\n--------------------------------------------------------------------------------";
            SetConsoleTextAttribute(GetStdHandle(STD_OUTPUT_HANDLE), 12);
            cout<<"\n\nException.. Html File is empty -- substring Out of Range! Details:\n\n";
            SetConsoleTextAttribute(GetStdHandle(STD_OUTPUT_HANDLE), 15);
            cout<<"\n\n\n";
        }

        return Found;
    }

    void removeDuplicates(vector<string> &vec)
    {
       std::sort(vec.begin(), vec.end());
       vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
    }

    void removeBadURLs(vector<string> &vec)
    {
        cout<<"Would you like to remove the bad URL's permanently? (y / n): ";
        char response;
        cin>> response;
        cin.ignore();
        while(cin.fail())
        {
            cin.clear();
            cin.ignore(std::numeric_limits<int>::max(),'\n');
            cout<<"Invalid choice.. Please Try Again.\n\n";
            cout<<"Would you like to remove the bad URL's permanently? (y / n): ";
            cin>> response;
            cin.ignore();

            if(!cin.fail() || cin.good())
                break;
        }
        if(response == 'y' || response == 'Y')
        {
            ofstream file;
            file.open("Stock.ini");
                file.clear();
            file.close();

            file.open("Stock.ini", ios::app);
            for(unsigned short i = 0; i < vec.size(); i++)
                file<< vec[i]<<endl;
            file.close();
        }
    }

    void preg_match_all(string Source, boost::regex &expression, string &ID)
    {
           try
           {
               std::string::const_iterator start, end;
               start = Source.begin();
               end = Source.end();
               boost::smatch what;
               boost::match_flag_type flags = boost::match_default;

               while(boost::regex_search(start, end, what, expression, flags))
               {
                    //Destination = boost::regex_replace(Source, expression, "");
                    ID = what[0];
                    start = what[0].second;
               }
           }
           catch(exception &e)
           {
               cout<<"Exception Caught.. Function: preg_match_all.\n\n";
           }

        return;
    }

    void CheckURLs(vector<string> &urllist, size_t arraysize)
    {
        boost::regex MarketUrl("(http|https)://([a-z]+.)*(/[a-z]+/)*/markets/[a-z]+/", boost::regex::icase);
        boost::regex MarketUrlEx("(http|https)://([a-z]+.)*(/[a-z]+/)*/markets/[a-z]+/(\\?([a-z]+)=H_MKT_Data)", boost::regex::icase);
        boost::regex QuoteUrl("(http|https)://([a-z]+.)*(/[a-z]+/)*/quote.([a-z]+)(\\?([a-z]+)=[a-z]+)", boost::regex::icase);
        bool badurls = false;

        cout<<"Checking for invalid URLs. Please wait..\n\n";

        for(unsigned short i = 0; i < urllist.size(); i++)
        {
            try
            {

               if(!boost::regex_match(urllist[i], MarketUrl) && !boost::regex_match(urllist[i], QuoteUrl) && !boost::regex_match(urllist[i], MarketUrlEx))
               {
                  SetConsoleTextAttribute(GetStdHandle(STD_OUTPUT_HANDLE), 12);
                  cout<<"Bad Url Format Found.. \n";
                  SetConsoleTextAttribute(GetStdHandle(STD_OUTPUT_HANDLE), 15);
                  cout<<"\tBad-URL: "<<urllist[i];
                  SetConsoleTextAttribute(GetStdHandle(STD_OUTPUT_HANDLE), 2);
                  cout<<"      ----------      URL Temporarily Removed!\n\n";
                  badurls = true;
                  urllist.erase(urllist.begin() + i);
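                  i--;                              //Re-check this slot: the next URL shifted into it (i++ follows; unsigned wrap at 0 is well-defined)..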
               }
               else if(boost::regex_match(urllist[i], MarketUrl) || boost::regex_match(urllist[i], QuoteUrl) || boost::regex_match(urllist[i], MarketUrlEx))
               {
                   short result;
                    GrabInfo(urllist[i], result);

                    boost::regex LostStock("<title>Symbol not found Stock quote - CNNMoney.com</title>", boost::regex::icase);
                    string QuoteLost, LostLink, LostServer;
                    preg_match_all(DataHolding, LostStock, QuoteLost);

                    boost::regex BadLink("<span class=\"breadcrumbmain\">404 Page Not Found</span>", boost::regex::icase);
                    preg_match_all(DataHolding, BadLink, LostLink);

                    boost::regex ServerLost("<h1 id=\"errorTitleText\">Server not found</h1>", boost::regex::icase);
                    preg_match_all(DataHolding, ServerLost, LostServer);

                    if((LostServer.length() != 0) || (LostLink.length() != 0) || (QuoteLost.length() != 0) || (result != 0))
                    {
                        SetConsoleTextAttribute(GetStdHandle(STD_OUTPUT_HANDLE), 12);
                        cout<<"Bad Url Found.. \n";
                        SetConsoleTextAttribute(GetStdHandle(STD_OUTPUT_HANDLE), 15);
                        cout<<"\tBad-URL: "<<urllist[i]<<"\n";
                        SetConsoleTextAttribute(GetStdHandle(STD_OUTPUT_HANDLE), 2);
                        badurls = true;
                        urllist.erase(urllist.begin() + i);
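                        i--;                        //Re-check this slot: the next URL shifted into it (i++ follows; unsigned wrap at 0 is well-defined)..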
                    }
               }
            }
            catch(exception &e)
            {
                cout<<"Cannot Match URL: Complexity Of Regex Matching Exceeds its limits.\n\n";
            }
        }

        if(badurls == true)
            removeBadURLs(urllist);
    }


    Stock.INI:

    Code:
    http://money.cnn.com/data/markets/dow/?iid=H_MKT_Data
    http://money.cnn.com/data/markets/nasdaq/?iid=H_MKT_Data
    http://money.cnn.com/data/markets/sandp/
    http://money.cnn.com/quote/quote.html?symb=AAPL
    http://money.cnn.com/quote/quote.html?symb=C
    http://money.cnn.com/quote/quote.html?symb=F
    http://money.cnn.com/quote/quote.html?symb=GE
    http://money.cnn.com/quote/quote.html?symb=GOOG
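
The program above uses Boost regexes to skip lines in Stock.ini that are comments (`#`, `//`, or `/* ... */`). A minimal sketch of a similar filter using only the standard library is shown here. `isCommentOrBlank` is a hypothetical name, and the sketch simplifies the original rules: any line that starts with `/*` counts as a comment, whether or not it is closed on the same line, and blank lines are skipped too.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true when a Stock.ini line should be skipped, i.e. it
// is blank or its first non-blank characters start a #, // or /* comment.
bool isCommentOrBlank(const std::string& line)
{
    const std::size_t first = line.find_first_not_of(" \t\r");
    if (first == std::string::npos)
        return true;                        // empty or whitespace-only line

    if (line[first] == '#')
        return true;                        // # comment

    // compare() looks at no more than two characters starting at `first`.
    return line.compare(first, 2, "//") == 0 ||
           line.compare(first, 2, "/*") == 0;
}
```

A reading loop would then push a line onto the URL list only when `isCommentOrBlank(line)` is false. The `//` inside `http://` is not a problem, because only the start of the line is checked.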
    Last edited by Brandon; 09-19-2012 at 04:15 PM.

  13. #13

    Thanks for all the help guys. Definitely appreciate it.

    Btw, Brandon, is the code meant to be built with a C++ compiler? And do I need all three: libcurl, Boost, and a C++ compiler?
