Thread: GetHTTPPage can't handle redirects?

  1. #1

    (I guess this is the correct section for this.)

    Mmkay, my problem lies in the way GetHTTPPage ... uh ... gets an HTTP page.

    For whatever reason, it can't handle a URL that redirects.

    For instance, the normal URL for my powerbot profile page is https://www.powerbot.org/community/u...8-keepbotting/

    Because of the way IP.Board's forum software works, you can remove the username from the last segment of the URL, like so: https://www.powerbot.org/community/user/446368-/
    Notice the text "keepbotting" is gone from the last portion of the URL.

    When I enter the incomplete link into a normal browser, the server looks up the complete link and redirects my browser to it.

    GetHTTPPage apparently doesn't follow the redirect. It returns the HTML of the redirect page itself, which is useless to me.

    Any way I can get around this?
    GitLab projects | Simba 1.4 | Find me on IRC or Discord | ScapeRune scripts | Come play bot ScapeRune!

    <BenLand100> we're just in the transitional phase where society reclassifies guns as Bad™ before everyone gets laser pistols

  2. #2

    I don't think there is, but if https://www.powerbot.org/community/u...8-keepbotting/ is the direct link, why not just link to that?

  3. #3

    Quote Originally Posted by BMWxi
    I don't think there is, but if https://www.powerbot.org/community/u...8-keepbotting/ is the direct link why not link to there?

    That wouldn't normally be an issue. That's the way I'd do it in practice.
    I'm just screwing around with crawling web pages in Simba.

    I decided to try and make a script that would pull data off powerbot members' profile pages and organize it.
    I figured the easiest way to do that would be to take the base URL for profiles (https://www.powerbot.org/community/user-XXXXXX/) and let the script fill in the member IDs. This would work because member IDs are assigned sequentially.

    Since GetHTTPPage can't follow redirects, I either need a way around that, or a way to guess the username for each member ID.
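
To make the "fill in the member IDs" idea concrete, here is a minimal sketch in Python rather than Simba, since the logic is the same in any language. It assumes the shortened profile form quoted in the first post (`.../user/446368-/`); the post above writes the base as `user-XXXXXX/`, so the exact pattern may need adjusting. The names `BASE` and `profile_urls` are made up for illustration.

```python
# Hypothetical helper: the URL pattern is an assumption copied from the
# shortened profile link in the first post (".../user/446368-/").
BASE = "https://www.powerbot.org/community/user/{member_id}-/"


def profile_urls(first_id, last_id):
    """Yield one shortened profile URL per member ID, inclusive of both ends."""
    for member_id in range(first_id, last_id + 1):
        yield BASE.format(member_id=member_id)


# Build the candidate URLs for three consecutive member IDs.
urls = list(profile_urls(446368, 446370))
```

Each of these URLs is expected to answer with a redirect to the full profile link, which is why following redirects matters for this approach.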

  4. #4

    Quote Originally Posted by KeepBotting
    That wouldn't normally be an issue. That's the way I'd do it in practice.
    I'm just screwing around with crawling web pages in Simba.

    I decided to try and make a script that would pull data off powerbot members' profile pages and organize it.
    I figured the easiest way to do that would be to take the base URL for profiles (https://www.powerbot.org/community/user-XXXXXX/) and let the script fill in the member IDs. This would work because member IDs are assigned sequentially.

    Since GetHTTPPage can't follow redirects, I either need a way around that, or a way to guess the username for each member ID.

    Hm, I see...

    I just tried, and it looks like https isn't supported by getpage. It returns a blank string unless I remove the s.

    Unfortunately, getpage also returns blank for a link that gets redirected (at least for powerbot). Some other sites do return some information about where you're being redirected to.

    Edit: You could also try using APPA, which should handle the redirects fine. It will be slower, though.

  5. #5

    If the HTML of the redirect page contains the URL it redirects to, you could extract it and use it in a new getPage() call.
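
As a rough illustration of the "extract it from the HTML" idea, here is a sketch in Python (not Simba). It assumes the redirect page names its target either in a `<meta http-equiv="refresh">` tag or in an ordinary link, which is common but not guaranteed. As noted earlier in the thread, powerbot returned a blank body, so this would only help on sites that actually include the URL. `RedirectTargetParser` and `find_redirect_target` are made-up names.

```python
from html.parser import HTMLParser


class RedirectTargetParser(HTMLParser):
    """Remember the first redirect target seen in a meta refresh tag or a link."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if self.target is not None:
            return  # keep only the first candidate
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            # The content attribute looks like "0; url=http://example.com/new".
            _, _, rest = (attrs.get("content") or "").partition(";")
            key, _, value = rest.strip().partition("=")
            if key.strip().lower() == "url" and value:
                self.target = value.strip().strip("'\"")
        elif tag == "a" and attrs.get("href"):
            self.target = attrs["href"]


def find_redirect_target(html):
    """Return the redirect URL found in the page, or None if there is none."""
    parser = RedirectTargetParser()
    parser.feed(html)
    return parser.target


# Example with a made-up redirect page (example.com is a placeholder host).
page = '<head><meta http-equiv="refresh" content="0; url=http://example.com/user/1-name/"></head>'
target = find_redirect_target(page)
```

The returned value can be relative (for example `/user/1-name/`), so it may need to be joined with the original URL before it is fetched.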

  6. #6

    GetHTTPPage is not supposed to follow redirects, because it just gets a page. You can't compare it to a browser, which has heavy libraries to handle all the HTTP request headers.
    But you can do this:

    Simba Code:
    program new;

    // Fetches a page and, if the response carries a Location header,
    // fetches the redirect target once more (follows a single hop).
    function GetHTTPPage2(client: Integer; page: string): string;
    var
      re: TRegExpr;
      header: string;
      startPos: Integer;
    begin
      Result := GetHTTPPage(client, page);
      header := GetRawHeaders(client);
      re.Init();
      re.setExpression('Location: ');

      if re.Exec(header) then
      begin
        // The target URL starts right after the 'Location: ' prefix.
        startPos := re.getMatchPos(0) + Length(re.getExpression);
        re.setExpression('\S+'); // URL = next run of non-whitespace characters
        if re.ExecPos(startPos) then
        begin
          WriteLn(re.getMatch(0));
          Result := GetHTTPPage(client, re.getMatch(0));
        end else
          WriteLn('GetHTTPPage2: Spaces in url ?!');
      end;
    end;

    var
      c: Integer;
    begin
      c := InitializeHTTPClient(True);
      WriteLn(GetHTTPPage2(c, 'https://www.powerbot.org/community/user/446368-/'));
    end.

    My Simba doesn't work with https, so I'm getting a blank page anyway.
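
The function above follows one redirect and uses the Location value as-is. A server may chain several redirects or send a relative Location, so here is a hedged sketch of the more general loop, written in Python rather than Simba so it can be run without a network. The `fetch` callable, `get_following_redirects`, and the `example.com` pages are all assumptions for illustration; in Simba the fetch step would be the GetHTTPPage plus GetRawHeaders pair.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def get_following_redirects(fetch, url, max_hops=5):
    """Follow Location headers until a non-redirect response arrives.

    fetch(url) is assumed to return (status_code, headers_dict, body).
    Returns (final_url, body); raises RuntimeError after max_hops redirects.
    """
    for _ in range(max_hops + 1):
        status, headers, body = fetch(url)
        # Header names are case-insensitive, so normalise before the lookup.
        location = {k.lower(): v for k, v in headers.items()}.get("location")
        if status not in REDIRECT_CODES or not location:
            return url, body
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError("too many redirects")


# Fake responses standing in for a real HTTP client.
pages = {
    "http://example.com/user/1-/": (301, {"Location": "/user/1-name/"}, ""),
    "http://example.com/user/1-name/": (200, {}, "<html>profile</html>"),
}
final_url, body = get_following_redirects(pages.__getitem__, "http://example.com/user/1-/")
```

The hop limit is there so that two pages redirecting to each other cannot keep the script looping forever.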

  7. #7

    Quote Originally Posted by bg5
    GetHTTPPage is not supposed to redirect, because it just gets a page. You can't compare it to browser, which has heavy libraries to handle all http-request headers.
    But you can do...

    -snip-

    My Simba doesn't work with https, so I'm getting blank page anyway.

    If you want, you can just use http. I tested your function that way and it works.
