Class ThrottledFetcher.ThrottledConnection

    • Constructor Summary

      Constructors 
      Constructor Description
      ThrottledConnection(java.lang.String serverName, org.apache.manifoldcf.connectorcommon.interfaces.IConnectionThrottler connectionThrottler, int connectionTimeoutMilliseconds, int connectionLimit, java.lang.String proxyHost, int proxyPort, java.lang.String proxyAuthDomain, java.lang.String proxyAuthUsername, java.lang.String proxyAuthPassword, org.apache.manifoldcf.crawler.interfaces.IAbortActivity activities)
      Constructs a throttled connection to the given server, using the supplied throttler, timeout, connection limit, and proxy settings.
    • Method Summary

      Modifier and Type Method Description
      void beginFetch(java.lang.String fetchType)
      Begin the fetch process.
      void close()
      Close the connection.
      void doneFetch(org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities)
      Done with the fetch.
      int executeFetch(java.lang.String protocol, int port, java.lang.String urlPath, java.lang.String userAgent, java.lang.String from, java.lang.String lastETag, java.lang.String lastModified)
      Execute the fetch and get the return code.
      java.io.InputStream getResponseBodyStream()
      Get the response input stream.
      int getResponseCode()
      Get the HTTP response code.
      java.lang.String getResponseHeader(java.lang.String headerName)
      Get a specified response header, if it exists.
      void logFetchCount(int count)
      Log the fetch of a number of bytes.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • serverName

        protected final java.lang.String serverName
        The server's fully qualified domain name (FQDN)
      • connectionThrottler

        protected final org.apache.manifoldcf.connectorcommon.interfaces.IConnectionThrottler connectionThrottler
        The throttling object we use to track connections
      • fetchThrottler

        protected final org.apache.manifoldcf.connectorcommon.interfaces.IFetchThrottler fetchThrottler
        The throttling object we use to track fetches
      • connectionTimeoutMilliseconds

        protected final int connectionTimeoutMilliseconds
        Connection timeout in milliseconds
      • connectionManager

        protected final org.apache.http.conn.HttpClientConnectionManager connectionManager
        The client connection manager
      • httpClient

        protected final org.apache.http.client.HttpClient httpClient
        The HTTP client
      • executeMethod

        protected org.apache.http.client.methods.HttpRequestBase executeMethod
        The HTTP request (method) object
      • startFetchTime

        protected long startFetchTime
        The start-fetch time
      • throwable

        protected java.lang.Throwable throwable
        The error trace, if any
      • myUrl

        protected java.lang.String myUrl
        The current URL being fetched
      • statusCode

        protected int statusCode
        The status code fetched, if any
      • fetchType

        protected java.lang.String fetchType
        The kind of fetch we are doing
      • fetchCounter

        protected long fetchCounter
        The number of bytes counted so far in the current fetch
      • threadStarted

        protected boolean threadStarted
        Set if the thread has been started
    • Constructor Detail

      • ThrottledConnection

        public ThrottledConnection(java.lang.String serverName,
                                   org.apache.manifoldcf.connectorcommon.interfaces.IConnectionThrottler connectionThrottler,
                                   int connectionTimeoutMilliseconds,
                                   int connectionLimit,
                                   java.lang.String proxyHost,
                                   int proxyPort,
                                   java.lang.String proxyAuthDomain,
                                   java.lang.String proxyAuthUsername,
                                   java.lang.String proxyAuthPassword,
                                   org.apache.manifoldcf.crawler.interfaces.IAbortActivity activities)
                            throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                   org.apache.manifoldcf.agents.interfaces.ServiceInterruption
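        Constructs a throttled connection to the given server, using the supplied throttler, timeout, connection limit, and proxy settings.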
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
    • Method Detail

      • beginFetch

        public void beginFetch(java.lang.String fetchType)
                        throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                               org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Begin the fetch process.
        Specified by:
        beginFetch in interface IThrottledConnection
        Parameters:
        fetchType - is a short string describing the kind of fetch being requested. It is used solely for logging purposes.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
      • logFetchCount

        public void logFetchCount(int count)
        Log the fetch of a number of bytes.
        Parameters:
        count - is the number of bytes fetched.
      • executeFetch

        public int executeFetch(java.lang.String protocol,
                                int port,
                                java.lang.String urlPath,
                                java.lang.String userAgent,
                                java.lang.String from,
                                java.lang.String lastETag,
                                java.lang.String lastModified)
                         throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Execute the fetch and get the return code. This method uses the standard logging mechanism to keep track of the fetch attempt. It signals one of three outcomes: a thrown ServiceInterruption (if a dynamic error occurs), an OK status, or a static error status (for a condition where a retry is unlikely to help). The actual HTTP response code is NOT returned by this method; use getResponseCode() to obtain it.
        Specified by:
        executeFetch in interface IThrottledConnection
        Parameters:
        protocol - is the protocol to use to perform the access, e.g. "http"
        port - is the port to use to perform the access, where -1 means "use the default"
        urlPath - is the path part of the url, e.g. "/robots.txt"
        userAgent - is the value of the User-Agent header to use.
        from - is the value of the From header to use.
        lastETag - is the requested lastETag header value.
        lastModified - is the requested lastModified header value.
        Returns:
        the status code: success, static error, or dynamic error.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
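
        The three outcomes described for executeFetch can be handled roughly as follows. This is a minimal sketch, not ManifoldCF code: TransientFailure is a hypothetical stand-in for ServiceInterruption, the Fetch interface and the Action enum are invented for illustration, and the value of ASSUMED_STATUS_OK is an assumption because the actual status constants are not listed on this page.

```java
class FetchOutcomeSketch {
    /** Hypothetical stand-in for ServiceInterruption (a dynamic, retryable error). */
    static class TransientFailure extends Exception {
        TransientFailure(String message) { super(message); }
    }

    /** Illustrative decisions a caller might take; not part of ManifoldCF. */
    enum Action { PROCESS, SKIP, RETRY_LATER }

    /** Hypothetical wrapper around a single executeFetch-style call. */
    interface Fetch {
        int execute() throws TransientFailure;
    }

    /** Assumed value; the real "OK" constant is not shown in this documentation. */
    static final int ASSUMED_STATUS_OK = 0;

    static Action decide(Fetch fetch) {
        try {
            int status = fetch.execute();
            // OK: go on to read the response. Anything else is treated as a
            // static error, where retrying is unlikely to help.
            return status == ASSUMED_STATUS_OK ? Action.PROCESS : Action.SKIP;
        } catch (TransientFailure e) {
            // Dynamic error: the same fetch may succeed later.
            return Action.RETRY_LATER;
        }
    }

    public static void main(String[] args) {
        System.out.println(decide(() -> ASSUMED_STATUS_OK));
        System.out.println(decide(() -> 1));
        System.out.println(decide(() -> { throw new TransientFailure("server busy"); }));
    }
}
```

        The sketch catches the stand-in exception only to make the three branches visible; a real caller may instead let ServiceInterruption propagate to its own caller.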
      • getResponseCode

        public int getResponseCode()
                            throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                   org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Get the HTTP response code.
        Specified by:
        getResponseCode in interface IThrottledConnection
        Returns:
        the response code. This is either an HTTP response code, or one of the codes above.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
      • getResponseBodyStream

        public java.io.InputStream getResponseBodyStream()
                                                  throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                                         org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Get the response input stream. It is the responsibility of the caller to close this stream when done.
        Specified by:
        getResponseBodyStream in interface IThrottledConnection
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
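
        Because the caller owns the returned stream, one way to guarantee it is closed is try-with-resources. The sketch below is illustrative only: drainAndClose is a hypothetical helper, and a ByteArrayInputStream stands in for the stream that getResponseBodyStream() would return.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class BodyStreamSketch {
    /** Reads the stream to the end, returns the byte count, and always closes it. */
    static long drainAndClose(InputStream body) throws IOException {
        long total = 0;
        byte[] buffer = new byte[4096];
        // try-with-resources closes the stream even if read() throws.
        try (InputStream in = body) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the stream returned by getResponseBodyStream().
        InputStream standIn = new ByteArrayInputStream(
            "User-agent: *\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(drainAndClose(standIn));
    }
}
```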
      • getResponseHeader

        public java.lang.String getResponseHeader(java.lang.String headerName)
                                           throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                                  org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Get a specified response header, if it exists.
        Specified by:
        getResponseHeader in interface IThrottledConnection
        Parameters:
        headerName - is the name of the header.
        Returns:
        the header value, or null if it doesn't exist.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
      • doneFetch

        public void doneFetch(org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities)
                       throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Done with the fetch. Call this when the fetch has been completed. A log entry will be generated describing what was done.
        Specified by:
        doneFetch in interface IThrottledConnection
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
      • close

        public void close()
                   throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Close the connection. Call this to end this server connection.
        Specified by:
        close in interface IThrottledConnection
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
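
        The method descriptions above suggest a call order of beginFetch, executeFetch, the response accessors, doneFetch, and finally close. The following sketch shows that order under stated assumptions: the Connection interface is a hypothetical stand-in with simplified signatures (no ManifoldCF exceptions, and doneFetch takes no activities argument), RecordingConnection is a fake used only to show the sequence, and ASSUMED_STATUS_OK, the user agent, and the From address are invented values.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class FetchLifecycleSketch {
    /** Hypothetical stand-in: same method names as above, simplified signatures. */
    interface Connection {
        void beginFetch(String fetchType);
        int executeFetch(String protocol, int port, String urlPath, String userAgent,
                         String from, String lastETag, String lastModified);
        int getResponseCode();
        InputStream getResponseBodyStream();
        void doneFetch();
        void close();
    }

    /** Assumed value; the real status constants are not listed on this page. */
    static final int ASSUMED_STATUS_OK = 0;

    /** Fake connection that records the order in which it is called. */
    static class RecordingConnection implements Connection {
        final List<String> calls = new ArrayList<>();
        public void beginFetch(String fetchType) { calls.add("beginFetch"); }
        public int executeFetch(String protocol, int port, String urlPath, String userAgent,
                                String from, String lastETag, String lastModified) {
            calls.add("executeFetch");
            return ASSUMED_STATUS_OK;
        }
        public int getResponseCode() { calls.add("getResponseCode"); return 200; }
        public InputStream getResponseBodyStream() {
            calls.add("getResponseBodyStream");
            return new ByteArrayInputStream(new byte[] {1, 2, 3});
        }
        public void doneFetch() { calls.add("doneFetch"); }
        public void close() { calls.add("close"); }
    }

    /** Performs one fetch and returns the HTTP code, or -1 if the fetch was not OK. */
    static int fetchOnce(Connection connection, String urlPath) throws IOException {
        int httpCode = -1;
        try {
            connection.beginFetch("example");
            try {
                int status = connection.executeFetch("http", -1, urlPath,
                    "ExampleBot/1.0", "crawler@example.com", null, null);
                if (status == ASSUMED_STATUS_OK) {
                    httpCode = connection.getResponseCode();
                    // The caller is responsible for closing the body stream.
                    try (InputStream in = connection.getResponseBodyStream()) {
                        while (in.read() != -1) {
                            // Discard the body; a real caller would process it.
                        }
                    }
                }
            } finally {
                connection.doneFetch(); // logs what was done, even after a failure
            }
        } finally {
            connection.close(); // ends the server connection
        }
        return httpCode;
    }

    public static void main(String[] args) throws IOException {
        RecordingConnection fake = new RecordingConnection();
        int code = fetchOnce(fake, "/robots.txt");
        System.out.println(code + " " + fake.calls);
    }
}
```

        Placing doneFetch and close in finally blocks is a design choice of this sketch, so that the log entry and the connection release happen even when a fetch fails part-way; the documentation above does not prescribe it.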