Class ThrottledFetcher


  • public class ThrottledFetcher
    extends java.lang.Object
    This class uses httpclient to fetch content from web servers while controlling the fetch rate in two ways: it limits the overall bandwidth used per server, and it limits the number of simultaneously open connections per server. It can also limit the maximum number of fetches per time period per server, but that functionality is not strictly necessary at this time because the CF scheduler already does it at a higher layer. Because these limits are long-term in nature, an instance of this class would very probably need a correspondingly long lifetime, and be static. The class sets up a separate HTTP connection pool for each server, so that the task of limiting the number of connections can be delegated to the httpclient library. As a consequence, periodic polling is needed to determine when idle pooled connections can be freed.
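The two rate controls described above can be illustrated with a minimal, self-contained sketch. It is not the ManifoldCF implementation: the class names PerServerConnectionLimiter and BandwidthPacer, their methods, and the use of a semaphore per server are all assumptions made for illustration. The first class caps simultaneous connections per server name; the second computes how long a reader would have to wait after each read to stay under a minimum-milliseconds-per-byte rate.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical illustration; these names are assumptions and are not part of
// ThrottledFetcher or the httpclient library.
class PerServerConnectionLimiter {
    // One semaphore per server name; its permits are the allowed simultaneous connections.
    private final Map<String, Semaphore> permitsByServer = new ConcurrentHashMap<>();

    /**
     * Tries to reserve a connection slot for the server without blocking.
     * The limit is fixed the first time a server name is seen.
     */
    boolean tryOpen(String serverName, int connectionLimit) {
        Semaphore permits = permitsByServer.computeIfAbsent(
            serverName, name -> new Semaphore(connectionLimit));
        return permits.tryAcquire();
    }

    /** Returns a slot previously reserved with tryOpen for the same server. */
    void close(String serverName) {
        Semaphore permits = permitsByServer.get(serverName);
        if (permits != null) {
            permits.release();
        }
    }
}

// Hypothetical illustration of bandwidth pacing for a single server.
class BandwidthPacer {
    private final double minMillisPerByte;
    private final long startMillis;
    private long bytesSoFar = 0;

    BandwidthPacer(double minMillisPerByte, long startMillis) {
        this.minMillisPerByte = minMillisPerByte;
        this.startMillis = startMillis;
    }

    /**
     * Records a read and returns how many milliseconds the caller should wait
     * at time nowMillis so that the average rate stays under the limit.
     */
    synchronized long delayAfterRead(int bytesRead, long nowMillis) {
        bytesSoFar += bytesRead;
        long earliestAllowed = startMillis + (long) Math.ceil(bytesSoFar * minMillisPerByte);
        return Math.max(0L, earliestAllowed - nowMillis);
    }
}
```

Passing the clock value in as a parameter keeps the pacing arithmetic deterministic; a real fetcher would presumably read the system clock and sleep for the returned delay.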
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static java.lang.String _rcsid  
      protected static int globalHandleCount
      This counter tracks the total number of outstanding handles globally, because that total is limited as well
      protected static java.lang.Integer globalHandleCounterLock
      This is the lock object that guards the global handle counter
      protected static int READ_CHUNK_LENGTH
      The read chunk length
      protected static boolean recordEverything
      This flag determines whether everything fetched is recorded to disk, as a way of taking a web snapshot
      protected int refCount
      Reference count of the connections currently using this pool
      protected java.util.Map<java.lang.String,org.apache.manifoldcf.connectorcommon.interfaces.IConnectionThrottler> serverMap
      This map associates the server string (without port) with a pool throttling object, which is used to track statistics and to throttle appropriately
    • Constructor Summary

      Constructors 
      Constructor Description
      ThrottledFetcher()
      Constructor.
    • Method Summary

      Modifier and Type Method Description
      IThrottledConnection createConnection(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, java.lang.String throttleGroupName, java.lang.String serverName, int connectionLimit, int connectionTimeoutMilliseconds, java.lang.String proxyHost, int proxyPort, java.lang.String proxyAuthDomain, java.lang.String proxyAuthUsername, java.lang.String proxyAuthPassword, org.apache.manifoldcf.crawler.interfaces.IAbortActivity activities)
      Establish a connection to a specified URL.
      void noteConnectionEstablished()
      Note that there is a repository connection that is using this object.
      void noteConnectionReleased()
      Connection pool no longer needed.
      void poll()
      Poll.
      protected static void registerGlobalHandle(int maxHandles)
      Note that we're about to need a handle (and make sure we have enough)
      protected static void releaseGlobalHandle()
      Note that we're done with a handle (so we can free it)
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • recordEverything

        protected static final boolean recordEverything
        This flag determines whether everything fetched is recorded to disk, as a way of taking a web snapshot
        See Also:
        Constant Field Values
      • READ_CHUNK_LENGTH

        protected static final int READ_CHUNK_LENGTH
        The read chunk length
        See Also:
        Constant Field Values
      • globalHandleCount

        protected static int globalHandleCount
        This counter tracks the total number of outstanding handles globally, because that total is limited as well
      • globalHandleCounterLock

        protected static java.lang.Integer globalHandleCounterLock
        This is the lock object that guards the global handle counter
      • serverMap

        protected final java.util.Map<java.lang.String,org.apache.manifoldcf.connectorcommon.interfaces.IConnectionThrottler> serverMap
        This map associates the server string (without port) with a pool throttling object, which is used to track statistics and to throttle appropriately
      • refCount

        protected int refCount
        Reference count of the connections currently using this pool
    • Constructor Detail

      • ThrottledFetcher

        public ThrottledFetcher()
        Constructor.
    • Method Detail

      • registerGlobalHandle

        protected static void registerGlobalHandle(int maxHandles)
                                            throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Note that we're about to need a handle (and make sure we have enough)
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
      • releaseGlobalHandle

        protected static void releaseGlobalHandle()
        Note that we're done with a handle (so we can free it)
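The register/release pairing above follows a familiar pattern: a shared counter guarded by a lock, with a check against a caller-supplied maximum. The following is a minimal sketch of that pattern under stated assumptions, not the actual source. The class name GlobalHandleCounter, the outstanding() accessor, and the use of IllegalStateException (the documented method throws ManifoldCFException) are all hypothetical.

```java
// Hypothetical sketch; the names and the exception type are assumptions
// and do not come from the ManifoldCF source.
class GlobalHandleCounter {
    // Dedicated lock object guarding the counter.
    private static final Object LOCK = new Object();
    private static int outstanding = 0;

    /** Reserves one handle, failing if maxHandles are already outstanding. */
    static void register(int maxHandles) {
        synchronized (LOCK) {
            if (outstanding >= maxHandles) {
                throw new IllegalStateException(
                    "Exceeded limit of " + maxHandles + " outstanding handles");
            }
            outstanding++;
        }
    }

    /** Gives back one handle; never lets the count drop below zero. */
    static void release() {
        synchronized (LOCK) {
            if (outstanding > 0) {
                outstanding--;
            }
        }
    }

    /** Returns the current number of outstanding handles. */
    static int outstanding() {
        synchronized (LOCK) {
            return outstanding;
        }
    }
}
```

The sketch locks on a plain Object. That is a common choice in Java because boxed values such as small Integer instances can be shared between unrelated pieces of code, which makes them risky as monitors.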
      • createConnection

        public IThrottledConnection createConnection(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext,
                                                     java.lang.String throttleGroupName,
                                                     java.lang.String serverName,
                                                     int connectionLimit,
                                                     int connectionTimeoutMilliseconds,
                                                     java.lang.String proxyHost,
                                                     int proxyPort,
                                                     java.lang.String proxyAuthDomain,
                                                     java.lang.String proxyAuthUsername,
                                                     java.lang.String proxyAuthPassword,
                                                     org.apache.manifoldcf.crawler.interfaces.IAbortActivity activities)
                                              throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                                     org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Establish a connection to a specified URL.
        Parameters:
        serverName - is the FQDN of the server, e.g. foo.metacarta.com
        connectionLimit - is the maximum desired number of outstanding connections at any one time.
        connectionTimeoutMilliseconds - is the number of milliseconds to wait for the connection before timing out.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
      • poll

        public void poll()
                  throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Poll. This method is designed to allow idle connections to be closed and freed.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
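As an illustration of what a periodic poll of this kind might do, the sketch below evicts pooled connections that have been idle for at least a configured time. It is a simplified model under stated assumptions, not the behavior of this class or of httpclient: IdleConnectionPool, markIdle, and the idle-time threshold are hypothetical names, and connections are represented only by string identifiers.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of idle-connection eviction; not the ManifoldCF or httpclient API.
class IdleConnectionPool {
    // Maps a connection identifier to the time (in milliseconds) it became idle.
    private final Map<String, Long> idleSinceMillis = new LinkedHashMap<>();
    private final long maxIdleMillis;

    IdleConnectionPool(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    /** Records that a connection was returned to the pool at nowMillis. */
    synchronized void markIdle(String connectionId, long nowMillis) {
        idleSinceMillis.put(connectionId, nowMillis);
    }

    /** Removes every connection idle for at least maxIdleMillis; returns how many were removed. */
    synchronized int poll(long nowMillis) {
        int closed = 0;
        Iterator<Map.Entry<String, Long>> it = idleSinceMillis.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() >= maxIdleMillis) {
                it.remove(); // a real pool would also close the underlying socket here
                closed++;
            }
        }
        return closed;
    }

    /** Returns the number of connections currently idle in the pool. */
    synchronized int idleCount() {
        return idleSinceMillis.size();
    }
}
```

Nothing in the model frees a connection on its own, which is why something has to call poll from time to time.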
      • noteConnectionEstablished

        public void noteConnectionEstablished()
        Note that there is a repository connection that is using this object.
      • noteConnectionReleased

        public void noteConnectionReleased()
        Connection pool no longer needed. Call this to indicate that this object no longer needs to keep its pools available, for the moment.
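The established/released pair reads as a reference-counting protocol: each user of the object announces itself, and the pools only need to stay available while at least one user remains. A minimal sketch of that idea follows. PoolLifecycle and poolsOpen are hypothetical names, and the rule that pools close exactly when the count reaches zero is an assumption for illustration, not a statement about the actual implementation.

```java
// Hypothetical sketch of reference-counted pool availability;
// the names and the exact open/close policy are assumptions.
class PoolLifecycle {
    private int refCount = 0;
    private boolean poolsOpen = false;

    /** A connection has started using this object; make sure pools are available. */
    synchronized void noteConnectionEstablished() {
        if (refCount == 0) {
            poolsOpen = true;
        }
        refCount++;
    }

    /** A connection has stopped using this object; release pools after the last one. */
    synchronized void noteConnectionReleased() {
        if (refCount == 0) {
            throw new IllegalStateException("release without matching establish");
        }
        refCount--;
        if (refCount == 0) {
            poolsOpen = false;
        }
    }

    /** Reports whether the pools are currently being kept available. */
    synchronized boolean poolsOpen() {
        return poolsOpen;
    }
}
```

A caller would pair the two methods, typically with the release in a finally block, so that the count cannot drift upward when a connection fails.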