API
Core Class
class cabu.core.Cabu(import_name='cabu', db=None, *args, **kwargs)
load_config(settings='cabu.default_settings')
Load the given settings module to create a config dict in the class.
All upper-case variables defined in the settings module are imported into the config dictionary and overridden by the corresponding environment variables.
Parameters: settings (Optional[str]) – The module path of the settings module.
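The loading rule described for load_config can be illustrated with a minimal sketch. It is not Cabu's implementation: the helper name `config_from_module`, the in-memory settings module, and the `DRIVER_NAME` setting are all assumptions made for the example.

```python
import os
import types


def config_from_module(module, environ=None):
    """Collect upper-case names from a module; environment values win."""
    environ = os.environ if environ is None else environ
    config = {}
    for key in dir(module):
        if key.isupper():
            # An environment variable with the same name overrides the module value.
            config[key] = environ.get(key, getattr(module, key))
    return config


# Hypothetical settings module built in memory for the demonstration.
settings = types.ModuleType("example_settings")
settings.DRIVER_NAME = "Firefox"
settings.debug = True  # lower case, so it is ignored

config = config_from_module(settings, environ={"DRIVER_NAME": "Chrome"})
```

Note that values taken from the environment are always strings, so a real loader may need to convert them to the type of the setting they replace.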
Drivers Class
cabu.drivers.load_chrome(config)
Start Chrome webdriver with the given configuration.
Parameters: config (dict) – The configuration loaded previously in Cabu.
Returns: webdriver – An instance of Chrome webdriver.
Return type: selenium.webdriver
cabu.drivers.load_driver(config, vdisplay=None)
Initialize the webdriver selected in config, using the given configuration.
Parameters: config (dict) – The configuration loaded previously in Cabu.
Returns: webdriver – An instance of selenium webdriver or None.
Return type: selenium.webdriver
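One way to picture a function that selects a webdriver from the config is a lookup table of loaders. The sketch below is an assumption about the general pattern, not the code in cabu.drivers: the `DRIVER_NAME` key, the helper name, and the stub loaders are hypothetical, and no browser is started.

```python
def load_driver_sketch(config, loaders):
    """Return the driver built by the loader named in config, or None."""
    loader = loaders.get(config.get("DRIVER_NAME"))
    if loader is None:
        # Unknown or missing driver name: mirror the documented "or None" result.
        return None
    return loader(config)


# Stub loaders standing in for functions such as load_firefox and load_chrome.
loaders = {
    "Firefox": lambda config: "firefox-driver",
    "Chrome": lambda config: "chrome-driver",
}

driver = load_driver_sketch({"DRIVER_NAME": "Chrome"}, loaders)
missing = load_driver_sketch({"DRIVER_NAME": "Safari"}, loaders)
```

Keeping the loaders in a dictionary means a new browser can be supported by adding one entry rather than another branch.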
cabu.drivers.load_firefox(config)
Start Firefox webdriver with the given configuration.
Parameters: config (dict) – The configuration loaded previously in Cabu.
Returns: webdriver – An instance of Firefox webdriver.
Return type: selenium.webdriver
cabu.drivers.load_phantomjs(config)
Start PhantomJS webdriver with the given configuration.
Parameters: config (dict) – The configuration loaded previously in Cabu.
Returns: webdriver – An instance of PhantomJS webdriver.
Return type: selenium.webdriver
cabu.drivers.load_vdisplay(config)
Initialize a vdisplay (Xvfb subprocess instance).
Parameters: config (dict) – The configuration loaded previously in Cabu.
Returns: An instance of Xvfb wrapper.
Return type: vdisplay
Auth Module
cabu.auth.authenticate()
Response helper for unauthorized attempts to access the app.
Returns: response – A Flask Response object with a custom message and a 401 status.
Return type: object
cabu.auth.check_auth(username, password)
Determine whether the given credentials match the ones stored in the config.
This small function compares the given username and password to the ones stored in the config and returns a boolean accordingly.
Parameters:
- username – The username to check.
- password – The password to check.
Returns: auth – True if authorized, False if not.
Return type: bool
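A credential comparison of this kind can be sketched with the standard library. This is not Cabu's code: the config keys `AUTH_USERNAME` and `AUTH_PASSWORD` are hypothetical, the config is passed in explicitly for the sake of a self-contained example, and the use of `hmac.compare_digest` (a constant-time comparison) is a choice made here rather than something the source states.

```python
import hmac


def check_auth_sketch(username, password, config):
    """Return True only if both credentials match the configured ones."""
    user_ok = hmac.compare_digest(
        username.encode("utf-8"), config["AUTH_USERNAME"].encode("utf-8")
    )
    pass_ok = hmac.compare_digest(
        password.encode("utf-8"), config["AUTH_PASSWORD"].encode("utf-8")
    )
    # Both comparisons run before combining, so timing does not reveal which one failed.
    return user_ok and pass_ok


# Hypothetical configuration values for the demonstration.
config = {"AUTH_USERNAME": "admin", "AUTH_PASSWORD": "s3cret"}

granted = check_auth_sketch("admin", "s3cret", config)
denied = check_auth_sketch("admin", "wrong", config)
```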
Bucket Class
class cabu.utils.bucket.Bucket(bucket_name, access_key, secret_key)
Convenience class to export data to an Amazon S3 bucket.
Parameters:
- bucket_name – The name of the S3 bucket.
- access_key – The access key used to authenticate with S3.
- secret_key – The secret key used to authenticate with S3.
delete(filename)
Delete the file with the given filename from the remote S3 bucket.
Parameters: filename (str) – A string representing the name of the file to delete.
Returns: response – The object returned by requests.
Return type: object
Cookies Class
Interface between Cookies and Database.
Parameters: db (Database) – The Database class instance to wrap.
Delete all the cookies stored in the database.
Returns: raw_result – The result of the cleaning.
Return type: str
Delete the value of the given cookie key.
Parameters: key (str) – The name of the cookie key to delete.
Returns: raw_result – The result of the attempt to delete the cookie.
Return type: str
Link Extractor Module
cabu.utils.link_extractor.extract_links(response_content, unique=False, blacklist_domains=[], whitelist_domains=[], regex=None, zen_path=None, blacklist_extensions=[], whitelist_extensions=[])
Extract links from a response content.
Parameters:
- response_content (str) – The HTML page received in a Response Object.
- unique (bool) – If True, duplicate links are removed from the result. Defaults to False.
- blacklist_domains (list) – List of domains to exclude from the result.
- whitelist_domains (list) – List of domains to include in the result.
- regex (list) – A regular expression filter on the link. Defaults to None.
- zen_path (list) – A selector to restrict the XPath to parse with bs4.
- blacklist_extensions (list) – List of file extensions to exclude from the result.
- whitelist_extensions (list) – List of file extensions to include in the result.
Returns: links – A list of extracted and filtered links.
Return type: list
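The filtering described by the unique, blacklist_domains and whitelist_domains parameters can be illustrated with a small standalone sketch. It is not the library's implementation: the documentation mentions bs4, while this example swaps in the standard library's `html.parser`, and the names `_HrefCollector` and `extract_links_sketch` are hypothetical. The regex, zen_path and extension filters are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class _HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links_sketch(html, unique=False, blacklist_domains=(), whitelist_domains=()):
    """Return links from html, filtered by domain and optionally deduplicated."""
    collector = _HrefCollector()
    collector.feed(html)
    links = []
    for href in collector.hrefs:
        domain = urlparse(href).netloc
        if domain in blacklist_domains:
            continue
        # An empty whitelist means "accept every domain".
        if whitelist_domains and domain not in whitelist_domains:
            continue
        if unique and href in links:
            continue
        links.append(href)
    return links


page = (
    '<a href="http://example.com/a">A</a>'
    '<a href="http://example.com/a">A again</a>'
    '<a href="http://ads.example.net/x">Ad</a>'
)
links = extract_links_sketch(page, unique=True, blacklist_domains=["ads.example.net"])
```

The sketch uses tuples as default arguments to avoid sharing a mutable list between calls.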