Compare commits

...

44 Commits

Author SHA1 Message Date
ValueRaider
95c0760b79 Enhance '_parse_user_dt()' + test 2023-01-20 21:42:31 +00:00
ValueRaider
5d9a91da4a Improve 'get_shares_full()' error handling ; Minor fixes 2023-01-14 22:44:54 +00:00
ValueRaider
47c579ff22 Merge pull request #1297 from alexa-infra/fix-stores-decryption
Fix stores decrypt
2023-01-14 20:06:52 +00:00
ValueRaider
caf5cba801 Merge pull request #1301 from ranaroussi/feature/share-count
Feature/share count
2023-01-14 19:53:45 +00:00
ValueRaider
486c7894ce get_shares_full(): convert to pd.Series, add test 2023-01-14 17:32:54 +00:00
ValueRaider
db8a00edae get_shares_full(): remove caching, tidy API 2023-01-14 17:11:57 +00:00
ValueRaider
805523b924 Fix 'get_shares_full()' post-rebase 2023-01-14 16:58:58 +00:00
ValueRaider
32ab2e648d get_shares_full() set default range 1yr 2023-01-14 16:35:54 +00:00
ValueRaider
4d91ae740a Add date args to 'shares_full()' and caching 2023-01-14 16:35:54 +00:00
ValueRaider
05ec4b4312 Add full share count history via 'shares_full' 2023-01-14 16:35:51 +00:00
ValueRaider
cd2c1ada14 Improve decrypt key deduction 2023-01-14 15:41:33 +00:00
ValueRaider
4ca9642403 Ensure 'requests_cache' responses processed ; Improve naming 2023-01-14 14:20:40 +00:00
Alexey Vasilyev
b438f29a71 Fix decryption 2023-01-14 08:06:35 +01:00
ValueRaider
4db178b8d6 Merge pull request #1284 from ranaroussi/fix/financials-caching
Improve caching of financials data
2023-01-12 11:47:04 +00:00
ValueRaider
38637a9821 Merge pull request #1283 from DE0CH/ignore-tz-false
Change default value to ignore_tz to False
2023-01-08 12:45:00 +00:00
Deyao Chen
de8c0bdcdd Change default value to ignore_tz to False
Bring the behavior of download() to be the same as 0.1.77.
2023-01-08 11:47:13 +08:00
ValueRaider
fd35975cf9 Improve caching of financials data 2023-01-07 18:02:16 +00:00
ValueRaider
1495834a09 Merge pull request #1276 from gogog22510/main
Fix the database lock error in multithread download
2023-01-04 23:10:22 +00:00
ValueRaider
2a7588dead Tidy DB lock fix 2023-01-04 21:32:54 +00:00
gogog22510
051de748b9 Fix the database lock error in multithread download 2023-01-04 12:37:59 -05:00
ValueRaider
97adb30d41 Merge pull request #1262 from ranaroussi/main
Sync `main` -> `dev`
2022-12-20 20:42:10 +00:00
ValueRaider
eacfbc45c0 Bump version to 0.2.3 2022-12-20 11:57:04 +00:00
ValueRaider
8deddd7ee9 Make financials API '_' use consistent 2022-12-20 11:56:57 +00:00
ValueRaider
beb494b67e README: add small section on version 0.2 2022-12-20 11:37:16 +00:00
ValueRaider
e2948a8b48 Bump version to 0.2.2 2022-12-20 11:33:04 +00:00
ValueRaider
ff3d3f2f78 Restore 'financials' attribute (map to 'income_stmt') 2022-12-20 11:32:19 +00:00
ValueRaider
85783da515 README: update 'repair' doc 2022-12-19 23:30:29 +00:00
ValueRaider
9dbfad4294 Bump version to 0.2.1 2022-12-19 23:19:42 +00:00
ValueRaider
5e54b92efd Fix _reconstruct_intervals_batch() calibration bug 2022-12-19 18:09:06 +00:00
ValueRaider
cffdbd47b5 Merge pull request #1253 from Rogach/pr/decode-stores
decode encrypted root.App.main.context.dispatcher.stores
2022-12-19 12:29:57 +00:00
ValueRaider
f398f46509 Switch 'pycryptodome' -> 'cryptography' 2022-12-19 12:28:51 +00:00
ValueRaider
097c76aa46 Add 'pycryptodome' requirement 2022-12-18 13:26:12 +00:00
ValueRaider
a9da16e048 Fix get_json_data_stores() behaviour 2022-12-18 13:19:11 +00:00
Platon Pronko
8e5f0984af decode encrypted root.App.main.context.dispatcher.stores 2022-12-18 11:40:26 +04:00
ValueRaider
38b738e766 Bump version to 0.2.0rc5 2022-12-16 16:27:46 +00:00
ValueRaider
55772d30a4 Merge pull request #1245 from ranaroussi/dev
Merge dev -> main for release 0.2.0rc5
2022-12-16 16:25:36 +00:00
ValueRaider
382285cfd9 Remove hardcoded paths 2022-12-16 16:24:16 +00:00
ValueRaider
d2e5ce284e Merge pull request #1243 from ranaroussi/fix/financials-error-handling
Improve financials error handling
2022-12-16 16:20:25 +00:00
ValueRaider
88d21d742d Merge pull request #1244 from ranaroussi/fix/repair-100x
Fix '100x price' repair
2022-12-16 16:20:17 +00:00
ValueRaider
7a0356d47b Document financials get() methods 2022-12-16 16:19:37 +00:00
ValueRaider
a13bf0cd6c Hide divide-by-0 warnings 2022-12-16 15:05:38 +00:00
ValueRaider
7cacf233ce Improve financials error handling
Nicely intercept parse errors in get_json_data_stores() & _create_financials_table_old() ; Improve exception messages ; Fix typo 'YFiance'
2022-12-16 13:22:17 +00:00
ValueRaider
b48212e420 Repair-100x now tolerates zeroes 2022-12-14 21:16:16 +00:00
ValueRaider
e7bf3607e8 Fix tests 2022-12-13 21:41:46 +00:00
16 changed files with 531 additions and 73 deletions

View File

@@ -1,6 +1,23 @@
Change Log
===========
0.2.3
-----
- Make financials API '_' use consistent
0.2.2
-----
- Restore 'financials' attribute (map to 'income_stmt')
0.2.1
-----
Release!
0.2.0rc5
--------
- Improve financials error handling #1243
- Fix '100x price' repair #1244
0.2.0rc4
--------
- Access to old financials tables via `get_income_stmt(legacy=True)`

View File

@@ -42,6 +42,13 @@ Yahoo! finance API is intended for personal use only.**
---
## What's new in version 0.2
- Optimised web scraping
- All 3 financials tables now match website so expect keys to change. If you really want old tables, use [`Ticker.get_[income_stmt|balance_sheet|cashflow](legacy=True, ...)`](https://github.com/ranaroussi/yfinance/blob/85783da515761a145411d742c2a8a3c1517264b0/yfinance/base.py#L968)
- price data improvements: fix bug NaN rows with dividend; new repair feature for missing or 100x prices `download(repair=True)`; new attribute `Ticker.history_metadata`
[See release notes for full list of changes](https://github.com/ranaroussi/yfinance/releases/tag/0.2.1)
## Quick Start
### The Ticker module
@@ -77,6 +84,7 @@ msft.capital_gains
# show share count
msft.shares
msft.get_shares_full()
# show financials:
# - income statement
@@ -206,8 +214,7 @@ data = yf.download( # or pdr.get_data_yahoo(...
interval = "5d",
# Whether to ignore timezone when aligning ticker data from
# different timezones. Default is True. False may be useful for
# minute/hourly data.
# different timezones. Default is False.
ignore_tz = False,
# group by ticker (to access via data['SPY'])
@@ -218,7 +225,7 @@ data = yf.download( # or pdr.get_data_yahoo(...
# (optional, default is False)
auto_adjust = True,
# identify and attempt repair of currency unit mixups e.g. $/cents
# attempt repair of missing data or currency mixups e.g. $/cents
repair = False,
# download pre/post regular market hours data
@@ -306,6 +313,7 @@ To install `yfinance` using `conda`, see
- [frozendict](https://pypi.org/project/frozendict) \>= 2.3.4
- [beautifulsoup4](https://pypi.org/project/beautifulsoup4) \>= 4.11.1
- [html5lib](https://pypi.org/project/html5lib) \>= 1.1
- [cryptography](https://pypi.org/project/cryptography) \>= 3.3.2
### Optional (if you want to use `pandas_datareader`)

View File

@@ -1,5 +1,5 @@
{% set name = "yfinance" %}
{% set version = "0.2.0" %}
{% set version = "0.2.3" %}
package:
name: "{{ name|lower }}"
@@ -26,6 +26,8 @@ requirements:
- frozendict >=2.3.4
- beautifulsoup4 >=4.11.1
- html5lib >=1.1
# - pycryptodome >=3.6.6
- cryptography >=3.3.2
- pip
- python
@@ -40,6 +42,8 @@ requirements:
- frozendict >=2.3.4
- beautifulsoup4 >=4.11.1
- html5lib >=1.1
# - pycryptodome >=3.6.6
- cryptography >=3.3.2
- python
test:

View File

@@ -8,3 +8,4 @@ pytz>=2022.5
frozendict>=2.3.4
beautifulsoup4>=4.11.1
html5lib>=1.1
cryptography>=3.3.2

View File

@@ -62,7 +62,9 @@ setup(
install_requires=['pandas>=1.3.0', 'numpy>=1.16.5',
'requests>=2.26', 'multitasking>=0.0.7',
'lxml>=4.9.1', 'appdirs>=1.4.4', 'pytz>=2022.5',
'frozendict>=2.3.4',
'frozendict>=2.3.4',
# 'pycryptodome>=3.6.6',
'cryptography>=3.3.2',
'beautifulsoup4>=4.11.1', 'html5lib>=1.1'],
entry_points={
'console_scripts': [

View File

@@ -65,6 +65,7 @@ class TestTicker(unittest.TestCase):
dat.splits
dat.actions
dat.shares
dat.get_shares_full()
dat.info
dat.calendar
dat.recommendations
@@ -100,6 +101,7 @@ class TestTicker(unittest.TestCase):
dat.splits
dat.actions
dat.shares
dat.get_shares_full()
dat.info
dat.calendar
dat.recommendations
@@ -128,9 +130,20 @@ class TestTicker(unittest.TestCase):
class TestTickerHistory(unittest.TestCase):
session = None
@classmethod
def setUpClass(cls):
cls.session = requests_cache.CachedSession(backend='memory')
@classmethod
def tearDownClass(cls):
if cls.session is not None:
cls.session.close()
def setUp(self):
# use a ticker that has dividends
self.ticker = yf.Ticker("IBM")
self.ticker = yf.Ticker("IBM", session=self.session)
def tearDown(self):
self.ticker = None
@@ -176,9 +189,19 @@ class TestTickerHistory(unittest.TestCase):
class TestTickerEarnings(unittest.TestCase):
session = None
@classmethod
def setUpClass(cls):
cls.session = requests_cache.CachedSession(backend='memory')
@classmethod
def tearDownClass(cls):
if cls.session is not None:
cls.session.close()
def setUp(self):
self.ticker = yf.Ticker("GOOGL")
self.ticker = yf.Ticker("GOOGL", session=self.session)
def tearDown(self):
self.ticker = None
@@ -237,9 +260,19 @@ class TestTickerEarnings(unittest.TestCase):
class TestTickerHolders(unittest.TestCase):
session = None
@classmethod
def setUpClass(cls):
cls.session = requests_cache.CachedSession(backend='memory')
@classmethod
def tearDownClass(cls):
if cls.session is not None:
cls.session.close()
def setUp(self):
self.ticker = yf.Ticker("GOOGL")
self.ticker = yf.Ticker("GOOGL", session=self.session)
def tearDown(self):
self.ticker = None
@@ -283,7 +316,7 @@ class TestTickerMiscFinancials(unittest.TestCase):
def setUp(self):
self.ticker = yf.Ticker("GOOGL", session=self.session)
# For ticker 'BSE.AX' (and others), Yahoo not returning
# full quarterly financials (usually cash-flow) with all entries,
# instead returns a smaller version in different data store.
@@ -497,6 +530,65 @@ class TestTickerMiscFinancials(unittest.TestCase):
data_cached = self.ticker_old_fmt.get_cashflow(legacy=True, freq="quarterly")
self.assertIs(data, data_cached, "data not cached")
def test_income_alt_names(self):
i1 = self.ticker.income_stmt
i2 = self.ticker.incomestmt
self.assertTrue(i1.equals(i2))
i3 = self.ticker.financials
self.assertTrue(i1.equals(i3))
i1 = self.ticker.get_income_stmt()
i2 = self.ticker.get_incomestmt()
self.assertTrue(i1.equals(i2))
i3 = self.ticker.get_financials()
self.assertTrue(i1.equals(i3))
i1 = self.ticker.quarterly_income_stmt
i2 = self.ticker.quarterly_incomestmt
self.assertTrue(i1.equals(i2))
i3 = self.ticker.quarterly_financials
self.assertTrue(i1.equals(i3))
i1 = self.ticker.get_income_stmt(freq="quarterly")
i2 = self.ticker.get_incomestmt(freq="quarterly")
self.assertTrue(i1.equals(i2))
i3 = self.ticker.get_financials(freq="quarterly")
self.assertTrue(i1.equals(i3))
def test_balance_sheet_alt_names(self):
i1 = self.ticker.balance_sheet
i2 = self.ticker.balancesheet
self.assertTrue(i1.equals(i2))
i1 = self.ticker.get_balance_sheet()
i2 = self.ticker.get_balancesheet()
self.assertTrue(i1.equals(i2))
i1 = self.ticker.quarterly_balance_sheet
i2 = self.ticker.quarterly_balancesheet
self.assertTrue(i1.equals(i2))
i1 = self.ticker.get_balance_sheet(freq="quarterly")
i2 = self.ticker.get_balancesheet(freq="quarterly")
self.assertTrue(i1.equals(i2))
def test_cash_flow_alt_names(self):
i1 = self.ticker.cash_flow
i2 = self.ticker.cashflow
self.assertTrue(i1.equals(i2))
i1 = self.ticker.get_cash_flow()
i2 = self.ticker.get_cashflow()
self.assertTrue(i1.equals(i2))
i1 = self.ticker.quarterly_cash_flow
i2 = self.ticker.quarterly_cashflow
self.assertTrue(i1.equals(i2))
i1 = self.ticker.get_cash_flow(freq="quarterly")
i2 = self.ticker.get_cashflow(freq="quarterly")
self.assertTrue(i1.equals(i2))
def test_sustainability(self):
data = self.ticker.sustainability
self.assertIsInstance(data, pd.DataFrame, "data has wrong type")
@@ -563,6 +655,11 @@ class TestTickerMiscFinancials(unittest.TestCase):
self.assertIsInstance(data, pd.DataFrame, "data has wrong type")
self.assertFalse(data.empty, "data is empty")
def test_shares_full(self):
data = self.ticker.get_shares_full()
self.assertIsInstance(data, pd.Series, "data has wrong type")
self.assertFalse(data.empty, "data is empty")
def test_info(self):
data = self.ticker.info
self.assertIsInstance(data, dict, "data has wrong type")

30
tests/utils.py Normal file
View File

@@ -0,0 +1,30 @@
import unittest
from .context import yfinance as yf
import datetime as _dt
import pandas as _pd
import pytz as _pytz
class TestUtils(unittest.TestCase):
def test_parse_user_dt(self):
""" Purpose of _parse_user_dt() is to take any date-like value,
combine with specified timezone and
return its localized timestamp.
"""
tz_name = "America/New_York"
tz = _pytz.timezone(tz_name)
dt_answer = tz.localize(_dt.datetime(2023,1,1))
# All possible versions of 'dt_answer'
values = ["2023-01-01", _dt.date(2023,1,1), _dt.datetime(2023,1,1), _pd.Timestamp(_dt.date(2023,1,1))]
# - now add localized versions
values.append(tz.localize(_dt.datetime(2023,1,1)))
values.append(_pd.Timestamp(_dt.date(2023,1,1)).tz_localize(tz_name))
values.append(int(_pd.Timestamp(_dt.date(2023,1,1)).tz_localize(tz_name).timestamp()))
for v in values:
v2 = yf.utils._parse_user_dt(v, tz_name)
self.assertEqual(v2, dt_answer.timestamp())

View File

@@ -40,6 +40,7 @@ from .scrapers.analysis import Analysis
from .scrapers.fundamentals import Fundamentals
from .scrapers.holders import Holders
from .scrapers.quote import Quote
import json as _json
_BASE_URL_ = 'https://query2.finance.yahoo.com'
_SCRAPE_URL_ = 'https://finance.yahoo.com/quote'
@@ -559,12 +560,26 @@ class TickerBase:
# Calibrate! Check whether 'df_fine' has different split-adjustment.
# If different, then adjust to match 'df'
df_block_calib = df_block[price_cols]
calib_filter = df_block_calib.to_numpy() != tag
common_index = df_block_calib.index[df_block_calib.index.isin(df_new.index)]
if len(common_index) == 0:
# Can't calibrate so don't attempt repair
continue
df_new_calib = df_new[df_new.index.isin(common_index)][price_cols]
df_block_calib = df_block_calib[df_block_calib.index.isin(common_index)]
calib_filter = (df_block_calib != tag).to_numpy()
if not calib_filter.any():
# Can't calibrate so don't attempt repair
continue
df_new_calib = df_new[df_new.index.isin(df_block_calib.index)][price_cols]
ratios = (df_block_calib[price_cols].to_numpy() / df_new_calib[price_cols].to_numpy())[calib_filter]
# Avoid divide-by-zero warnings printing:
df_new_calib = df_new_calib.to_numpy()
df_block_calib = df_block_calib.to_numpy()
for j in range(len(price_cols)):
c = price_cols[j]
f = ~calib_filter[:,j]
if f.any():
df_block_calib[f,j] = 1
df_new_calib[f,j] = 1
ratios = (df_block_calib / df_new_calib)[calib_filter]
ratio = _np.mean(ratios)
#
ratio_rcp = round(1.0 / ratio, 1)
@@ -591,7 +606,7 @@ class TickerBase:
if not idx in df_new.index:
# Yahoo didn't return finer-grain data for this interval,
# so probably no trading happened.
print("no fine data")
# print("no fine data")
continue
df_new_row = df_new.loc[idx]
@@ -646,10 +661,15 @@ class TickerBase:
data_cols = ["High", "Open", "Low", "Close"] # Order important, separate High from Low
data_cols = [c for c in data_cols if c in df2.columns]
f_zeroes = (df2[data_cols]==0).any(axis=1)
if f_zeroes.any():
df2_zeroes = df2[f_zeroes]
df2 = df2[~f_zeroes]
else:
df2_zeroes = None
if df2.shape[0] <= 1:
return df
median = _ndimage.median_filter(df2[data_cols].values, size=(3, 3), mode="wrap")
if (median == 0).any():
raise Exception("median contains zeroes, why?")
ratio = df2[data_cols].values / median
ratio_rounded = (ratio / 20).round() * 20 # round ratio to nearest 20
f = ratio_rounded == 100
@@ -715,6 +735,9 @@ class TickerBase:
if fj.any():
c = data_cols[j]
df2.loc[fj, c] = df.loc[fj, c]
if df2_zeroes is not None:
df2 = _pd.concat([df2, df2_zeroes]).sort_index()
df2.index = _pd.to_datetime()
return df2
@@ -922,6 +945,18 @@ class TickerBase:
return data
def get_earnings(self, proxy=None, as_dict=False, freq="yearly"):
"""
:Parameters:
as_dict: bool
Return table as Python dict
Default is False
freq: str
"yearly" or "quarterly"
Default is "yearly"
proxy: str
Optional. Proxy server URL scheme
Default is None
"""
self._fundamentals.proxy = proxy
data = self._fundamentals.earnings[freq]
if as_dict:
@@ -932,6 +967,24 @@ class TickerBase:
return data
def get_income_stmt(self, proxy=None, as_dict=False, pretty=False, freq="yearly", legacy=False):
"""
:Parameters:
as_dict: bool
Return table as Python dict
Default is False
pretty: bool
Format row names nicely for readability
Default is False
freq: str
"yearly" or "quarterly"
Default is "yearly"
legacy: bool
Return old financials tables. Useful for when new tables not available
Default is False
proxy: str
Optional. Proxy server URL scheme
Default is None
"""
self._fundamentals.proxy = proxy
if legacy:
@@ -946,7 +999,31 @@ class TickerBase:
return data.to_dict()
return data
def get_incomestmt(self, proxy=None, as_dict=False, pretty=False, freq="yearly", legacy=False):
return self.get_income_stmt(proxy, as_dict, pretty, freq, legacy)
def get_financials(self, proxy=None, as_dict=False, pretty=False, freq="yearly", legacy=False):
return self.get_income_stmt(proxy, as_dict, pretty, freq, legacy)
def get_balance_sheet(self, proxy=None, as_dict=False, pretty=False, freq="yearly", legacy=False):
"""
:Parameters:
as_dict: bool
Return table as Python dict
Default is False
pretty: bool
Format row names nicely for readability
Default is False
freq: str
"yearly" or "quarterly"
Default is "yearly"
legacy: bool
Return old financials tables. Useful for when new tables not available
Default is False
proxy: str
Optional. Proxy server URL scheme
Default is None
"""
self._fundamentals.proxy = proxy
if legacy:
@@ -961,7 +1038,28 @@ class TickerBase:
return data.to_dict()
return data
def get_cashflow(self, proxy=None, as_dict=False, pretty=False, freq="yearly", legacy=False):
def get_balancesheet(self, proxy=None, as_dict=False, pretty=False, freq="yearly", legacy=False):
return self.get_balance_sheet(proxy, as_dict, pretty, freq, legacy)
def get_cash_flow(self, proxy=None, as_dict=False, pretty=False, freq="yearly", legacy=False):
"""
:Parameters:
as_dict: bool
Return table as Python dict
Default is False
pretty: bool
Format row names nicely for readability
Default is False
freq: str
"yearly" or "quarterly"
Default is "yearly"
legacy: bool
Return old financials tables. Useful for when new tables not available
Default is False
proxy: str
Optional. Proxy server URL scheme
Default is None
"""
self._fundamentals.proxy = proxy
if legacy:
@@ -976,6 +1074,9 @@ class TickerBase:
return data.to_dict()
return data
def get_cashflow(self, proxy=None, as_dict=False, pretty=False, freq="yearly", legacy=False):
return self.get_cash_flow(proxy, as_dict, pretty, freq, legacy)
def get_dividends(self, proxy=None):
if self._history is None:
self.history(period="max", proxy=proxy)
@@ -1018,6 +1119,59 @@ class TickerBase:
return data.to_dict()
return data
def get_shares_full(self, start=None, end=None, proxy=None):
# Process dates
tz = self._get_ticker_tz(debug_mode=False, proxy=None, timeout=10)
dt_now = _pd.Timestamp.utcnow().tz_convert(tz)
if start is not None:
start_ts = utils._parse_user_dt(start, tz)
start = _pd.Timestamp.fromtimestamp(start_ts).tz_localize("UTC").tz_convert(tz)
start_d = start.date()
if end is not None:
end_ts = utils._parse_user_dt(end, tz)
end = _pd.Timestamp.fromtimestamp(end_ts).tz_localize("UTC").tz_convert(tz)
end_d = end.date()
if end is None:
end = dt_now
if start is None:
start = end - _pd.Timedelta(days=548) # 18 months
if start >= end:
print("ERROR: start date must be before end")
return None
start = start.floor("D")
end = end.ceil("D")
# Fetch
ts_url_base = "https://query2.finance.yahoo.com/ws/fundamentals-timeseries/v1/finance/timeseries/{0}?symbol={0}".format(self.ticker)
shares_url = ts_url_base + "&period1={}&period2={}".format(int(start.timestamp()), int(end.timestamp()))
try:
json_str = self._data.cache_get(shares_url).text
json_data = _json.loads(json_str)
except:
print(f"{self.ticker}: Yahoo web request for share count failed")
return None
try:
fail = json_data["finance"]["error"]["code"] == "Bad Request"
except:
fail = False
if fail:
print(f"{self.ticker}: Yahoo web request for share count failed")
return None
shares_data = json_data["timeseries"]["result"]
if not "shares_out" in shares_data[0]:
print(f"{self.ticker}: Yahoo did not return share count in date range {start} -> {end}")
return None
try:
df = _pd.Series(shares_data[0]["shares_out"], index=_pd.to_datetime(shares_data[0]["timestamp"], unit="s"))
except Exception as e:
print(f"{self.ticker}: Failed to parse shares count data: "+str(e))
return None
df.index = df.index.tz_localize(tz)
df = df.sort_index()
return df
def get_isin(self, proxy=None) -> Optional[str]:
# *** experimental ***
if self._isin is not None:
@@ -1154,8 +1308,8 @@ class TickerBase:
dates[cn] = _pd.to_datetime(dates[cn], format="%b %d, %Y, %I %p")
# - instead of attempting decoding of ambiguous timezone abbreviation, just use 'info':
self._quote.proxy = proxy
dates[cn] = dates[cn].dt.tz_localize(
tz=self._quote.info["exchangeTimezoneName"])
tz = self._get_ticker_tz(debug_mode=False, proxy=proxy, timeout=30)
dates[cn] = dates[cn].dt.tz_localize(tz)
dates = dates.set_index("Earnings Date")

View File

@@ -1,6 +1,17 @@
import functools
from functools import lru_cache
import hashlib
from base64 import b64decode
usePycryptodome = False # slightly faster
# usePycryptodome = True
if usePycryptodome:
from Crypto.Cipher import AES
from Crypto.Util.Padding import unpad
else:
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
import requests as requests
import re
@@ -35,6 +46,99 @@ def lru_cache_freezeargs(func):
return wrapped
def decrypt_cryptojs_aes_stores(data):
encrypted_stores = data['context']['dispatcher']['stores']
if "_cs" in data and "_cr" in data:
_cs = data["_cs"]
_cr = data["_cr"]
_cr = b"".join(int.to_bytes(i, length=4, byteorder="big", signed=True) for i in json.loads(_cr)["words"])
password = hashlib.pbkdf2_hmac("sha1", _cs.encode("utf8"), _cr, 1, dklen=32).hex()
else:
# Currently assume one extra key in dict, which is password. Print error if
# more extra keys detected.
new_keys = [k for k in data.keys() if k not in ["context", "plugins"]]
l = len(new_keys)
if l == 0:
return None
elif l == 1 and isinstance(data[new_keys[0]], str):
password_key = new_keys[0]
else:
msg = "Yahoo has again changed data format, yfinance now unsure which key(s) is for decryption:"
k = new_keys[0]
k_str = k if len(k) < 32 else k[:32-3]+"..."
msg += f" '{k_str}'->{type(data[k])}"
for i in range(1, len(new_keys)):
msg += f" , '{k_str}'->{type(data[k])}"
raise Exception(msg)
password_key = new_keys[0]
password = data[password_key]
encrypted_stores = b64decode(encrypted_stores)
assert encrypted_stores[0:8] == b"Salted__"
salt = encrypted_stores[8:16]
encrypted_stores = encrypted_stores[16:]
def EVPKDF(password, salt, keySize=32, ivSize=16, iterations=1, hashAlgorithm="md5") -> tuple:
"""OpenSSL EVP Key Derivation Function
Args:
password (Union[str, bytes, bytearray]): Password to generate key from.
salt (Union[bytes, bytearray]): Salt to use.
keySize (int, optional): Output key length in bytes. Defaults to 32.
ivSize (int, optional): Output Initialization Vector (IV) length in bytes. Defaults to 16.
iterations (int, optional): Number of iterations to perform. Defaults to 1.
hashAlgorithm (str, optional): Hash algorithm to use for the KDF. Defaults to 'md5'.
Returns:
key, iv: Derived key and Initialization Vector (IV) bytes.
Taken from: https://gist.github.com/rafiibrahim8/0cd0f8c46896cafef6486cb1a50a16d3
OpenSSL original code: https://github.com/openssl/openssl/blob/master/crypto/evp/evp_key.c#L78
"""
assert iterations > 0, "Iterations can not be less than 1."
if isinstance(password, str):
password = password.encode("utf-8")
final_length = keySize + ivSize
key_iv = b""
block = None
while len(key_iv) < final_length:
hasher = hashlib.new(hashAlgorithm)
if block:
hasher.update(block)
hasher.update(password)
hasher.update(salt)
block = hasher.digest()
for _ in range(1, iterations):
block = hashlib.new(hashAlgorithm, block).digest()
key_iv += block
key, iv = key_iv[:keySize], key_iv[keySize:final_length]
return key, iv
try:
key, iv = EVPKDF(password, salt, keySize=32, ivSize=16, iterations=1, hashAlgorithm="md5")
except:
raise Exception("yfinance failed to decrypt Yahoo data response")
if usePycryptodome:
cipher = AES.new(key, AES.MODE_CBC, iv=iv)
plaintext = cipher.decrypt(encrypted_stores)
plaintext = unpad(plaintext, 16, style="pkcs7")
else:
cipher = Cipher(algorithms.AES(key), modes.CBC(iv))
decryptor = cipher.decryptor()
plaintext = decryptor.update(encrypted_stores) + decryptor.finalize()
unpadder = padding.PKCS7(128).unpadder()
plaintext = unpadder.update(plaintext) + unpadder.finalize()
plaintext = plaintext.decode("utf-8")
decoded_stores = json.loads(plaintext)
return decoded_stores
_SCRAPE_URL_ = 'https://finance.yahoo.com/quote'
@@ -86,12 +190,25 @@ class TickerData:
html = self.get(url=ticker_url, proxy=proxy).text
# The actual json-data for stores is in a javascript assignment in the webpage
json_str = html.split('root.App.main =')[1].split(
'(this)')[0].split(';\n}')[0].strip()
data = json.loads(json_str)['context']['dispatcher']['stores']
try:
json_str = html.split('root.App.main =')[1].split(
'(this)')[0].split(';\n}')[0].strip()
except IndexError:
# Fetch failed, probably because Yahoo spam triggered
return {}
data = json.loads(json_str)
stores = decrypt_cryptojs_aes_stores(data)
if stores is None:
# Maybe Yahoo returned old format, not encrypted
if "context" in data and "dispatcher" in data["context"]:
stores = data['context']['dispatcher']['stores']
if stores is None:
raise Exception(f"{self.ticker}: Failed to extract data stores from web request")
# return data
new_data = json.dumps(data).replace('{}', 'null')
new_data = json.dumps(stores).replace('{}', 'null')
new_data = re.sub(
r'{[\'|\"]raw[\'|\"]:(.*?),(.*?)}', r'\1', new_data)

View File

@@ -1,6 +1,6 @@
class YFianceException(Exception):
class YFinanceException(Exception):
pass
class YFianceDataException(YFianceException):
class YFinanceDataException(YFinanceException):
pass

View File

@@ -29,7 +29,7 @@ from . import Ticker, utils
from . import shared
def download(tickers, start=None, end=None, actions=False, threads=True, ignore_tz=True,
def download(tickers, start=None, end=None, actions=False, threads=True, ignore_tz=False,
group_by='column', auto_adjust=False, back_adjust=False, repair=False, keepna=False,
progress=True, period="max", show_errors=True, interval="1d", prepost=False,
proxy=None, rounding=False, timeout=10):
@@ -68,7 +68,7 @@ def download(tickers, start=None, end=None, actions=False, threads=True, ignore_
How many threads to use for mass downloading. Default is True
ignore_tz: bool
When combining from different timezones, ignore that part of datetime.
Default is True
Default is False
proxy: str
Optional. Proxy server URL scheme. Default is None
rounding: bool

View File

@@ -6,7 +6,7 @@ import numpy as np
from yfinance import utils
from yfinance.data import TickerData
from yfinance.exceptions import YFianceDataException, YFianceException
from yfinance.exceptions import YFinanceDataException, YFinanceException
class Fundamentals:
@@ -22,10 +22,10 @@ class Fundamentals:
self._financials_data = None
self._fin_data_quote = None
self._basics_already_scraped = False
self._financials = Fiancials(data)
self._financials = Financials(data)
@property
def financials(self) -> "Fiancials":
def financials(self) -> "Financials":
return self._financials
@property
@@ -97,7 +97,7 @@ class Fundamentals:
pass
class Fiancials:
class Financials:
def __init__(self, data: TickerData):
self._data = data
self._income_time_series = {}
@@ -143,8 +143,8 @@ class Fiancials:
if statement is not None:
return statement
except YFianceException as e:
print("Failed to create financials table for {} reason: {}".format(name, repr(e)))
except YFinanceException as e:
print(f"- {self._data.ticker}: Failed to create {name} financials table for reason: {repr(e)}")
return pd.DataFrame()
def _create_financials_table(self, name, timescale, proxy):
@@ -153,14 +153,8 @@ class Fiancials:
name = "financials"
keys = self._get_datastore_keys(name, proxy)
try:
# Developers note: TTM and template stuff allows for reproducing the nested structure
# visible on Yahoo website. But more work needed to make it user-friendly! Ideally
# return a tree data structure instead of Pandas MultiIndex
# So until this is implemented, just return simple tables
return self.get_financials_time_series(timescale, keys, proxy)
except Exception as e:
pass
@@ -183,10 +177,10 @@ class Fiancials:
try:
keys = _finditem1("key", data_stores['FinancialTemplateStore'])
except KeyError as e:
raise YFianceDataException("Parsing FinancialTemplateStore failed, reason: {}".format(repr(e)))
raise YFinanceDataException("Parsing FinancialTemplateStore failed, reason: {}".format(repr(e)))
if not keys:
raise YFianceDataException("No keys in FinancialTemplateStore")
raise YFinanceDataException("No keys in FinancialTemplateStore")
return keys
def get_financials_time_series(self, timescale, keys: list, proxy=None) -> pd.DataFrame:
@@ -201,7 +195,7 @@ class Fiancials:
url = ts_url_base + "&type=" + ",".join([timescale + k for k in keys])
# Yahoo returns maximum 4 years or 5 quarters, regardless of start_dt:
start_dt = datetime.datetime(2016, 12, 31)
end = (datetime.datetime.now() + datetime.timedelta(days=366))
end = pd.Timestamp.utcnow().ceil("D")
url += "&period1={}&period2={}".format(int(start_dt.timestamp()), int(end.timestamp()))
# Step 3: fetch and reshape data
@@ -272,8 +266,8 @@ class Fiancials:
if statement is not None:
return statement
except YFianceException as e:
print("Failed to create financials table for {} reason: {}".format(name, repr(e)))
except YFinanceException as e:
print(f"- {self._data.ticker}: Failed to create financials table for {name} reason: {repr(e)}")
return pd.DataFrame()
def _create_financials_table_old(self, name, timescale, proxy):
@@ -281,7 +275,7 @@ class Fiancials:
# Fetch raw data
if not "QuoteSummaryStore" in data_stores:
return pd.DataFrame()
raise YFinanceDataException(f"Yahoo not returning legacy financials data")
data = data_stores["QuoteSummaryStore"]
if name == "cash-flow":
@@ -296,12 +290,14 @@ class Fiancials:
key1 += "History"
if timescale == "quarterly":
key1 += "Quarterly"
data = data.get(key1)[key2]
if key1 not in data or data[key1] is None or key2 not in data[key1]:
raise YFinanceDataException(f"Yahoo not returning legacy {name} financials data")
data = data[key1][key2]
# Tabulate
df = pd.DataFrame(data)
if len(df) == 0:
return pd.DataFrame()
raise YFinanceDataException(f"Yahoo not returning legacy {name} financials data")
df = df.drop(columns=['maxAge'])
for col in df.columns:
df[col] = df[col].replace('-', np.nan)

View File

@@ -194,9 +194,11 @@ class Quote:
for k in keys:
url += "&type=" + k
# Request 6 months of data
url += "&period1={}".format(
int((datetime.datetime.now() - datetime.timedelta(days=365 // 2)).timestamp()))
url += "&period2={}".format(int((datetime.datetime.now() + datetime.timedelta(days=1)).timestamp()))
start = pd.Timestamp.utcnow().floor("D") - datetime.timedelta(days=365 // 2)
start = int(start.timestamp())
end = pd.Timestamp.utcnow().ceil("D")
end = int(end.timestamp())
url += f"&period1={start}&period2={end}"
json_str = self._data.cache_get(url=url, proxy=proxy).text
json_data = json.loads(json_str)

View File

@@ -161,6 +161,22 @@ class Ticker(TickerBase):
def quarterly_income_stmt(self) -> _pd.DataFrame:
return self.get_income_stmt(pretty=True, freq='quarterly')
@property
def incomestmt(self) -> _pd.DataFrame:
return self.income_stmt
@property
def quarterly_incomestmt(self) -> _pd.DataFrame:
return self.quarterly_income_stmt
@property
def financials(self) -> _pd.DataFrame:
return self.income_stmt
@property
def quarterly_financials(self) -> _pd.DataFrame:
return self.quarterly_income_stmt
@property
def balance_sheet(self) -> _pd.DataFrame:
return self.get_balance_sheet(pretty=True)
@@ -177,13 +193,21 @@ class Ticker(TickerBase):
def quarterly_balancesheet(self) -> _pd.DataFrame:
return self.quarterly_balance_sheet
@property
def cash_flow(self) -> _pd.DataFrame:
return self.get_cash_flow(pretty=True, freq="yearly")
@property
def quarterly_cash_flow(self) -> _pd.DataFrame:
return self.get_cash_flow(pretty=True, freq='quarterly')
@property
def cashflow(self) -> _pd.DataFrame:
return self.get_cashflow(pretty=True, freq="yearly")
return self.cash_flow
@property
def quarterly_cashflow(self) -> _pd.DataFrame:
return self.get_cashflow(pretty=True, freq='quarterly')
return self.quarterly_cash_flow
@property
def recommendations_summary(self):

View File

@@ -288,21 +288,30 @@ def camel2title(strings: List[str], sep: str = ' ', acronyms: Optional[List[str]
return strings
def _parse_user_dt(dt, exchange_tz):
def _parse_user_dt(dt, tz_name):
if isinstance(dt, int):
# Should already be epoch, test with conversion:
_datetime.datetime.fromtimestamp(dt)
else:
# Convert str/date -> datetime, set tzinfo=exchange, get timestamp:
if isinstance(dt, str):
dt = _datetime.datetime.strptime(str(dt), '%Y-%m-%d')
if isinstance(dt, _datetime.date) and not isinstance(dt, _datetime.datetime):
dt = _datetime.datetime.combine(dt, _datetime.time(0))
if isinstance(dt, _datetime.datetime) and dt.tzinfo is None:
# Assume user is referring to exchange's timezone
dt = _tz.timezone(exchange_tz).localize(dt)
dt = int(dt.timestamp())
return dt
try:
_datetime.datetime.fromtimestamp(dt)
except:
raise Exception(f"'dt' is not a valid epoch: '{dt}'")
return dt
# Convert str/date -> datetime, set tzinfo=exchange, get timestamp:
dt2 = dt
if isinstance(dt2, str):
dt2 = _datetime.datetime.strptime(str(dt2), '%Y-%m-%d')
if isinstance(dt2, _datetime.date) and not isinstance(dt2, _datetime.datetime):
dt2 = _datetime.datetime.combine(dt2, _datetime.time(0))
if isinstance(dt2, _datetime.datetime) and dt2.tzinfo is None:
# Assume user is referring to exchange's timezone
if tz_name is None:
raise Exception(f"Must provide a timezone for localizing '{dt}'")
dt2 = _tz.timezone(tz_name).localize(dt2)
if not isinstance(dt2, _datetime.datetime):
raise Exception(f"'dt' is not a date-like value: '{dt}'")
dt2 = int(dt2.timestamp())
return dt2
def _interval_to_timedelta(interval):
@@ -607,7 +616,7 @@ def safe_merge_dfs(df_main, df_sub, interval):
if interval.endswith('m') or interval.endswith('h') or interval == "1d":
# Update: is possible with daily data when dividend very recent
f_missing = ~df_sub.index.isin(df.index)
df_sub_missing = df_sub[f_missing]
df_sub_missing = df_sub[f_missing].copy()
keys = {"Adj Open", "Open", "Adj High", "High", "Adj Low", "Low", "Adj Close",
"Close"}.intersection(df.columns)
df_sub_missing[list(keys)] = _np.nan
@@ -743,8 +752,10 @@ class _TzCache:
"""Simple sqlite file cache of ticker->timezone"""
def __init__(self):
self._tz_db = None
self._setup_cache_folder()
# Must init db here, where is thread-safe
self._tz_db = _KVStore(_os.path.join(self._db_dir, "tkr-tz.db"))
self._migrate_cache_tkr_tz()
def _setup_cache_folder(self):
if not _os.path.isdir(self._db_dir):
@@ -776,11 +787,6 @@ class _TzCache:
@property
def tz_db(self):
# lazy init
if self._tz_db is None:
self._tz_db = _KVStore(_os.path.join(self._db_dir, "tkr-tz.db"))
self._migrate_cache_tkr_tz()
return self._tz_db
def _migrate_cache_tkr_tz(self):

View File

@@ -1 +1 @@
version = "0.2.0rc4"
version = "0.2.3"