Compare commits

..

1 Commits

Author SHA1 Message Date
ValueRaider
95c0760b79 Enhance '_parse_user_dt()' + test 2023-01-20 21:42:31 +00:00
15 changed files with 423 additions and 1690 deletions

View File

@@ -7,9 +7,7 @@ assignees: ''
---
# IMPORTANT
If you want help, you got to read this first, follow the instructions.
# READ BEFORE POSTING
### Are you up-to-date?
@@ -25,19 +23,20 @@ and comparing against [PIP](https://pypi.org/project/yfinance/#history).
### Does Yahoo actually have the data?
Are you spelling ticker *exactly* same as Yahoo?
Visit `finance.yahoo.com` and confim they have your data. Maybe your ticker was delisted.
Then visit `finance.yahoo.com` and confirm they have the data you want. Maybe your ticker was delisted, or your expectations of `yfinance` are wrong.
Then check that you are spelling ticker *exactly* same as Yahoo.
### Are you spamming Yahoo?
Yahoo Finance free service has rate-limiting depending on request type - roughly 60/minute for prices, 10/minute for info. Once limit hit, Yahoo can delay, block, or return bad data. Not a `yfinance` bug.
Yahoo Finance free service has limit on query rate (roughly 100/s). Them delaying or blocking your spam is not a bug.
### Still think it's a bug?
Delete this default message (all of it) and submit your bug report here, providing the following as best you can:
Delete this default message and submit your bug report here, providing the following as best you can:
- Simple code that reproduces your problem, that we can copy-paste-run
- Exception message with full traceback, or proof `yfinance` returning bad data
- `yfinance` version and Python version
- Operating system type
- Info about your system:
- yfinance version
- operating system
- Simple code that reproduces your problem
- The error message

View File

@@ -1,69 +1,6 @@
Change Log
===========
0.2.15
------
Restore missing Ticker.info keys #1480
0.2.14
------
Fix Ticker.info dict by fetching from API #1461
0.2.13
------
Price bug fixes:
- fetch big-interval with Capital Gains #1455
- merging dividends & splits with prices #1452
0.2.12
------
Disable annoying 'backup decrypt' msg
0.2.11
------
Fix history_metadata accesses for unusual symbols #1411
0.2.10
------
General
- allow using sqlite3 < 3.8.2 #1380
- add another backup decrypt option #1379
Prices
- restore original download() timezone handling #1385
- fix & improve price repair #1289 2a2928b 86d6acc
- drop intraday intervals if in post-market but prepost=False #1311
Info
- fast_info improvements:
- add camelCase keys, add dict functions values() & items() #1368
- fix fast_info["previousClose"] #1383
- catch TypeError Exception #1397
0.2.9
-----
- Fix fast_info bugs #1362
0.2.7
-----
- Fix Yahoo decryption, smarter this time #1353
- Rename basic_info -> fast_info #1354
0.2.6
-----
- Fix Ticker.basic_info lazy-loading #1342
0.2.5
-----
- Fix Yahoo data decryption again #1336
- New: Ticker.basic_info - faster Ticker.info #1317
0.2.4
-----
- Fix Yahoo data decryption #1297
- New feature: 'Ticker.get_shares_full()' #1301
- Improve caching of financials data #1284
- Restore download() original alignment behaviour #1283
- Fix the database lock error in multithread download #1276
0.2.3
-----
- Make financials API '_' use consistent

153
README.md
View File

@@ -42,10 +42,12 @@ Yahoo! finance API is intended for personal use only.**
---
## News [2023-01-27]
Since December 2022 Yahoo has been encrypting the web data that `yfinance` scrapes for non-market data. Fortunately the decryption keys are available, although Yahoo moved/changed them several times hence `yfinance` breaking several times. `yfinance` is now better prepared for any future changes by Yahoo.
## What's new in version 0.2
Why is Yahoo doing this? We don't know. Is it to stop scrapers? Maybe, so we've implemented changes to reduce load on Yahoo. In December we rolled out version 0.2 with optimised scraping. ~Then in 0.2.6 introduced `Ticker.fast_info`, providing much faster access to some `info` elements wherever possible e.g. price stats and forcing users to switch (sorry but we think necessary). `info` will continue to exist for as long as there are elements without a fast alternative.~ `info` now fixed and much faster than before.
- Optimised web scraping
- All 3 financials tables now match website so expect keys to change. If you really want old tables, use [`Ticker.get_[income_stmt|balance_sheet|cashflow](legacy=True, ...)`](https://github.com/ranaroussi/yfinance/blob/85783da515761a145411d742c2a8a3c1517264b0/yfinance/base.py#L968)
- price data improvements: fix bug NaN rows with dividend; new repair feature for missing or 100x prices `download(repair=True)`; new attribute `Ticker.history_metadata`
[See release notes for full list of changes](https://github.com/ranaroussi/yfinance/releases/tag/0.2.1)
## Quick Start
@@ -58,26 +60,31 @@ import yfinance as yf
msft = yf.Ticker("MSFT")
# get all stock info
# get stock info
msft.info
# get historical market data
hist = msft.history(period="1mo")
hist = msft.history(period="max")
# show meta information about the history (requires history() to be called first)
msft.history_metadata
# show actions (dividends, splits, capital gains)
msft.actions
# show dividends
msft.dividends
# show splits
msft.splits
msft.capital_gains # only for mutual funds & etfs
# show capital gains (for mutual funds & etfs)
msft.capital_gains
# show share count
# - yearly summary:
msft.shares
# - accurate time-series count:
msft.get_shares_full(start="2022-01-01", end=None)
msft.get_shares_full()
# show financials:
# - income statement
@@ -91,9 +98,13 @@ msft.cashflow
msft.quarterly_cashflow
# see `Ticker.get_income_stmt()` for more options
# show holders
# show major holders
msft.major_holders
# show institutional holders
msft.institutional_holders
# show mutualfund holders
msft.mutualfund_holders
# show earnings
@@ -152,7 +163,18 @@ msft.option_chain(..., proxy="PROXY_SERVER")
...
```
### Multiple tickers
To use a custom `requests` session (for example to cache calls to the
API or customize the `User-agent` header), pass a `session=` argument to
the Ticker constructor.
```python
import requests_cache
session = requests_cache.CachedSession('yfinance.cache')
session.headers['User-agent'] = 'my-program/1.0'
ticker = yf.Ticker('msft', session=session)
# The scraped response will be stored in the cache
ticker.actions
```
To initialize multiple `Ticker` objects, use
@@ -167,54 +189,69 @@ tickers.tickers['AAPL'].history(period="1mo")
tickers.tickers['GOOG'].actions
```
To download price history into one table:
### Fetching data for multiple tickers
```python
import yfinance as yf
data = yf.download("SPY AAPL", start="2017-01-01", end="2017-04-30")
```
`yf.download()` and `Ticker.history()` have many options for configuring fetching and processing, e.g.:
I've also added some options to make life easier :)
```python
yf.download(tickers = "SPY AAPL", # list of tickers
period = "1y", # time period
interval = "1d", # trading interval
prepost = False, # download pre/post market hours data?
repair = True) # repair obvious price errors e.g. 100x?
data = yf.download( # or pdr.get_data_yahoo(...
# tickers list or string as well
tickers = "SPY AAPL MSFT",
# use "period" instead of start/end
# valid periods: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max
# (optional, default is '1mo')
period = "ytd",
# fetch data by interval (including intraday if period < 60 days)
# valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
# (optional, default is '1d')
interval = "5d",
# Whether to ignore timezone when aligning ticker data from
# different timezones. Default is False.
ignore_tz = False,
# group by ticker (to access via data['SPY'])
# (optional, default is 'column')
group_by = 'ticker',
# adjust all OHLC automatically
# (optional, default is False)
auto_adjust = True,
# attempt repair of missing data or currency mixups e.g. $/cents
repair = False,
# download pre/post regular market hours data
# (optional, default is False)
prepost = True,
# use threads for mass downloading? (True/False/Integer)
# (optional, default is True)
threads = True,
# proxy URL scheme use use when downloading?
# (optional, default is None)
proxy = None
)
```
Review the [Wiki](https://github.com/ranaroussi/yfinance/wiki) for more options and detail.
### Smarter scraping
To use a custom `requests` session (for example to cache calls to the
API or customize the `User-agent` header), pass a `session=` argument to
the Ticker constructor.
### Timezone cache store
When fetching price data, all dates are localized to stock exchange timezone.
But timezone retrieval is relatively slow, so yfinance attemps to cache them
in your users cache folder.
You can direct cache to use a different location with `set_tz_cache_location()`:
```python
import requests_cache
session = requests_cache.CachedSession('yfinance.cache')
session.headers['User-agent'] = 'my-program/1.0'
ticker = yf.Ticker('msft', session=session)
# The scraped response will be stored in the cache
ticker.actions
```
Combine a `requests_cache` with rate-limiting to avoid triggering Yahoo's rate-limiter/blocker that can corrupt data.
```python
from requests import Session
from requests_cache import CacheMixin, SQLiteCache
from requests_ratelimiter import LimiterMixin, MemoryQueueBucket
from pyrate_limiter import Duration, RequestRate, Limiter
class CachedLimiterSession(CacheMixin, LimiterMixin, Session):
pass
session = CachedLimiterSession(
limiter=Limiter(RequestRate(2, Duration.SECOND*5), # max 2 requests per 5 seconds
bucket_class=MemoryQueueBucket,
backend=SQLiteCache("yfinance.cache"),
)
import yfinance as yf
yf.set_tz_cache_location("custom/cache/location")
...
```
### Managing Multi-Level Columns
@@ -232,7 +269,9 @@ yfinance?](https://stackoverflow.com/questions/63107801)
- How to download single or multiple tickers into a single
dataframe with single level column names and a ticker column
### `pandas_datareader` override
---
## `pandas_datareader` override
If your code uses `pandas_datareader` and you want to download data
faster, you can "hijack" `pandas_datareader.data.get_data_yahoo()`
@@ -249,18 +288,6 @@ yf.pdr_override() # <== that's all it takes :-)
data = pdr.get_data_yahoo("SPY", start="2017-01-01", end="2017-04-30")
```
### Timezone cache store
When fetching price data, all dates are localized to stock exchange timezone.
But timezone retrieval is relatively slow, so yfinance attemps to cache them
in your users cache folder.
You can direct cache to use a different location with `set_tz_cache_location()`:
```python
import yfinance as yf
yf.set_tz_cache_location("custom/cache/location")
...
```
---
## Installation
@@ -288,15 +315,11 @@ To install `yfinance` using `conda`, see
- [html5lib](https://pypi.org/project/html5lib) \>= 1.1
- [cryptography](https://pypi.org/project/cryptography) \>= 3.3.2
#### Optional (if you want to use `pandas_datareader`)
### Optional (if you want to use `pandas_datareader`)
- [pandas\_datareader](https://github.com/pydata/pandas-datareader)
\>= 0.4.0
## Developers: want to contribute?
`yfinance` relies on community to investigate bugs and contribute code. Developer guide: https://github.com/ranaroussi/yfinance/discussions/1084
---
### Legal Stuff

View File

@@ -1,5 +1,5 @@
{% set name = "yfinance" %}
{% set version = "0.2.15" %}
{% set version = "0.2.3" %}
package:
name: "{{ name|lower }}"

View File

@@ -24,7 +24,9 @@ class TestPriceHistory(unittest.TestCase):
def test_daily_index(self):
tkrs = ["BHP.AX", "IMP.JO", "BP.L", "PNL.L", "INTC"]
intervals = ["1d", "1wk", "1mo"]
for tkr in tkrs:
dat = yf.Ticker(tkr, session=self.session)
@@ -42,8 +44,8 @@ class TestPriceHistory(unittest.TestCase):
dt_utc = _tz.timezone("UTC").localize(_dt.datetime.utcnow())
dt = dt_utc.astimezone(_tz.timezone(tz))
start_d = dt.date() - _dt.timedelta(days=7)
df = dat.history(start=start_d, interval="1h")
df = dat.history(start=dt.date() - _dt.timedelta(days=1), interval="1h")
dt0 = df.index[-2]
dt1 = df.index[-1]
@@ -53,6 +55,7 @@ class TestPriceHistory(unittest.TestCase):
print("Ticker = ", tkr)
raise
def test_duplicatingDaily(self):
tkrs = ["IMP.JO", "BHG.JO", "SSW.JO", "BP.L", "INTC"]
test_run = False
@@ -107,27 +110,22 @@ class TestPriceHistory(unittest.TestCase):
def test_intraDayWithEvents(self):
# TASE dividend release pre-market, doesn't merge nicely with intra-day data so check still present
tase_tkrs = ["ICL.TA", "ESLT.TA", "ONE.TA", "MGDL.TA"]
test_run = False
for tkr in tase_tkrs:
start_d = _dt.date.today() - _dt.timedelta(days=59)
end_d = None
df_daily = yf.Ticker(tkr, session=self.session).history(start=start_d, end=end_d, interval="1d", actions=True)
df_daily_divs = df_daily["Dividends"][df_daily["Dividends"] != 0]
if df_daily_divs.shape[0] == 0:
# self.skipTest("Skipping test_intraDayWithEvents() because 'ICL.TA' has no dividend in last 60 days")
continue
tkr = "ICL.TA"
# tkr = "ESLT.TA"
# tkr = "ONE.TA"
# tkr = "MGDL.TA"
start_d = _dt.date.today() - _dt.timedelta(days=60)
end_d = None
df_daily = yf.Ticker(tkr, session=self.session).history(start=start_d, end=end_d, interval="1d", actions=True)
df_daily_divs = df_daily["Dividends"][df_daily["Dividends"] != 0]
if df_daily_divs.shape[0] == 0:
self.skipTest("Skipping test_intraDayWithEvents() because 'ICL.TA' has no dividend in last 60 days")
last_div_date = df_daily_divs.index[-1]
start_d = last_div_date.date()
end_d = last_div_date.date() + _dt.timedelta(days=1)
df = yf.Ticker(tkr, session=self.session).history(start=start_d, end=end_d, interval="15m", actions=True)
self.assertTrue((df["Dividends"] != 0.0).any())
test_run = True
break
if not test_run:
self.skipTest("Skipping test_intraDayWithEvents() because no tickers had a dividend in last 60 days")
last_div_date = df_daily_divs.index[-1]
start_d = last_div_date.date()
end_d = last_div_date.date() + _dt.timedelta(days=1)
df = yf.Ticker(tkr, session=self.session).history(start=start_d, end=end_d, interval="15m", actions=True)
self.assertTrue((df["Dividends"] != 0.0).any())
def test_dailyWithEvents(self):
# Reproduce issue #521
@@ -230,13 +228,9 @@ class TestPriceHistory(unittest.TestCase):
print("{}-without-events missing these dates: {}".format(tkr, missing_from_df2))
raise
def test_monthlyWithEvents2(self):
# Simply check no exception from internal merge
tkr = "ABBV"
yf.Ticker("ABBV").history(period="max", interval="1mo")
def test_tz_dst_ambiguous(self):
# Reproduce issue #1100
try:
yf.Ticker("ESLT.TA", session=self.session).history(start="2002-10-06", end="2002-10-09", interval="1d")
except _tz.exceptions.AmbiguousTimeError:
@@ -267,116 +261,6 @@ class TestPriceHistory(unittest.TestCase):
print("Weekly data not aligned to Monday")
raise
def test_prune_post_intraday_us(self):
# Half-day before USA Thanksgiving. Yahoo normally
# returns an interval starting when regular trading closes,
# even if prepost=False.
# Setup
tkr = "AMZN"
interval = "1h"
interval_td = _dt.timedelta(hours=1)
time_open = _dt.time(9, 30)
time_close = _dt.time(16)
special_day = _dt.date(2022, 11, 25)
time_early_close = _dt.time(13)
dat = yf.Ticker(tkr, session=self.session)
# Run
start_d = special_day - _dt.timedelta(days=7)
end_d = special_day + _dt.timedelta(days=7)
df = dat.history(start=start_d, end=end_d, interval=interval, prepost=False, keepna=True)
tg_last_dt = df.loc[str(special_day)].index[-1]
self.assertTrue(tg_last_dt.time() < time_early_close)
# Test no other afternoons (or mornings) were pruned
start_d = _dt.date(special_day.year, 1, 1)
end_d = _dt.date(special_day.year+1, 1, 1)
df = dat.history(start=start_d, end=end_d, interval="1h", prepost=False, keepna=True)
last_dts = _pd.Series(df.index).groupby(df.index.date).last()
f_early_close = (last_dts+interval_td).dt.time < time_close
early_close_dates = last_dts.index[f_early_close].values
self.assertEqual(len(early_close_dates), 1)
self.assertEqual(early_close_dates[0], special_day)
first_dts = _pd.Series(df.index).groupby(df.index.date).first()
f_late_open = first_dts.dt.time > time_open
late_open_dates = first_dts.index[f_late_open]
self.assertEqual(len(late_open_dates), 0)
def test_prune_post_intraday_omx(self):
# Half-day before Sweden Christmas. Yahoo normally
# returns an interval starting when regular trading closes,
# even if prepost=False.
# If prepost=False, test that yfinance is removing prepost intervals.
# Setup
tkr = "AEC.ST"
interval = "1h"
interval_td = _dt.timedelta(hours=1)
time_open = _dt.time(9)
time_close = _dt.time(17,30)
special_day = _dt.date(2022, 12, 23)
time_early_close = _dt.time(13, 2)
dat = yf.Ticker(tkr, session=self.session)
# Half trading day Jan 5, Apr 14, May 25, Jun 23, Nov 4, Dec 23, Dec 30
half_days = [_dt.date(special_day.year, x[0], x[1]) for x in [(1,5), (4,14), (5,25), (6,23), (11,4), (12,23), (12,30)]]
# Yahoo has incorrectly classified afternoon of 2022-04-13 as post-market.
# Nothing yfinance can do because Yahoo doesn't return data with prepost=False.
# But need to handle in this test.
expected_incorrect_half_days = [_dt.date(2022,4,13)]
half_days = sorted(half_days+expected_incorrect_half_days)
# Run
start_d = special_day - _dt.timedelta(days=7)
end_d = special_day + _dt.timedelta(days=7)
df = dat.history(start=start_d, end=end_d, interval=interval, prepost=False, keepna=True)
tg_last_dt = df.loc[str(special_day)].index[-1]
self.assertTrue(tg_last_dt.time() < time_early_close)
# Test no other afternoons (or mornings) were pruned
start_d = _dt.date(special_day.year, 1, 1)
end_d = _dt.date(special_day.year+1, 1, 1)
df = dat.history(start=start_d, end=end_d, interval="1h", prepost=False, keepna=True)
last_dts = _pd.Series(df.index).groupby(df.index.date).last()
f_early_close = (last_dts+interval_td).dt.time < time_close
early_close_dates = last_dts.index[f_early_close].values
unexpected_early_close_dates = [d for d in early_close_dates if not d in half_days]
self.assertEqual(len(unexpected_early_close_dates), 0)
self.assertEqual(len(early_close_dates), len(half_days))
self.assertTrue(_np.equal(early_close_dates, half_days).all())
first_dts = _pd.Series(df.index).groupby(df.index.date).first()
f_late_open = first_dts.dt.time > time_open
late_open_dates = first_dts.index[f_late_open]
self.assertEqual(len(late_open_dates), 0)
def test_prune_post_intraday_asx(self):
# Setup
tkr = "BHP.AX"
interval = "1h"
interval_td = _dt.timedelta(hours=1)
time_open = _dt.time(10)
time_close = _dt.time(16,12)
# No early closes in 2022
dat = yf.Ticker(tkr, session=self.session)
# Test no afternoons (or mornings) were pruned
start_d = _dt.date(2022, 1, 1)
end_d = _dt.date(2022+1, 1, 1)
df = dat.history(start=start_d, end=end_d, interval="1h", prepost=False, keepna=True)
last_dts = _pd.Series(df.index).groupby(df.index.date).last()
f_early_close = (last_dts+interval_td).dt.time < time_close
early_close_dates = last_dts.index[f_early_close].values
self.assertEqual(len(early_close_dates), 0)
first_dts = _pd.Series(df.index).groupby(df.index.date).first()
f_late_open = first_dts.dt.time > time_open
late_open_dates = first_dts.index[f_late_open]
self.assertEqual(len(late_open_dates), 0)
def test_weekly_2rows_fix(self):
tkr = "AMZN"
start = _dt.date.today() - _dt.timedelta(days=14)
@@ -386,53 +270,11 @@ class TestPriceHistory(unittest.TestCase):
df = dat.history(start=start, interval="1wk")
self.assertTrue((df.index.weekday == 0).all())
def test_aggregate_capital_gains(self):
# Setup
tkr = "FXAIX"
dat = yf.Ticker(tkr, session=self.session)
start = "2017-12-31"
end = "2019-12-31"
interval = "3mo"
df = dat.history(start=start, end=end, interval=interval)
class TestPriceRepair(unittest.TestCase):
session = None
@classmethod
def setUpClass(cls):
cls.session = requests_cache.CachedSession(backend='memory')
@classmethod
def tearDownClass(cls):
if cls.session is not None:
cls.session.close()
def test_reconstruct_2m(self):
# 2m repair requires 1m data.
# Yahoo restricts 1m fetches to 7 days max within last 30 days.
# Need to test that '_reconstruct_intervals_batch()' can handle this.
tkrs = ["BHP.AX", "IMP.JO", "BP.L", "PNL.L", "INTC"]
dt_now = _pd.Timestamp.utcnow()
td_7d = _dt.timedelta(days=7)
td_60d = _dt.timedelta(days=60)
# Round time for 'requests_cache' reuse
dt_now = dt_now.ceil("1h")
for tkr in tkrs:
dat = yf.Ticker(tkr, session=self.session)
end_dt = dt_now
start_dt = end_dt - td_60d
df = dat.history(start=start_dt, end=end_dt, interval="2m", repair=True)
def test_repair_100x_weekly(self):
# Setup:
tkr = "PNL.L"
dat = yf.Ticker(tkr, session=self.session)
tz_exchange = dat.fast_info["timezone"]
tz_exchange = dat.info["exchangeTimezoneName"]
data_cols = ["Low", "High", "Open", "Close", "Adj Close"]
df = _pd.DataFrame(data={"Open": [470.5, 473.5, 474.5, 470],
@@ -441,22 +283,22 @@ class TestPriceRepair(unittest.TestCase):
"Close": [475, 473.5, 472, 473.5],
"Adj Close": [475, 473.5, 472, 473.5],
"Volume": [2295613, 2245604, 3000287, 2635611]},
index=_pd.to_datetime([_dt.date(2022, 10, 24),
_dt.date(2022, 10, 17),
_dt.date(2022, 10, 10),
_dt.date(2022, 10, 3)]))
index=_pd.to_datetime([_dt.date(2022, 10, 23),
_dt.date(2022, 10, 16),
_dt.date(2022, 10, 9),
_dt.date(2022, 10, 2)]))
df = df.sort_index()
df.index.name = "Date"
df_bad = df.copy()
df_bad.loc["2022-10-24", "Close"] *= 100
df_bad.loc["2022-10-17", "Low"] *= 100
df_bad.loc["2022-10-03", "Open"] *= 100
df_bad.loc["2022-10-23", "Close"] *= 100
df_bad.loc["2022-10-16", "Low"] *= 100
df_bad.loc["2022-10-2", "Open"] *= 100
df.index = df.index.tz_localize(tz_exchange)
df_bad.index = df_bad.index.tz_localize(tz_exchange)
# Run test
df_repaired = dat._fix_unit_mixups(df_bad, "1wk", tz_exchange, prepost=False)
df_repaired = dat._fix_unit_mixups(df_bad, "1wk", tz_exchange)
# First test - no errors left
for c in data_cols:
@@ -484,7 +326,7 @@ class TestPriceRepair(unittest.TestCase):
tkr = "PNL.L"
dat = yf.Ticker(tkr, session=self.session)
tz_exchange = dat.fast_info["timezone"]
tz_exchange = dat.info["exchangeTimezoneName"]
data_cols = ["Low", "High", "Open", "Close", "Adj Close"]
df = _pd.DataFrame(data={"Open": [400, 398, 392.5, 417],
@@ -511,7 +353,7 @@ class TestPriceRepair(unittest.TestCase):
df.index = df.index.tz_localize(tz_exchange)
df_bad.index = df_bad.index.tz_localize(tz_exchange)
df_repaired = dat._fix_unit_mixups(df_bad, "1wk", tz_exchange, prepost=False)
df_repaired = dat._fix_unit_mixups(df_bad, "1wk", tz_exchange)
# First test - no errors left
for c in data_cols:
@@ -539,7 +381,7 @@ class TestPriceRepair(unittest.TestCase):
def test_repair_100x_daily(self):
tkr = "PNL.L"
dat = yf.Ticker(tkr, session=self.session)
tz_exchange = dat.fast_info["timezone"]
tz_exchange = dat.info["exchangeTimezoneName"]
data_cols = ["Low", "High", "Open", "Close", "Adj Close"]
df = _pd.DataFrame(data={"Open": [478, 476, 476, 472],
@@ -561,7 +403,7 @@ class TestPriceRepair(unittest.TestCase):
df.index = df.index.tz_localize(tz_exchange)
df_bad.index = df_bad.index.tz_localize(tz_exchange)
df_repaired = dat._fix_unit_mixups(df_bad, "1d", tz_exchange, prepost=False)
df_repaired = dat._fix_unit_mixups(df_bad, "1d", tz_exchange)
# First test - no errors left
for c in data_cols:
@@ -581,7 +423,7 @@ class TestPriceRepair(unittest.TestCase):
def test_repair_zeroes_daily(self):
tkr = "BBIL.L"
dat = yf.Ticker(tkr, session=self.session)
tz_exchange = dat.fast_info["timezone"]
tz_exchange = dat.info["exchangeTimezoneName"]
df_bad = _pd.DataFrame(data={"Open": [0, 102.04, 102.04],
"High": [0, 102.1, 102.11],
@@ -596,7 +438,7 @@ class TestPriceRepair(unittest.TestCase):
df_bad.index.name = "Date"
df_bad.index = df_bad.index.tz_localize(tz_exchange)
repaired_df = dat._fix_zeroes(df_bad, "1d", tz_exchange, prepost=False)
repaired_df = dat._fix_zeroes(df_bad, "1d", tz_exchange)
correct_df = df_bad.copy()
correct_df.loc["2022-11-01", "Open"] = 102.080002
@@ -608,31 +450,40 @@ class TestPriceRepair(unittest.TestCase):
def test_repair_zeroes_hourly(self):
tkr = "INTC"
dat = yf.Ticker(tkr, session=self.session)
tz_exchange = dat.fast_info["timezone"]
tz_exchange = dat.info["exchangeTimezoneName"]
correct_df = dat.history(period="1wk", interval="1h", auto_adjust=False, repair=True)
df_bad = _pd.DataFrame(data={"Open": [29.68, 29.49, 29.545, _np.nan, 29.485],
"High": [29.68, 29.625, 29.58, _np.nan, 29.49],
"Low": [29.46, 29.4, 29.45, _np.nan, 29.31],
"Close": [29.485, 29.545, 29.485, _np.nan, 29.325],
"Adj Close": [29.485, 29.545, 29.485, _np.nan, 29.325],
"Volume": [3258528, 2140195, 1621010, 0, 0]},
index=_pd.to_datetime([_dt.datetime(2022,11,25, 9,30),
_dt.datetime(2022,11,25, 10,30),
_dt.datetime(2022,11,25, 11,30),
_dt.datetime(2022,11,25, 12,30),
_dt.datetime(2022,11,25, 13,00)]))
df_bad = df_bad.sort_index()
df_bad.index.name = "Date"
df_bad.index = df_bad.index.tz_localize(tz_exchange)
df_bad = correct_df.copy()
bad_idx = correct_df.index[10]
df_bad.loc[bad_idx, "Open"] = _np.nan
df_bad.loc[bad_idx, "High"] = _np.nan
df_bad.loc[bad_idx, "Low"] = _np.nan
df_bad.loc[bad_idx, "Close"] = _np.nan
df_bad.loc[bad_idx, "Adj Close"] = _np.nan
df_bad.loc[bad_idx, "Volume"] = 0
repaired_df = dat._fix_zeroes(df_bad, "1h", tz_exchange, prepost=False)
repaired_df = dat._fix_zeroes(df_bad, "1h", tz_exchange)
correct_df = df_bad.copy()
idx = _pd.Timestamp(2022,11,25, 12,30).tz_localize(tz_exchange)
correct_df.loc[idx, "Open"] = 29.485001
correct_df.loc[idx, "High"] = 29.49
correct_df.loc[idx, "Low"] = 29.43
correct_df.loc[idx, "Close"] = 29.455
correct_df.loc[idx, "Adj Close"] = 29.455
correct_df.loc[idx, "Volume"] = 609164
for c in ["Open", "Low", "High", "Close"]:
try:
self.assertTrue(_np.isclose(repaired_df[c], correct_df[c], rtol=1e-7).all())
except:
print("COLUMN", c)
print("- repaired_df")
print(repaired_df)
print("- correct_df[c]:")
print(correct_df[c])
print("- diff:")
print(repaired_df[c] - correct_df[c])
raise

View File

@@ -9,7 +9,6 @@ Specific test class:
"""
import pandas as pd
import numpy as np
from .context import yfinance as yf
@@ -52,16 +51,12 @@ class TestTicker(unittest.TestCase):
def test_badTicker(self):
# Check yfinance doesn't die when ticker delisted
tkr = "DJI" # typo of "^DJI"
tkr = "AM2Z.TA"
dat = yf.Ticker(tkr, session=self.session)
dat.history(period="1wk")
dat.history(start="2022-01-01")
dat.history(start="2022-01-01", end="2022-03-01")
yf.download([tkr], period="1wk")
for k in dat.fast_info:
dat.fast_info[k]
dat.isin
dat.major_holders
dat.institutional_holders
@@ -95,48 +90,43 @@ class TestTicker(unittest.TestCase):
def test_goodTicker(self):
# that yfinance works when full api is called on same instance of ticker
tkrs = ["IBM"]
tkrs.append("QCSTIX") # weird ticker, no price history but has previous close
for tkr in tkrs:
dat = yf.Ticker(tkr, session=self.session)
tkr = "IBM"
dat = yf.Ticker(tkr, session=self.session)
dat.history(period="1wk")
dat.history(start="2022-01-01")
dat.history(start="2022-01-01", end="2022-03-01")
yf.download([tkr], period="1wk")
dat.isin
dat.major_holders
dat.institutional_holders
dat.mutualfund_holders
dat.dividends
dat.splits
dat.actions
dat.shares
dat.get_shares_full()
dat.info
dat.calendar
dat.recommendations
dat.earnings
dat.quarterly_earnings
dat.income_stmt
dat.quarterly_income_stmt
dat.balance_sheet
dat.quarterly_balance_sheet
dat.cashflow
dat.quarterly_cashflow
dat.recommendations_summary
dat.analyst_price_target
dat.revenue_forecasts
dat.sustainability
dat.options
dat.news
dat.earnings_trend
dat.earnings_dates
dat.earnings_forecasts
for k in dat.fast_info:
dat.fast_info[k]
dat.isin
dat.major_holders
dat.institutional_holders
dat.mutualfund_holders
dat.dividends
dat.splits
dat.actions
dat.shares
dat.get_shares_full()
dat.info
dat.calendar
dat.recommendations
dat.earnings
dat.quarterly_earnings
dat.income_stmt
dat.quarterly_income_stmt
dat.balance_sheet
dat.quarterly_balance_sheet
dat.cashflow
dat.quarterly_cashflow
dat.recommendations_summary
dat.analyst_price_target
dat.revenue_forecasts
dat.sustainability
dat.options
dat.news
dat.earnings_trend
dat.earnings_dates
dat.earnings_forecasts
dat.history(period="1wk")
dat.history(start="2022-01-01")
dat.history(start="2022-01-01", end="2022-03-01")
yf.download([tkr], period="1wk")
class TestTickerHistory(unittest.TestCase):
@@ -670,138 +660,14 @@ class TestTickerMiscFinancials(unittest.TestCase):
self.assertIsInstance(data, pd.Series, "data has wrong type")
self.assertFalse(data.empty, "data is empty")
def test_bad_freq_value_raises_exception(self):
self.assertRaises(ValueError, lambda: self.ticker.get_cashflow(freq="badarg"))
class TestTickerInfo(unittest.TestCase):
session = None
@classmethod
def setUpClass(cls):
cls.session = requests_cache.CachedSession(backend='memory')
@classmethod
def tearDownClass(cls):
if cls.session is not None:
cls.session.close()
def setUp(self):
self.symbols = []
self.symbols += ["ESLT.TA", "BP.L", "GOOGL"]
self.symbols.append("QCSTIX") # good for testing, doesn't trade
self.symbols += ["BTC-USD", "IWO", "VFINX", "^GSPC"]
self.symbols += ["SOKE.IS", "ADS.DE"] # detected bugs
self.tickers = [yf.Ticker(s, session=self.session) for s in self.symbols]
def tearDown(self):
self.ticker = None
def test_info(self):
data = self.tickers[0].info
data = self.ticker.info
self.assertIsInstance(data, dict, "data has wrong type")
self.assertIn("symbol", data.keys(), "Did not find expected key in info dict")
self.assertEqual(self.symbols[0], data["symbol"], "Wrong symbol value in info dict")
def test_fast_info(self):
yf.scrapers.quote.PRUNE_INFO = False
fast_info_keys = set()
for ticker in self.tickers:
fast_info_keys.update(set(ticker.fast_info.keys()))
fast_info_keys = sorted(list(fast_info_keys))
key_rename_map = {}
key_rename_map["currency"] = "currency"
key_rename_map["quote_type"] = "quoteType"
key_rename_map["timezone"] = "exchangeTimezoneName"
key_rename_map["last_price"] = ["currentPrice", "regularMarketPrice"]
key_rename_map["open"] = ["open", "regularMarketOpen"]
key_rename_map["day_high"] = ["dayHigh", "regularMarketDayHigh"]
key_rename_map["day_low"] = ["dayLow", "regularMarketDayLow"]
key_rename_map["previous_close"] = ["previousClose"]
key_rename_map["regular_market_previous_close"] = ["regularMarketPreviousClose"]
key_rename_map["fifty_day_average"] = "fiftyDayAverage"
key_rename_map["two_hundred_day_average"] = "twoHundredDayAverage"
key_rename_map["year_change"] = ["52WeekChange", "fiftyTwoWeekChange"]
key_rename_map["year_high"] = "fiftyTwoWeekHigh"
key_rename_map["year_low"] = "fiftyTwoWeekLow"
key_rename_map["last_volume"] = ["volume", "regularMarketVolume"]
key_rename_map["ten_day_average_volume"] = ["averageVolume10days", "averageDailyVolume10Day"]
key_rename_map["three_month_average_volume"] = "averageVolume"
key_rename_map["market_cap"] = "marketCap"
key_rename_map["shares"] = "sharesOutstanding"
for k in list(key_rename_map.keys()):
if '_' in k:
key_rename_map[yf.utils.snake_case_2_camelCase(k)] = key_rename_map[k]
# Note: share count items in info[] are bad. Sometimes the float > outstanding!
# So often fast_info["shares"] does not match.
# Why isn't fast_info["shares"] wrong? Because using it to calculate market cap always correct.
bad_keys = {"shares"}
# Loose tolerance for averages, no idea why don't match info[]. Is info wrong?
custom_tolerances = {}
custom_tolerances["year_change"] = 1.0
# custom_tolerances["ten_day_average_volume"] = 1e-3
custom_tolerances["ten_day_average_volume"] = 1e-1
# custom_tolerances["three_month_average_volume"] = 1e-2
custom_tolerances["three_month_average_volume"] = 5e-1
custom_tolerances["fifty_day_average"] = 1e-2
custom_tolerances["two_hundred_day_average"] = 1e-2
for k in list(custom_tolerances.keys()):
if '_' in k:
custom_tolerances[yf.utils.snake_case_2_camelCase(k)] = custom_tolerances[k]
for k in fast_info_keys:
if k in key_rename_map:
k2 = key_rename_map[k]
else:
k2 = k
if not isinstance(k2, list):
k2 = [k2]
for m in k2:
for ticker in self.tickers:
if not m in ticker.info:
# print(f"symbol={ticker.ticker}: fast_info key '{k}' mapped to info key '{m}' but not present in info")
continue
if k in bad_keys:
continue
if k in custom_tolerances:
rtol = custom_tolerances[k]
else:
rtol = 5e-3
# rtol = 1e-4
correct = ticker.info[m]
test = ticker.fast_info[k]
# print(f"Testing: symbol={ticker.ticker} m={m} k={k}: test={test} vs correct={correct}")
if k in ["market_cap","marketCap"] and ticker.fast_info["currency"] in ["GBp", "ILA"]:
# Adjust for currency to match Yahoo:
test *= 0.01
try:
if correct is None:
self.assertTrue(test is None or (not np.isnan(test)), f"{k}: {test} must be None or real value because correct={correct}")
elif isinstance(test, float) or isinstance(correct, int):
self.assertTrue(np.isclose(test, correct, rtol=rtol), f"{ticker.ticker} {k}: {test} != {correct}")
else:
self.assertEqual(test, correct, f"{k}: {test} != {correct}")
except:
if k in ["regularMarketPreviousClose"] and ticker.ticker in ["ADS.DE"]:
# Yahoo is wrong, is returning post-market close not regular
continue
else:
raise
self.assertEqual("GOOGL", data["symbol"], "Wrong symbol value in info dict")
def test_bad_freq_value_raises_exception(self):
self.assertRaises(ValueError, lambda: self.ticker.get_cashflow(freq="badarg"))
def suite():
@@ -811,7 +677,6 @@ def suite():
suite.addTest(TestTickerHolders('Test holders'))
suite.addTest(TestTickerHistory('Test Ticker history'))
suite.addTest(TestTickerMiscFinancials('Test misc financials'))
suite.addTest(TestTickerInfo('Test info & fast_info'))
return suite

30
tests/utils.py Normal file
View File

@@ -0,0 +1,30 @@
import unittest
from .context import yfinance as yf
import datetime as _dt
import pandas as _pd
import pytz as _pytz
class TestUtils(unittest.TestCase):
def test_parse_user_dt(self):
""" Purpose of _parse_user_dt() is to take any date-like value,
combine with specified timezone and
return its localized timestamp.
"""
tz_name = "America/New_York"
tz = _pytz.timezone(tz_name)
dt_answer = tz.localize(_dt.datetime(2023,1,1))
# All possible versions of 'dt_answer'
values = ["2023-01-01", _dt.date(2023,1,1), _dt.datetime(2023,1,1), _pd.Timestamp(_dt.date(2023,1,1))]
# - now add localized versions
values.append(tz.localize(_dt.datetime(2023,1,1)))
values.append(_pd.Timestamp(_dt.date(2023,1,1)).tz_localize(tz_name))
values.append(int(_pd.Timestamp(_dt.date(2023,1,1)).tz_localize(tz_name).timestamp()))
for v in values:
v2 = yf.utils._parse_user_dt(v, tz_name)
self.assertEqual(v2, dt_answer.timestamp())

View File

@@ -23,7 +23,6 @@ from __future__ import print_function
import time as _time
import datetime as _datetime
import dateutil as _dateutil
from typing import Optional
import pandas as _pd
@@ -40,7 +39,7 @@ from . import shared
from .scrapers.analysis import Analysis
from .scrapers.fundamentals import Fundamentals
from .scrapers.holders import Holders
from .scrapers.quote import Quote, FastInfo
from .scrapers.quote import Quote
import json as _json
_BASE_URL_ = 'https://query2.finance.yahoo.com'
@@ -78,8 +77,6 @@ class TickerBase:
self._quote = Quote(self._data)
self._fundamentals = Fundamentals(self._data)
self._fast_info = FastInfo(self)
def stats(self, proxy=None):
ticker_url = "{}/{}".format(self._scrape_url, self.ticker)
@@ -101,13 +98,11 @@ class TickerBase:
Valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
Intraday data cannot extend last 60 days
start: str
Download start date string (YYYY-MM-DD) or _datetime, inclusive.
Download start date string (YYYY-MM-DD) or _datetime.
Default is 1900-01-01
E.g. for start="2020-01-01", the first data point will be on "2020-01-01"
end: str
Download end date string (YYYY-MM-DD) or _datetime, exclusive.
Download end date string (YYYY-MM-DD) or _datetime.
Default is now
E.g. for end="2023-01-01", the last data point will be on "2022-12-31"
prepost : bool
Include Pre and Post market data in results?
Default is False
@@ -115,9 +110,8 @@ class TickerBase:
Adjust all OHLC automatically? Default is True
back_adjust: bool
Back-adjusted data to mimic true historical prices
repair: bool or "silent"
Detect currency unit 100x mixups and attempt repair.
If True, fix & print summary. If "silent", just fix.
repair: bool
Detect currency unit 100x mixups and attempt repair
Default is False
keepna: bool
Keep NaN rows returned by Yahoo?
@@ -220,7 +214,6 @@ class TickerBase:
self._history_metadata = data["chart"]["result"][0]["meta"]
except Exception:
self._history_metadata = {}
self._history_metadata = utils.format_history_metadata(self._history_metadata)
err_msg = "No data found for this date range, symbol may be delisted"
fail = False
@@ -297,37 +290,36 @@ class TickerBase:
quotes = utils.set_df_tz(quotes, params["interval"], tz_exchange)
quotes = utils.fix_Yahoo_dst_issue(quotes, params["interval"])
quotes = utils.fix_Yahoo_returning_live_separate(quotes, params["interval"], tz_exchange)
intraday = params["interval"][-1] in ("m", 'h')
if not prepost and intraday and "tradingPeriods" in self._history_metadata:
quotes = utils.fix_Yahoo_returning_prepost_unrequested(quotes, params["interval"], self._history_metadata)
# actions
dividends, splits, capital_gains = utils.parse_actions(data["chart"]["result"][0])
if not expect_capital_gains:
capital_gains = None
if start is not None:
# Note: use pandas Timestamp as datetime.utcfromtimestamp has bugs on windows
# https://github.com/python/cpython/issues/81708
startDt = _pd.Timestamp(start, unit='s')
if dividends is not None:
dividends = dividends[dividends.index>=startDt]
if capital_gains is not None:
capital_gains = capital_gains[capital_gains.index>=startDt]
if splits is not None:
splits = splits[splits.index >= startDt]
if end is not None:
endDt = _pd.Timestamp(end, unit='s')
if dividends is not None:
dividends = dividends[dividends.index<endDt]
if capital_gains is not None:
capital_gains = capital_gains[capital_gains.index<endDt]
if splits is not None:
splits = splits[splits.index < endDt]
if splits is not None:
splits = utils.set_df_tz(splits, interval, tz_exchange)
if dividends is not None:
dividends = utils.set_df_tz(dividends, interval, tz_exchange)
if capital_gains is not None:
capital_gains = utils.set_df_tz(capital_gains, interval, tz_exchange)
if start is not None:
startDt = quotes.index[0].floor('D')
if dividends is not None:
dividends = dividends.loc[startDt:]
if capital_gains is not None:
capital_gains = capital_gains.loc[startDt:]
if splits is not None:
splits = splits.loc[startDt:]
if end is not None:
endDt = _pd.Timestamp(end, unit='s').tz_localize(tz)
if dividends is not None:
dividends = dividends[dividends.index < endDt]
if capital_gains is not None:
capital_gains = capital_gains[capital_gains.index < endDt]
if splits is not None:
splits = splits[splits.index < endDt]
# Prepare for combine
intraday = params["interval"][-1] in ("m", 'h')
@@ -362,10 +354,10 @@ class TickerBase:
else:
df["Capital Gains"] = 0.0
if repair==True or repair=="silent":
if repair:
# Do this before auto/back adjust
df = self._fix_zeroes(df, interval, tz_exchange, prepost, silent=(repair=="silent"))
df = self._fix_unit_mixups(df, interval, tz_exchange, prepost, silent=(repair=="silent"))
df = self._fix_zeroes(df, interval, tz_exchange)
df = self._fix_unit_mixups(df, interval, tz_exchange)
# Auto/back adjust
try:
@@ -409,40 +401,31 @@ class TickerBase:
# ------------------------
def _reconstruct_intervals_batch(self, df, interval, prepost, tag=-1, silent=False):
def _reconstruct_intervals_batch(self, df, interval, tag=-1):
if not isinstance(df, _pd.DataFrame):
raise Exception("'df' must be a Pandas DataFrame not", type(df))
if interval == "1m":
# Can't go smaller than 1m so can't reconstruct
return df
# Reconstruct values in df using finer-grained price data. Delimiter marks what to reconstruct
debug = False
# debug = True
if interval[1:] in ['d', 'wk', 'mo']:
# Interday data always includes pre & post
prepost = True
intraday = False
else:
intraday = True
price_cols = [c for c in ["Open", "High", "Low", "Close", "Adj Close"] if c in df]
data_cols = price_cols + ["Volume"]
# If interval is weekly then can construct with daily. But if smaller intervals then
# restricted to recent times:
intervals = ["1wk", "1d", "1h", "30m", "15m", "5m", "2m", "1m"]
itds = {i:utils._interval_to_timedelta(interval) for i in intervals}
nexts = {intervals[i]:intervals[i+1] for i in range(len(intervals)-1)}
min_lookbacks = {"1wk":None, "1d":None, "1h":_datetime.timedelta(days=730)}
for i in ["30m", "15m", "5m", "2m"]:
min_lookbacks[i] = _datetime.timedelta(days=60)
min_lookbacks["1m"] = _datetime.timedelta(days=30)
if interval in nexts:
sub_interval = nexts[interval]
td_range = itds[interval]
# - daily = hourly restricted to last 730 days
sub_interval = None
td_range = None
if interval == "1wk":
# Correct by fetching week of daily data
sub_interval = "1d"
td_range = _datetime.timedelta(days=7)
elif interval == "1d":
# Correct by fetching day of hourly data
sub_interval = "1h"
td_range = _datetime.timedelta(days=1)
elif interval == "1h":
sub_interval = "30m"
td_range = _datetime.timedelta(hours=1)
else:
print("WARNING: Have not implemented repair for '{}' interval. Contact developers".format(interval))
raise Exception("why here")
@@ -454,107 +437,76 @@ class TickerBase:
f_repair_rows = f_repair.any(axis=1)
# Ignore old intervals for which Yahoo won't return finer data:
m = min_lookbacks[sub_interval]
if m is None:
min_dt = None
else:
m -= _datetime.timedelta(days=1) # allow space for 1-day padding
min_dt = _pd.Timestamp.utcnow() - m
min_dt = min_dt.tz_convert(df.index.tz).ceil("D")
if debug:
print(f"- min_dt={min_dt} interval={interval} sub_interval={sub_interval}")
if min_dt is not None:
f_recent = df.index >= min_dt
if sub_interval == "1h":
f_recent = _datetime.date.today() - df.index.date < _datetime.timedelta(days=730)
f_repair_rows = f_repair_rows & f_recent
if not f_repair_rows.any():
if debug:
print("data too old to repair")
return df
elif sub_interval in ["30m", "15m"]:
f_recent = _datetime.date.today() - df.index.date < _datetime.timedelta(days=60)
f_repair_rows = f_repair_rows & f_recent
if not f_repair_rows.any():
print("data too old to fix")
return df
dts_to_repair = df.index[f_repair_rows]
indices_to_repair = _np.where(f_repair_rows)[0]
if len(dts_to_repair) == 0:
if debug:
print("dts_to_repair[] is empty")
return df
df_v2 = df.copy()
f_good = ~(df[price_cols].isna().any(axis=1))
f_good = f_good & (df[price_cols].to_numpy()!=tag).all(axis=1)
df_good = df[f_good]
df_noNa = df[~df[price_cols].isna().any(axis=1)]
# Group nearby NaN-intervals together to reduce number of Yahoo fetches
dts_groups = [[dts_to_repair[0]]]
last_dt = dts_to_repair[0]
last_ind = indices_to_repair[0]
td = utils._interval_to_timedelta(interval)
# Note on setting max size: have to allow space for adding good data
if sub_interval == "1mo":
grp_max_size = _dateutil.relativedelta.relativedelta(years=2)
elif sub_interval == "1wk":
grp_max_size = _dateutil.relativedelta.relativedelta(years=2)
elif sub_interval == "1d":
grp_max_size = _dateutil.relativedelta.relativedelta(years=2)
elif sub_interval == "1h":
grp_max_size = _dateutil.relativedelta.relativedelta(years=1)
elif sub_interval == "1m":
grp_max_size = _datetime.timedelta(days=5) # allow 2 days for buffer below
if interval == "1mo":
grp_td_threshold = _datetime.timedelta(days=28)
elif interval == "1wk":
grp_td_threshold = _datetime.timedelta(days=28)
elif interval == "1d":
grp_td_threshold = _datetime.timedelta(days=14)
elif interval == "1h":
grp_td_threshold = _datetime.timedelta(days=7)
else:
grp_max_size = _datetime.timedelta(days=30)
if debug:
print("- grp_max_size =", grp_max_size)
grp_td_threshold = _datetime.timedelta(days=2)
# grp_td_threshold = _datetime.timedelta(days=7)
for i in range(1, len(dts_to_repair)):
ind = indices_to_repair[i]
dt = dts_to_repair[i]
if dt.date() < dts_groups[-1][0].date()+grp_max_size:
if (dt-dts_groups[-1][-1]) < grp_td_threshold:
dts_groups[-1].append(dt)
elif ind - last_ind <= 3:
dts_groups[-1].append(dt)
else:
dts_groups.append([dt])
last_dt = dt
last_ind = ind
if debug:
print("Repair groups:")
for g in dts_groups:
print(f"- {g[0]} -> {g[-1]}")
# Add some good data to each group, so can calibrate later:
for i in range(len(dts_groups)):
g = dts_groups[i]
g0 = g[0]
i0 = df_good.index.get_indexer([g0], method="nearest")[0]
i0 = df_noNa.index.get_loc(g0)
if i0 > 0:
if (min_dt is None or df_good.index[i0-1] >= min_dt) and \
((not intraday) or df_good.index[i0-1].date()==g0.date()):
i0 -= 1
dts_groups[i].insert(0, df_noNa.index[i0-1])
gl = g[-1]
il = df_good.index.get_indexer([gl], method="nearest")[0]
if il < len(df_good)-1:
if (not intraday) or df_good.index[il+1].date()==gl.date():
il += 1
good_dts = df_good.index[i0:il+1]
dts_groups[i] += good_dts.to_list()
dts_groups[i].sort()
il = df_noNa.index.get_loc(gl)
if il < len(df_noNa)-1:
dts_groups[i].append(df_noNa.index[il+1])
n_fixed = 0
for g in dts_groups:
df_block = df[df.index.isin(g)]
if debug:
print("- df_block:")
print(df_block)
start_dt = g[0]
start_d = start_dt.date()
if sub_interval == "1h" and (_datetime.date.today() - start_d) > _datetime.timedelta(days=729):
# Don't bother requesting more price data, Yahoo will reject
if debug:
print(f"- Don't bother requesting {sub_interval} price data, Yahoo will reject")
continue
elif sub_interval in ["30m", "15m"] and (_datetime.date.today() - start_d) > _datetime.timedelta(days=59):
# Don't bother requesting more price data, Yahoo will reject
if debug:
print(f"- Don't bother requesting {sub_interval} price data, Yahoo will reject")
continue
td_1d = _datetime.timedelta(days=1)
@@ -568,25 +520,15 @@ class TickerBase:
fetch_start = g[0]
fetch_end = g[-1] + td_range
# The first and last day returned by Yahoo can be slightly wrong, so add buffer:
fetch_start -= td_1d
fetch_end += td_1d
if intraday:
fetch_start = fetch_start.date()
fetch_end = fetch_end.date()+td_1d
if debug:
print(f"- fetching {sub_interval} prepost={prepost} {fetch_start}->{fetch_end}")
r = "silent" if silent else True
df_fine = self.history(start=fetch_start, end=fetch_end, interval=sub_interval, auto_adjust=False, actions=False, prepost=prepost, repair=r, keepna=True)
prepost = interval == "1d"
df_fine = self.history(start=fetch_start, end=fetch_end, interval=sub_interval, auto_adjust=False, prepost=prepost, repair=False, keepna=True)
if df_fine is None or df_fine.empty:
if not silent:
print("YF: WARNING: Cannot reconstruct because Yahoo not returning data in interval")
print("YF: WARNING: Cannot reconstruct because Yahoo not returning data in interval")
continue
# Discard the buffer
df_fine = df_fine.loc[g[0] : g[-1]+itds[sub_interval]-_datetime.timedelta(milliseconds=1)]
df_fine["ctr"] = 0
if interval == "1wk":
# df_fine["Week Start"] = df_fine.index.tz_localize(None).to_period("W-SUN").start_time
weekdays = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]
week_end_day = weekdays[(df_block.index[0].weekday()+7-1)%7]
df_fine["Week Start"] = df_fine.index.tz_localize(None).to_period("W-"+week_end_day).start_time
@@ -601,8 +543,7 @@ class TickerBase:
grp_col = "intervalID"
df_fine = df_fine[~df_fine[price_cols].isna().all(axis=1)]
df_fine_grp = df_fine.groupby(grp_col)
df_new = df_fine_grp.agg(
df_new = df_fine.groupby(grp_col).agg(
Open=("Open", "first"),
Close=("Close", "last"),
AdjClose=("Adj Close", "last"),
@@ -616,42 +557,31 @@ class TickerBase:
new_index = _np.append([df_fine.index[0]], df_fine.index[df_fine["intervalID"].diff()>0])
df_new.index = new_index
if debug:
print("- df_new:")
print(df_new)
# Calibrate! Check whether 'df_fine' has different split-adjustment.
# If different, then adjust to match 'df'
common_index = _np.intersect1d(df_block.index, df_new.index)
df_block_calib = df_block[price_cols]
common_index = df_block_calib.index[df_block_calib.index.isin(df_new.index)]
if len(common_index) == 0:
# Can't calibrate so don't attempt repair
if debug:
print("Can't calibrate so don't attempt repair")
continue
df_new_calib = df_new[df_new.index.isin(common_index)][price_cols].to_numpy()
df_block_calib = df_block[df_block.index.isin(common_index)][price_cols].to_numpy()
calib_filter = (df_block_calib != tag)
df_new_calib = df_new[df_new.index.isin(common_index)][price_cols]
df_block_calib = df_block_calib[df_block_calib.index.isin(common_index)]
calib_filter = (df_block_calib != tag).to_numpy()
if not calib_filter.any():
# Can't calibrate so don't attempt repair
if debug:
print("Can't calibrate so don't attempt repair")
continue
# Avoid divide-by-zero warnings:
# Avoid divide-by-zero warnings printing:
df_new_calib = df_new_calib.to_numpy()
df_block_calib = df_block_calib.to_numpy()
for j in range(len(price_cols)):
c = price_cols[j]
f = ~calib_filter[:,j]
if f.any():
df_block_calib[f,j] = 1
df_new_calib[f,j] = 1
ratios = df_block_calib[calib_filter] / df_new_calib[calib_filter]
weights = df_fine_grp.size()
weights.index = df_new.index
weights = weights[weights.index.isin(common_index)].to_numpy().astype(float)
weights = weights[:,None] # transpose
weights = _np.tile(weights, len(price_cols)) # 1D -> 2D
weights = weights[calib_filter] # flatten
ratio = _np.average(ratios, weights=weights)
if debug:
print(f"- price calibration ratio (raw) = {ratio}")
ratios = (df_block_calib / df_new_calib)[calib_filter]
ratio = _np.mean(ratios)
#
ratio_rcp = round(1.0 / ratio, 1)
ratio = round(ratio, 1)
if ratio == 1 and ratio_rcp == 1:
@@ -670,22 +600,13 @@ class TickerBase:
df_new["Volume"] *= ratio_rcp
# Repair!
bad_dts = df_block.index[(df_block[price_cols+["Volume"]]==tag).any(axis=1)]
bad_dts = df_block.index[(df_block[price_cols]==tag).any(axis=1)]
if debug:
no_fine_data_dts = []
for idx in bad_dts:
if not idx in df_new.index:
# Yahoo didn't return finer-grain data for this interval,
# so probably no trading happened.
no_fine_data_dts.append(idx)
if len(no_fine_data_dts) > 0:
print(f"Yahoo didn't return finer-grain data for these intervals:")
print(no_fine_data_dts)
for idx in bad_dts:
if not idx in df_new.index:
# Yahoo didn't return finer-grain data for this interval,
# so probably no trading happened.
# print("no fine data")
continue
df_new_row = df_new.loc[idx]
@@ -714,12 +635,9 @@ class TickerBase:
df_v2.loc[idx, "Volume"] = df_new_row["Volume"]
n_fixed += 1
if debug:
print("df_v2:") ; print(df_v2)
return df_v2
def _fix_unit_mixups(self, df, interval, tz_exchange, prepost, silent=False):
def _fix_unit_mixups(self, df, interval, tz_exchange):
# Sometimes Yahoo returns few prices in cents/pence instead of $/£
# I.e. 100x bigger
# Easy to detect and fix, just look for outliers = ~100x local median
@@ -741,7 +659,7 @@ class TickerBase:
# adding it to dependencies.
from scipy import ndimage as _ndimage
data_cols = ["High", "Open", "Low", "Close", "Adj Close"] # Order important, separate High from Low
data_cols = ["High", "Open", "Low", "Close"] # Order important, separate High from Low
data_cols = [c for c in data_cols if c in df2.columns]
f_zeroes = (df2[data_cols]==0).any(axis=1)
if f_zeroes.any():
@@ -766,7 +684,7 @@ class TickerBase:
df2.loc[fi, c] = tag
n_before = (df2[data_cols].to_numpy()==tag).sum()
df2 = self._reconstruct_intervals_batch(df2, interval, prepost, tag, silent)
df2 = self._reconstruct_intervals_batch(df2, interval, tag=tag)
n_after = (df2[data_cols].to_numpy()==tag).sum()
if n_after > 0:
@@ -789,11 +707,6 @@ class TickerBase:
if fi[j]:
df2.loc[idx, c] = df.loc[idx, c] * 0.01
#
c = "Adj Close"
j = data_cols.index(c)
if fi[j]:
df2.loc[idx, c] = df.loc[idx, c] * 0.01
#
c = "High"
j = data_cols.index(c)
if fi[j]:
@@ -808,7 +721,7 @@ class TickerBase:
n_fixed = n_before - n_after_crude
n_fixed_crudely = n_after - n_after_crude
if not silent and n_fixed > 0:
if n_fixed > 0:
report_msg = f"{self.ticker}: fixed {n_fixed}/{n_before} currency unit mixups "
if n_fixed_crudely > 0:
report_msg += f"({n_fixed_crudely} crudely) "
@@ -828,7 +741,7 @@ class TickerBase:
return df2
def _fix_zeroes(self, df, interval, tz_exchange, prepost, silent=False):
def _fix_zeroes(self, df, interval, tz_exchange):
# Sometimes Yahoo returns prices=0 or NaN when trades occurred.
# But most times when prices=0 or NaN returned is because no trades.
# Impossible to distinguish, so only attempt repair if few or rare.
@@ -836,12 +749,6 @@ class TickerBase:
if df.shape[0] == 0:
return df
debug = False
# debug = True
intraday = interval[-1] in ("m", 'h')
df = df.sort_index() # important!
df2 = df.copy()
if df2.index.tz is None:
@@ -850,34 +757,16 @@ class TickerBase:
df2.index = df2.index.tz_convert(tz_exchange)
price_cols = [c for c in ["Open", "High", "Low", "Close", "Adj Close"] if c in df2.columns]
f_prices_bad = (df2[price_cols] == 0.0) | df2[price_cols].isna()
df2_reserve = None
if intraday:
# Ignore days with >50% intervals containing NaNs
df_nans = pd.DataFrame(f_prices_bad.any(axis=1), columns=["nan"])
df_nans["_date"] = df_nans.index.date
grp = df_nans.groupby("_date")
nan_pct = grp.sum() / grp.count()
dts = nan_pct.index[nan_pct["nan"]>0.5]
f_zero_or_nan_ignore = _np.isin(f_prices_bad.index.date, dts)
df2_reserve = df2[f_zero_or_nan_ignore]
df2 = df2[~f_zero_or_nan_ignore]
f_prices_bad = (df2[price_cols] == 0.0) | df2[price_cols].isna()
f_high_low_good = (~df2["High"].isna()) & (~df2["Low"].isna())
f_vol_bad = (df2["Volume"]==0).to_numpy() & f_high_low_good & (df2["High"]!=df2["Low"]).to_numpy()
f_zero_or_nan = (df2[price_cols] == 0.0).values | df2[price_cols].isna().values
# Check whether worth attempting repair
f_prices_bad = f_prices_bad.to_numpy()
f_bad_rows = f_prices_bad.any(axis=1) | f_vol_bad
if not f_bad_rows.any():
if debug:
print("no bad data to repair")
if f_zero_or_nan.any(axis=1).sum() == 0:
return df
if f_prices_bad.sum() == len(price_cols)*len(df2):
if f_zero_or_nan.sum() == len(price_cols)*len(df2):
# Need some good data to calibrate
if debug:
print("no good data to calibrate")
return df
# - avoid repair if many zeroes/NaNs
pct_zero_or_nan = f_zero_or_nan.sum() / (len(price_cols)*len(df2))
if f_zero_or_nan.any(axis=1).sum()>2 and pct_zero_or_nan > 0.05:
return df
data_cols = price_cols + ["Volume"]
@@ -886,31 +775,17 @@ class TickerBase:
tag = -1.0
for i in range(len(price_cols)):
c = price_cols[i]
df2.loc[f_prices_bad[:,i], c] = tag
df2.loc[f_vol_bad, "Volume"] = tag
df2.loc[f_zero_or_nan[:,i], c] = tag
# If volume=0 or NaN for bad prices, then tag volume for repair
f_vol_zero_or_nan = (df2["Volume"].to_numpy()==0) | (df2["Volume"].isna().to_numpy())
df2.loc[f_prices_bad.any(axis=1) & f_vol_zero_or_nan, "Volume"] = tag
# If volume=0 or NaN but price moved in interval, then tag volume for repair
f_change = df2["High"].to_numpy() != df2["Low"].to_numpy()
df2.loc[f_change & f_vol_zero_or_nan, "Volume"] = tag
df2.loc[f_zero_or_nan.any(axis=1) & (df2["Volume"]==0), "Volume"] = tag
df2.loc[f_zero_or_nan.any(axis=1) & (df2["Volume"].isna()), "Volume"] = tag
n_before = (df2[data_cols].to_numpy()==tag).sum()
dts_tagged = df2.index[(df2[data_cols].to_numpy()==tag).any(axis=1)]
df2 = self._reconstruct_intervals_batch(df2, interval, prepost, tag, silent)
df2 = self._reconstruct_intervals_batch(df2, interval, tag=tag)
n_after = (df2[data_cols].to_numpy()==tag).sum()
dts_not_repaired = df2.index[(df2[data_cols].to_numpy()==tag).any(axis=1)]
n_fixed = n_before - n_after
if not silent and n_fixed > 0:
msg = f"{self.ticker}: fixed {n_fixed}/{n_before} value=0 errors in {interval} price data"
if n_fixed < 4:
dts_repaired = sorted(list(set(dts_tagged).difference(dts_not_repaired)))
msg += f": {dts_repaired}"
print(msg)
if df2_reserve is not None:
df2 = _pd.concat([df2, df2_reserve])
df2 = df2.sort_index()
if n_fixed > 0:
print("{}: fixed {} price=0.0 errors in {} price data".format(self.ticker, n_fixed, interval))
# Restore original values where repair failed (i.e. remove tag values)
f = df2[data_cols].values==tag
@@ -946,7 +821,7 @@ class TickerBase:
return tz
def _fetch_ticker_tz(self, debug_mode, proxy, timeout):
# Query Yahoo for fast price data just to get returned timezone
# Query Yahoo for basic price data just to get returned timezone
params = {"range": "1d", "interval": "1d"}
@@ -1020,15 +895,6 @@ class TickerBase:
data = self._quote.info
return data
@property
def fast_info(self):
return self._fast_info
@property
def basic_info(self):
print("WARNING: 'Ticker.basic_info' is renamed to 'Ticker.fast_info', hopefully purpose is clearer")
return self.fast_info
def get_sustainability(self, proxy=None, as_dict=False):
self._quote.proxy = proxy
data = self._quote.sustainability
@@ -1294,6 +1160,7 @@ class TickerBase:
shares_data = json_data["timeseries"]["result"]
if not "shares_out" in shares_data[0]:
print(f"{self.ticker}: Yahoo did not return share count in date range {start} -> {end}")
return None
try:
df = _pd.Series(shares_data[0]["shares_out"], index=_pd.to_datetime(shares_data[0]["timestamp"], unit="s"))
@@ -1452,6 +1319,6 @@ class TickerBase:
def get_history_metadata(self) -> dict:
if self._history_metadata is None:
# Request intraday data, because then Yahoo returns exchange schedule.
self.history(period="1wk", interval="1h", prepost=True)
raise RuntimeError("Metadata was never retrieved so far, "
"call history() to retrieve it")
return self._history_metadata

View File

@@ -14,9 +14,6 @@ else:
import requests as requests
import re
from bs4 import BeautifulSoup
import random
import time
from frozendict import frozendict
@@ -49,45 +46,40 @@ def lru_cache_freezeargs(func):
return wrapped
def _extract_extra_keys_from_stores(data):
new_keys = [k for k in data.keys() if k not in ["context", "plugins"]]
new_keys_values = set([data[k] for k in new_keys])
# Maybe multiple keys have same value - keep one of each
new_keys_uniq = []
new_keys_uniq_values = set()
for k in new_keys:
v = data[k]
if not v in new_keys_uniq_values:
new_keys_uniq.append(k)
new_keys_uniq_values.add(v)
return [data[k] for k in new_keys_uniq]
def decrypt_cryptojs_aes_stores(data, keys=None):
def decrypt_cryptojs_aes_stores(data):
encrypted_stores = data['context']['dispatcher']['stores']
password = None
if keys is not None:
if not isinstance(keys, list):
raise TypeError("'keys' must be list")
candidate_passwords = keys
else:
candidate_passwords = []
if "_cs" in data and "_cr" in data:
_cs = data["_cs"]
_cr = data["_cr"]
_cr = b"".join(int.to_bytes(i, length=4, byteorder="big", signed=True) for i in json.loads(_cr)["words"])
password = hashlib.pbkdf2_hmac("sha1", _cs.encode("utf8"), _cr, 1, dklen=32).hex()
else:
# Currently assume one extra key in dict, which is password. Print error if
# more extra keys detected.
new_keys = [k for k in data.keys() if k not in ["context", "plugins"]]
l = len(new_keys)
if l == 0:
return None
elif l == 1 and isinstance(data[new_keys[0]], str):
password_key = new_keys[0]
else:
msg = "Yahoo has again changed data format, yfinance now unsure which key(s) is for decryption:"
k = new_keys[0]
k_str = k if len(k) < 32 else k[:32-3]+"..."
msg += f" '{k_str}'->{type(data[k])}"
for i in range(1, len(new_keys)):
msg += f" , '{k_str}'->{type(data[k])}"
raise Exception(msg)
password_key = new_keys[0]
password = data[password_key]
encrypted_stores = b64decode(encrypted_stores)
assert encrypted_stores[0:8] == b"Salted__"
salt = encrypted_stores[8:16]
encrypted_stores = encrypted_stores[16:]
def _EVPKDF(password, salt, keySize=32, ivSize=16, iterations=1, hashAlgorithm="md5") -> tuple:
def EVPKDF(password, salt, keySize=32, ivSize=16, iterations=1, hashAlgorithm="md5") -> tuple:
"""OpenSSL EVP Key Derivation Function
Args:
password (Union[str, bytes, bytearray]): Password to generate key from.
@@ -126,42 +118,22 @@ def decrypt_cryptojs_aes_stores(data, keys=None):
key, iv = key_iv[:keySize], key_iv[keySize:final_length]
return key, iv
def _decrypt(encrypted_stores, password, key, iv):
if usePycryptodome:
cipher = AES.new(key, AES.MODE_CBC, iv=iv)
plaintext = cipher.decrypt(encrypted_stores)
plaintext = unpad(plaintext, 16, style="pkcs7")
else:
cipher = Cipher(algorithms.AES(key), modes.CBC(iv))
decryptor = cipher.decryptor()
plaintext = decryptor.update(encrypted_stores) + decryptor.finalize()
unpadder = padding.PKCS7(128).unpadder()
plaintext = unpadder.update(plaintext) + unpadder.finalize()
plaintext = plaintext.decode("utf-8")
return plaintext
try:
key, iv = EVPKDF(password, salt, keySize=32, ivSize=16, iterations=1, hashAlgorithm="md5")
except:
raise Exception("yfinance failed to decrypt Yahoo data response")
if not password is None:
try:
key, iv = _EVPKDF(password, salt, keySize=32, ivSize=16, iterations=1, hashAlgorithm="md5")
except:
raise Exception("yfinance failed to decrypt Yahoo data response")
plaintext = _decrypt(encrypted_stores, password, key, iv)
if usePycryptodome:
cipher = AES.new(key, AES.MODE_CBC, iv=iv)
plaintext = cipher.decrypt(encrypted_stores)
plaintext = unpad(plaintext, 16, style="pkcs7")
else:
success = False
for i in range(len(candidate_passwords)):
# print(f"Trying candiate pw {i+1}/{len(candidate_passwords)}")
password = candidate_passwords[i]
try:
key, iv = _EVPKDF(password, salt, keySize=32, ivSize=16, iterations=1, hashAlgorithm="md5")
plaintext = _decrypt(encrypted_stores, password, key, iv)
success = True
break
except:
pass
if not success:
raise Exception("yfinance failed to decrypt Yahoo data response")
cipher = Cipher(algorithms.AES(key), modes.CBC(iv))
decryptor = cipher.decryptor()
plaintext = decryptor.update(encrypted_stores) + decryptor.finalize()
unpadder = padding.PKCS7(128).unpadder()
plaintext = unpadder.update(plaintext) + unpadder.finalize()
plaintext = plaintext.decode("utf-8")
decoded_stores = json.loads(plaintext)
return decoded_stores
@@ -204,72 +176,6 @@ class TickerData:
proxy = {"https": proxy}
return proxy
def get_raw_json(self, url, user_agent_headers=None, params=None, proxy=None, timeout=30):
response = self.get(url, user_agent_headers=user_agent_headers, params=params, proxy=proxy, timeout=timeout)
response.raise_for_status()
return response.json()
def _get_decryption_keys_from_yahoo_js(self, soup):
result = None
key_count = 4
re_script = soup.find("script", string=re.compile("root.App.main")).text
re_data = json.loads(re.search("root.App.main\s+=\s+(\{.*\})", re_script).group(1))
re_data.pop("context", None)
key_list = list(re_data.keys())
if re_data.get("plugins"): # 1) attempt to get last 4 keys after plugins
ind = key_list.index("plugins")
if len(key_list) > ind+1:
sub_keys = key_list[ind+1:]
if len(sub_keys) == key_count:
re_obj = {}
missing_val = False
for k in sub_keys:
if not re_data.get(k):
missing_val = True
break
re_obj.update({k: re_data.get(k)})
if not missing_val:
result = re_obj
if not result is None:
return [''.join(result.values())]
re_keys = [] # 2) attempt scan main.js file approach to get keys
prefix = "https://s.yimg.com/uc/finance/dd-site/js/main."
tags = [tag['src'] for tag in soup.find_all('script') if prefix in tag.get('src', '')]
for t in tags:
response_js = self.cache_get(t)
#
if response_js.status_code != 200:
time.sleep(random.randrange(10, 20))
response_js.close()
else:
r_data = response_js.content.decode("utf8")
re_list = [
x.group() for x in re.finditer(r"context.dispatcher.stores=JSON.parse((?:.*?\r?\n?)*)toString", r_data)
]
for rl in re_list:
re_sublist = [x.group() for x in re.finditer(r"t\[\"((?:.*?\r?\n?)*)\"\]", rl)]
if len(re_sublist) == key_count:
re_keys = [sl.replace('t["', '').replace('"]', '') for sl in re_sublist]
break
response_js.close()
if len(re_keys) == key_count:
break
if len(re_keys) > 0:
re_obj = {}
missing_val = False
for k in re_keys:
if not re_data.get(k):
missing_val = True
break
re_obj.update({k: re_data.get(k)})
if not missing_val:
return [''.join(re_obj.values())]
return []
@lru_cache_freezeargs
@lru_cache(maxsize=cache_maxsize)
def get_json_data_stores(self, sub_page: str = None, proxy=None) -> dict:
@@ -281,8 +187,7 @@ class TickerData:
else:
ticker_url = "{}/{}".format(_SCRAPE_URL_, self.ticker)
response = self.get(url=ticker_url, proxy=proxy)
html = response.text
html = self.get(url=ticker_url, proxy=proxy).text
# The actual json-data for stores is in a javascript assignment in the webpage
try:
@@ -294,28 +199,7 @@ class TickerData:
data = json.loads(json_str)
# Gather decryption keys:
soup = BeautifulSoup(response.content, "html.parser")
keys = self._get_decryption_keys_from_yahoo_js(soup)
# if len(keys) == 0:
# msg = "No decryption keys could be extracted from JS file."
# if "requests_cache" in str(type(response)):
# msg += " Try flushing your 'requests_cache', probably parsing old JS."
# print("WARNING: " + msg + " Falling back to backup decrypt methods.")
if len(keys) == 0:
keys = []
try:
extra_keys = _extract_extra_keys_from_stores(data)
keys = [''.join(extra_keys[-4:])]
except:
pass
#
keys_url = "https://github.com/ranaroussi/yfinance/raw/main/yfinance/scrapers/yahoo-keys.txt"
response_gh = self.cache_get(keys_url)
keys += response_gh.text.splitlines()
# Decrypt!
stores = decrypt_cryptojs_aes_stores(data, keys)
stores = decrypt_cryptojs_aes_stores(data)
if stores is None:
# Maybe Yahoo returned old format, not encrypted
if "context" in data and "dispatcher" in data["context"]:

View File

@@ -29,7 +29,7 @@ from . import Ticker, utils
from . import shared
def download(tickers, start=None, end=None, actions=False, threads=True, ignore_tz=None,
def download(tickers, start=None, end=None, actions=False, threads=True, ignore_tz=False,
group_by='column', auto_adjust=False, back_adjust=False, repair=False, keepna=False,
progress=True, period="max", show_errors=True, interval="1d", prepost=False,
proxy=None, rounding=False, timeout=10):
@@ -44,13 +44,11 @@ def download(tickers, start=None, end=None, actions=False, threads=True, ignore_
Valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
Intraday data cannot extend last 60 days
start: str
Download start date string (YYYY-MM-DD) or _datetime, inclusive.
Download start date string (YYYY-MM-DD) or _datetime.
Default is 1900-01-01
E.g. for start="2020-01-01", the first data point will be on "2020-01-01"
end: str
Download end date string (YYYY-MM-DD) or _datetime, exclusive.
Download end date string (YYYY-MM-DD) or _datetime.
Default is now
E.g. for end="2023-01-01", the last data point will be on "2022-12-31"
group_by : str
Group by 'ticker' or 'column' (default)
prepost : bool
@@ -70,7 +68,7 @@ def download(tickers, start=None, end=None, actions=False, threads=True, ignore_
How many threads to use for mass downloading. Default is True
ignore_tz: bool
When combining from different timezones, ignore that part of datetime.
Default depends on interval. Intraday = False. Day+ = True.
Default is False
proxy: str
Optional. Proxy server URL scheme. Default is None
rounding: bool
@@ -82,14 +80,6 @@ def download(tickers, start=None, end=None, actions=False, threads=True, ignore_
seconds. (Can also be a fraction of a second e.g. 0.01)
"""
if ignore_tz is None:
# Set default value depending on interval
if interval[1:] in ['m', 'h']:
# Intraday
ignore_tz = False
else:
ignore_tz = True
# create ticker list
tickers = tickers if isinstance(
tickers, (list, set, tuple)) else tickers.replace(',', ' ').split()

View File

@@ -7,530 +7,6 @@ from yfinance import utils
from yfinance.data import TickerData
info_retired_keys_price = {"currentPrice", "dayHigh", "dayLow", "open", "previousClose", "volume", "volume24Hr"}
info_retired_keys_price.update({"regularMarket"+s for s in ["DayHigh", "DayLow", "Open", "PreviousClose", "Price", "Volume"]})
info_retired_keys_price.update({"fiftyTwoWeekLow", "fiftyTwoWeekHigh", "fiftyTwoWeekChange", "52WeekChange", "fiftyDayAverage", "twoHundredDayAverage"})
info_retired_keys_price.update({"averageDailyVolume10Day", "averageVolume10days", "averageVolume"})
info_retired_keys_exchange = {"currency", "exchange", "exchangeTimezoneName", "exchangeTimezoneShortName", "quoteType"}
info_retired_keys_marketCap = {"marketCap"}
info_retired_keys_symbol = {"symbol"}
info_retired_keys = info_retired_keys_price | info_retired_keys_exchange | info_retired_keys_marketCap | info_retired_keys_symbol
PRUNE_INFO = True
# PRUNE_INFO = False
_BASIC_URL_ = "https://query2.finance.yahoo.com/v10/finance/quoteSummary"
from collections.abc import MutableMapping
class InfoDictWrapper(MutableMapping):
""" Simple wrapper around info dict, intercepting 'gets' to
print how-to-migrate messages for specific keys. Requires
override dict API"""
def __init__(self, info):
self.info = info
def keys(self):
return self.info.keys()
def __str__(self):
return self.info.__str__()
def __repr__(self):
return self.info.__repr__()
def __contains__(self, k):
return k in self.info.keys()
def __getitem__(self, k):
if k in info_retired_keys_price:
print(f"Price data removed from info (key='{k}'). Use Ticker.fast_info or history() instead")
return None
elif k in info_retired_keys_exchange:
print(f"Exchange data removed from info (key='{k}'). Use Ticker.fast_info or Ticker.get_history_metadata() instead")
return None
elif k in info_retired_keys_marketCap:
print(f"Market cap removed from info (key='{k}'). Use Ticker.fast_info instead")
return None
elif k in info_retired_keys_symbol:
print(f"Symbol removed from info (key='{k}'). You know this already")
return None
return self.info[self._keytransform(k)]
def __setitem__(self, k, value):
self.info[self._keytransform(k)] = value
def __delitem__(self, k):
del self.info[self._keytransform(k)]
def __iter__(self):
return iter(self.info)
def __len__(self):
return len(self.info)
def _keytransform(self, k):
return k
class FastInfo:
# Contain small subset of info[] items that can be fetched faster elsewhere.
# Imitates a dict.
def __init__(self, tickerBaseObject):
utils.print_once("Note: 'info' dict is now fixed & improved, 'fast_info' no longer faster")
self._tkr = tickerBaseObject
self._prices_1y = None
self._prices_1wk_1h_prepost = None
self._prices_1wk_1h_reg = None
self._md = None
self._currency = None
self._quote_type = None
self._exchange = None
self._timezone = None
self._shares = None
self._mcap = None
self._open = None
self._day_high = None
self._day_low = None
self._last_price = None
self._last_volume = None
self._prev_close = None
self._reg_prev_close = None
self._50d_day_average = None
self._200d_day_average = None
self._year_high = None
self._year_low = None
self._year_change = None
self._10d_avg_vol = None
self._3mo_avg_vol = None
# attrs = utils.attributes(self)
# self.keys = attrs.keys()
# utils.attributes is calling each method, bad! Have to hardcode
_properties = ["currency", "quote_type", "exchange", "timezone"]
_properties += ["shares", "market_cap"]
_properties += ["last_price", "previous_close", "open", "day_high", "day_low"]
_properties += ["regular_market_previous_close"]
_properties += ["last_volume"]
_properties += ["fifty_day_average", "two_hundred_day_average", "ten_day_average_volume", "three_month_average_volume"]
_properties += ["year_high", "year_low", "year_change"]
# Because released before fixing key case, need to officially support
# camel-case but also secretly support snake-case
base_keys = [k for k in _properties if not '_' in k]
sc_keys = [k for k in _properties if '_' in k]
self._sc_to_cc_key = {k:utils.snake_case_2_camelCase(k) for k in sc_keys}
self._cc_to_sc_key = {v:k for k,v in self._sc_to_cc_key.items()}
self._public_keys = sorted(base_keys + list(self._sc_to_cc_key.values()))
self._keys = sorted(self._public_keys + sc_keys)
# dict imitation:
def keys(self):
return self._public_keys
def items(self):
return [(k,self[k]) for k in self._public_keys]
def values(self):
return [self[k] for k in self._public_keys]
def get(self, key, default=None):
if key in self.keys():
if key in self._cc_to_sc_key:
key = self._cc_to_sc_key[key]
return self[key]
return default
def __getitem__(self, k):
if not isinstance(k, str):
raise KeyError(f"key must be a string")
if not k in self._keys:
raise KeyError(f"'{k}' not valid key. Examine 'FastInfo.keys()'")
if k in self._cc_to_sc_key:
k = self._cc_to_sc_key[k]
return getattr(self, k)
def __contains__(self, k):
return k in self.keys()
def __iter__(self):
return iter(self.keys())
def __str__(self):
return "lazy-loading dict with keys = " + str(self.keys())
def __repr__(self):
return self.__str__()
def toJSON(self, indent=4):
d = {k:self[k] for k in self.keys()}
return _json.dumps({k:self[k] for k in self.keys()}, indent=indent)
def _get_1y_prices(self, fullDaysOnly=False):
if self._prices_1y is None:
self._prices_1y = self._tkr.history(period="380d", auto_adjust=False, debug=False, keepna=True)
self._md = self._tkr.get_history_metadata()
try:
ctp = self._md["currentTradingPeriod"]
self._today_open = pd.to_datetime(ctp["regular"]["start"], unit='s', utc=True).tz_convert(self.timezone)
self._today_close = pd.to_datetime(ctp["regular"]["end"], unit='s', utc=True).tz_convert(self.timezone)
self._today_midnight = self._today_close.ceil("D")
except:
self._today_open = None
self._today_close = None
self._today_midnight = None
raise
if self._prices_1y.empty:
return self._prices_1y
dnow = pd.Timestamp.utcnow().tz_convert(self.timezone).date()
d1 = dnow
d0 = (d1 + datetime.timedelta(days=1)) - utils._interval_to_timedelta("1y")
if fullDaysOnly and self._exchange_open_now():
# Exclude today
d1 -= utils._interval_to_timedelta("1d")
return self._prices_1y.loc[str(d0):str(d1)]
def _get_1wk_1h_prepost_prices(self):
if self._prices_1wk_1h_prepost is None:
self._prices_1wk_1h_prepost = self._tkr.history(period="1wk", interval="1h", auto_adjust=False, prepost=True, debug=False)
return self._prices_1wk_1h_prepost
def _get_1wk_1h_reg_prices(self):
if self._prices_1wk_1h_reg is None:
self._prices_1wk_1h_reg = self._tkr.history(period="1wk", interval="1h", auto_adjust=False, prepost=False, debug=False)
return self._prices_1wk_1h_reg
def _get_exchange_metadata(self):
if self._md is not None:
return self._md
self._get_1y_prices()
self._md = self._tkr.get_history_metadata()
return self._md
def _exchange_open_now(self):
t = pd.Timestamp.utcnow()
self._get_exchange_metadata()
# if self._today_open is None and self._today_close is None:
# r = False
# else:
# r = self._today_open <= t and t < self._today_close
# if self._today_midnight is None:
# r = False
# elif self._today_midnight.date() > t.tz_convert(self.timezone).date():
# r = False
# else:
# r = t < self._today_midnight
last_day_cutoff = self._get_1y_prices().index[-1] + datetime.timedelta(days=1)
last_day_cutoff += datetime.timedelta(minutes=20)
r = t < last_day_cutoff
# print("_exchange_open_now() returning", r)
return r
@property
def currency(self):
if self._currency is not None:
return self._currency
if self._tkr._history_metadata is None:
self._get_1y_prices()
md = self._tkr.get_history_metadata()
self._currency = md["currency"]
return self._currency
@property
def quote_type(self):
if self._quote_type is not None:
return self._quote_type
if self._tkr._history_metadata is None:
self._get_1y_prices()
md = self._tkr.get_history_metadata()
self._quote_type = md["instrumentType"]
return self._quote_type
@property
def exchange(self):
if self._exchange is not None:
return self._exchange
self._exchange = self._get_exchange_metadata()["exchangeName"]
return self._exchange
@property
def timezone(self):
if self._timezone is not None:
return self._timezone
self._timezone = self._get_exchange_metadata()["exchangeTimezoneName"]
return self._timezone
@property
def shares(self):
if self._shares is not None:
return self._shares
shares = self._tkr.get_shares_full(start=pd.Timestamp.utcnow().date()-pd.Timedelta(days=548))
if shares is None:
# Requesting 18 months failed, so fallback to shares which should include last year
shares = self._tkr.get_shares()
if shares is not None:
if isinstance(shares, pd.DataFrame):
shares = shares[shares.columns[0]]
self._shares = int(shares.iloc[-1])
return self._shares
@property
def last_price(self):
if self._last_price is not None:
return self._last_price
prices = self._get_1y_prices()
if prices.empty:
md = self._get_exchange_metadata()
if "regularMarketPrice" in md:
self._last_price = md["regularMarketPrice"]
else:
self._last_price = float(prices["Close"].iloc[-1])
if _np.isnan(self._last_price):
md = self._get_exchange_metadata()
if "regularMarketPrice" in md:
self._last_price = md["regularMarketPrice"]
return self._last_price
@property
def previous_close(self):
if self._prev_close is not None:
return self._prev_close
prices = self._get_1wk_1h_prepost_prices()
fail = False
if prices.empty:
fail = True
else:
prices = prices[["Close"]].groupby(prices.index.date).last()
if prices.shape[0] < 2:
# Very few symbols have previousClose despite no
# no trading data e.g. 'QCSTIX'.
fail = True
else:
self._prev_close = float(prices["Close"].iloc[-2])
if fail:
# Fallback to original info[] if available.
self._tkr.info # trigger fetch
k = "previousClose"
if self._tkr._quote._retired_info is not None and k in self._tkr._quote._retired_info:
self._prev_close = self._tkr._quote._retired_info[k]
return self._prev_close
@property
def regular_market_previous_close(self):
if self._reg_prev_close is not None:
return self._reg_prev_close
prices = self._get_1y_prices()
if prices.shape[0] == 1:
# Tiny % of tickers don't return daily history before last trading day,
# so backup option is hourly history:
prices = self._get_1wk_1h_reg_prices()
prices = prices[["Close"]].groupby(prices.index.date).last()
if prices.shape[0] < 2:
# Very few symbols have regularMarketPreviousClose despite no
# no trading data. E.g. 'QCSTIX'.
# So fallback to original info[] if available.
self._tkr.info # trigger fetch
k = "regularMarketPreviousClose"
if self._tkr._quote._retired_info is not None and k in self._tkr._quote._retired_info:
self._reg_prev_close = self._tkr._quote._retired_info[k]
else:
self._reg_prev_close = float(prices["Close"].iloc[-2])
return self._reg_prev_close
@property
def open(self):
if self._open is not None:
return self._open
prices = self._get_1y_prices()
if prices.empty:
self._open = None
else:
self._open = float(prices["Open"].iloc[-1])
if _np.isnan(self._open):
self._open = None
return self._open
@property
def day_high(self):
if self._day_high is not None:
return self._day_high
prices = self._get_1y_prices()
if prices.empty:
self._day_high = None
else:
self._day_high = float(prices["High"].iloc[-1])
if _np.isnan(self._day_high):
self._day_high = None
return self._day_high
@property
def day_low(self):
if self._day_low is not None:
return self._day_low
prices = self._get_1y_prices()
if prices.empty:
self._day_low = None
else:
self._day_low = float(prices["Low"].iloc[-1])
if _np.isnan(self._day_low):
self._day_low = None
return self._day_low
@property
def last_volume(self):
if self._last_volume is not None:
return self._last_volume
prices = self._get_1y_prices()
self._last_volume = None if prices.empty else int(prices["Volume"].iloc[-1])
return self._last_volume
@property
def fifty_day_average(self):
if self._50d_day_average is not None:
return self._50d_day_average
prices = self._get_1y_prices(fullDaysOnly=True)
if prices.empty:
self._50d_day_average = None
else:
n = prices.shape[0]
a = n-50
b = n
if a < 0:
a = 0
self._50d_day_average = float(prices["Close"].iloc[a:b].mean())
return self._50d_day_average
@property
def two_hundred_day_average(self):
if self._200d_day_average is not None:
return self._200d_day_average
prices = self._get_1y_prices(fullDaysOnly=True)
if prices.empty:
self._200d_day_average = None
else:
n = prices.shape[0]
a = n-200
b = n
if a < 0:
a = 0
self._200d_day_average = float(prices["Close"].iloc[a:b].mean())
return self._200d_day_average
@property
def ten_day_average_volume(self):
if self._10d_avg_vol is not None:
return self._10d_avg_vol
prices = self._get_1y_prices(fullDaysOnly=True)
if prices.empty:
self._10d_avg_vol = None
else:
n = prices.shape[0]
a = n-10
b = n
if a < 0:
a = 0
self._10d_avg_vol = int(prices["Volume"].iloc[a:b].mean())
return self._10d_avg_vol
@property
def three_month_average_volume(self):
if self._3mo_avg_vol is not None:
return self._3mo_avg_vol
prices = self._get_1y_prices(fullDaysOnly=True)
if prices.empty:
self._3mo_avg_vol = None
else:
dt1 = prices.index[-1]
dt0 = dt1 - utils._interval_to_timedelta("3mo") + utils._interval_to_timedelta("1d")
self._3mo_avg_vol = int(prices.loc[dt0:dt1, "Volume"].mean())
return self._3mo_avg_vol
@property
def year_high(self):
if self._year_high is not None:
return self._year_high
prices = self._get_1y_prices(fullDaysOnly=True)
if prices.empty:
prices = self._get_1y_prices(fullDaysOnly=False)
self._year_high = float(prices["High"].max())
return self._year_high
@property
def year_low(self):
if self._year_low is not None:
return self._year_low
prices = self._get_1y_prices(fullDaysOnly=True)
if prices.empty:
prices = self._get_1y_prices(fullDaysOnly=False)
self._year_low = float(prices["Low"].min())
return self._year_low
@property
def year_change(self):
if self._year_change is not None:
return self._year_change
prices = self._get_1y_prices(fullDaysOnly=True)
if prices.shape[0] >= 2:
self._year_change = (prices["Close"].iloc[-1] - prices["Close"].iloc[0]) / prices["Close"].iloc[0]
self._year_change = float(self._year_change)
return self._year_change
@property
def market_cap(self):
if self._mcap is not None:
return self._mcap
try:
shares = self.shares
except Exception as e:
if "Cannot retrieve share count" in str(e):
shares = None
else:
raise
if shares is None:
# Very few symbols have marketCap despite no share count.
# E.g. 'BTC-USD'
# So fallback to original info[] if available.
self._tkr.info
k = "marketCap"
if self._tkr._quote._retired_info is not None and k in self._tkr._quote._retired_info:
self._mcap = self._tkr._quote._retired_info[k]
else:
self._mcap = float(shares * self.last_price)
return self._mcap
class Quote:
def __init__(self, data: TickerData, proxy=None):
@@ -538,22 +14,18 @@ class Quote:
self.proxy = proxy
self._info = None
self._retired_info = None
self._sustainability = None
self._recommendations = None
self._calendar = None
self._already_scraped = False
self._already_fetched = False
self._already_fetched_complementary = False
self._already_scraped_complementary = False
@property
def info(self) -> dict:
if self._info is None:
# self._scrape(self.proxy) # decrypt broken
self._fetch(self.proxy)
self._fetch_complementary(self.proxy)
self._scrape(self.proxy)
self._scrape_complementary(self.proxy)
return self._info
@@ -658,19 +130,6 @@ class Quote:
except Exception:
pass
# Delete redundant info[] keys, because values can be accessed faster
# elsewhere - e.g. price keys. Hope is reduces Yahoo spam effect.
# But record the dropped keys, because in rare cases they are needed.
self._retired_info = {}
for k in info_retired_keys:
if k in self._info:
self._retired_info[k] = self._info[k]
if PRUNE_INFO:
del self._info[k]
if PRUNE_INFO:
# InfoDictWrapper will explain how to access above data elsewhere
self._info = InfoDictWrapper(self._info)
# events
try:
cal = pd.DataFrame(quote_summary_store['calendarEvents']['earnings'])
@@ -696,56 +155,12 @@ class Quote:
except Exception:
pass
def _fetch(self, proxy):
if self._already_fetched:
def _scrape_complementary(self, proxy):
if self._already_scraped_complementary:
return
self._already_fetched = True
modules = ['summaryProfile', 'financialData', 'quoteType',
'defaultKeyStatistics', 'assetProfile', 'summaryDetail']
result = self._data.get_raw_json(
_BASIC_URL_ + f"/{self._data.ticker}", params={"modules": ",".join(modules), "ssl": "true"}, proxy=proxy
)
result["quoteSummary"]["result"][0]["symbol"] = self._data.ticker
query1_info = next(
(info for info in result.get("quoteSummary", {}).get("result", []) if info["symbol"] == self._data.ticker),
None,
)
# Most keys that appear in multiple dicts have same value. Except 'maxAge' because
# Yahoo not consistent with days vs seconds. Fix it here:
for k in query1_info:
if "maxAge" in query1_info[k] and query1_info[k]["maxAge"] == 1:
query1_info[k]["maxAge"] = 86400
query1_info = {
k1: v1
for k, v in query1_info.items()
if isinstance(v, dict)
for k1, v1 in v.items()
if v1
}
# recursively format but only because of 'companyOfficers'
def _format(k, v):
if isinstance(v, dict) and "raw" in v and "fmt" in v:
v2 = v["fmt"] if k in {"regularMarketTime", "postMarketTime"} else v["raw"]
elif isinstance(v, list):
v2 = [_format(None, x) for x in v]
elif isinstance(v, dict):
v2 = {k:_format(k, x) for k, x in v.items()}
elif isinstance(v, str):
v2 = v.replace("\xa0", " ")
else:
v2 = v
return v2
for k, v in query1_info.items():
query1_info[k] = _format(k, v)
self._info = query1_info
self._already_scraped_complementary = True
def _fetch_complementary(self, proxy):
if self._already_fetched_complementary:
return
self._already_fetched_complementary = True
# self._scrape(proxy) # decrypt broken
self._fetch(proxy)
self._scrape(proxy)
if self._info is None:
return
@@ -787,14 +202,11 @@ class Quote:
json_str = self._data.cache_get(url=url, proxy=proxy).text
json_data = json.loads(json_str)
try:
key_stats = json_data["timeseries"]["result"][0]
if k not in key_stats:
# Yahoo website prints N/A, indicates Yahoo lacks necessary data to calculate
v = None
else:
# Select most recent (last) raw value in list:
v = key_stats[k][-1]["reportedValue"]["raw"]
except Exception:
key_stats = json_data["timeseries"]["result"][0]
if k not in key_stats:
# Yahoo website prints N/A, indicates Yahoo lacks necessary data to calculate
v = None
else:
# Select most recent (last) raw value in list:
v = key_stats[k][-1]["reportedValue"]["raw"]
self._info[k] = v

View File

@@ -1,8 +0,0 @@
daf93e37cbf219cd4c1f3f74ec4551265ec5565b99e8c9322dccd6872941cf13c818cbb88cba6f530e643b4e2329b17ec7161f4502ce6a02bb0dbbe5fc0d0474
ad4d90b3c9f2e1d156ef98eadfa0ff93e4042f6960e54aa2a13f06f528e6b50ba4265a26a1fd5b9cd3db0d268a9c34e1d080592424309429a58bce4adc893c87
e9a8ab8e5620b712ebc2fb4f33d5c8b9c80c0d07e8c371911c785cf674789f1747d76a909510158a7b7419e86857f2d7abbd777813ff64840e4cbc514d12bcae
6ae2523aeafa283dad746556540145bf603f44edbf37ad404d3766a8420bb5eb1d3738f52a227b88283cca9cae44060d5f0bba84b6a495082589f5fe7acbdc9e
3365117c2a368ffa5df7313a4a84988f73926a86358e8eea9497c5ff799ce27d104b68e5f2fbffa6f8f92c1fef41765a7066fa6bcf050810a9c4c7872fd3ebf0
15d8f57919857d5a5358d2082c7ef0f1129cfacd2a6480333dcfb954b7bb67d820abefebfdb0eaa6ef18a1c57f617b67d7e7b0ec040403b889630ae5db5a4dbb
db9630d707a7d0953ac795cd8db1ca9ca6c9d8239197cdfda24b4e0ec9c37eaec4db82dab68b8f606ab7b5b4af3e65dab50606f8cf508269ec927e6ee605fb78
3c895fb5ddcc37d20d3073ed74ee3efad59bcb147c8e80fd279f83701b74b092d503dcd399604c6d8be8f3013429d3c2c76ed5b31b80c9df92d5eab6d3339fce

View File

@@ -34,8 +34,12 @@ class Tickers:
tickers = tickers if isinstance(
tickers, list) else tickers.replace(',', ' ').split()
self.symbols = [ticker.upper() for ticker in tickers]
self.tickers = {ticker:Ticker(ticker, session=session) for ticker in self.symbols}
ticker_objects = {}
for ticker in self.symbols:
ticker_objects[ticker] = Ticker(ticker, session=session)
self.tickers = ticker_objects
# self.tickers = _namedtuple(
# "Tickers", ticker_objects.keys(), rename=True
# )(*ticker_objects.values())

View File

@@ -35,7 +35,6 @@ import os as _os
import appdirs as _ad
import sqlite3 as _sqlite3
import atexit as _atexit
from functools import lru_cache
from threading import Lock
@@ -50,25 +49,6 @@ user_agent_headers = {
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
# From https://stackoverflow.com/a/59128615
from types import FunctionType
from inspect import getmembers
def attributes(obj):
disallowed_names = {
name for name, value in getmembers(type(obj))
if isinstance(value, FunctionType)}
return {
name: getattr(obj, name) for name in dir(obj)
if name[0] != '_' and name not in disallowed_names and hasattr(obj, name)}
@lru_cache(maxsize=20)
def print_once(msg):
# 'warnings' module suppression of repeat messages does not work.
# This function replicates correct behaviour
print(msg)
def is_isin(string):
return bool(_re.match("^([A-Z]{2})([A-Z0-9]{9})([0-9]{1})$", string))
@@ -308,35 +288,35 @@ def camel2title(strings: List[str], sep: str = ' ', acronyms: Optional[List[str]
return strings
def snake_case_2_camelCase(s):
sc = s.split('_')[0] + ''.join(x.title() for x in s.split('_')[1:])
return sc
def _parse_user_dt(dt, exchange_tz):
def _parse_user_dt(dt, tz_name):
if isinstance(dt, int):
# Should already be epoch, test with conversion:
_datetime.datetime.fromtimestamp(dt)
else:
# Convert str/date -> datetime, set tzinfo=exchange, get timestamp:
if isinstance(dt, str):
dt = _datetime.datetime.strptime(str(dt), '%Y-%m-%d')
if isinstance(dt, _datetime.date) and not isinstance(dt, _datetime.datetime):
dt = _datetime.datetime.combine(dt, _datetime.time(0))
if isinstance(dt, _datetime.datetime) and dt.tzinfo is None:
# Assume user is referring to exchange's timezone
dt = _tz.timezone(exchange_tz).localize(dt)
dt = int(dt.timestamp())
return dt
try:
_datetime.datetime.fromtimestamp(dt)
except:
raise Exception(f"'dt' is not a valid epoch: '{dt}'")
return dt
# Convert str/date -> datetime, set tzinfo=exchange, get timestamp:
dt2 = dt
if isinstance(dt2, str):
dt2 = _datetime.datetime.strptime(str(dt2), '%Y-%m-%d')
if isinstance(dt2, _datetime.date) and not isinstance(dt2, _datetime.datetime):
dt2 = _datetime.datetime.combine(dt2, _datetime.time(0))
if isinstance(dt2, _datetime.datetime) and dt2.tzinfo is None:
# Assume user is referring to exchange's timezone
if tz_name is None:
raise Exception(f"Must provide a timezone for localizing '{dt}'")
dt2 = _tz.timezone(tz_name).localize(dt2)
if not isinstance(dt2, _datetime.datetime):
raise Exception(f"'dt' is not a date-like value: '{dt}'")
dt2 = int(dt2.timestamp())
return dt2
def _interval_to_timedelta(interval):
if interval == "1mo":
return _dateutil.relativedelta.relativedelta(months=1)
elif interval == "3mo":
return _dateutil.relativedelta.relativedelta(months=3)
elif interval == "1y":
return _dateutil.relativedelta.relativedelta(years=1)
return _dateutil.relativedelta(months=1)
elif interval == "1wk":
return _pd.Timedelta(days=7, unit='d')
else:
@@ -456,35 +436,6 @@ def set_df_tz(df, interval, tz):
return df
def fix_Yahoo_returning_prepost_unrequested(quotes, interval, metadata):
# Sometimes Yahoo returns post-market data despite not requesting it.
# Normally happens on half-day early closes.
#
# And sometimes returns pre-market data despite not requesting it.
# E.g. some London tickers.
tps_df = metadata["tradingPeriods"]
tps_df["_date"] = tps_df.index.date
quotes["_date"] = quotes.index.date
idx = quotes.index.copy()
quotes = quotes.merge(tps_df, how="left", validate="many_to_one")
quotes.index = idx
# "end" = end of regular trading hours (including any auction)
f_drop = quotes.index >= quotes["end"]
f_drop = f_drop | (quotes.index < quotes["start"])
if f_drop.any():
# When printing report, ignore rows that were already NaNs:
f_na = quotes[["Open","Close"]].isna().all(axis=1)
n_nna = quotes.shape[0] - _np.sum(f_na)
n_drop_nna = _np.sum(f_drop & ~f_na)
quotes_dropped = quotes[f_drop]
# if debug and n_drop_nna > 0:
# print(f"Dropping {n_drop_nna}/{n_nna} intervals for falling outside regular trading hours")
quotes = quotes[~f_drop]
metadata["tradingPeriods"] = tps_df.drop(["_date"], axis=1)
quotes = quotes.drop(["_date", "start", "end"], axis=1)
return quotes
def fix_Yahoo_returning_live_separate(quotes, interval, tz_exchange):
# Yahoo bug fix. If market is open today then Yahoo normally returns
# todays data as a separate row from rest-of week/month interval in above row.
@@ -560,7 +511,7 @@ def safe_merge_dfs(df_main, df_sub, interval):
df["_NewIndex"] = new_index
# Duplicates present within periods but can aggregate
if data_col_name in ["Dividends", "Capital Gains"]:
if data_col_name == "Dividends":
# Add
df = df.groupby("_NewIndex").sum()
df.index.name = None
@@ -698,71 +649,6 @@ def is_valid_timezone(tz: str) -> bool:
return True
def format_history_metadata(md):
if not isinstance(md, dict):
return md
if len(md) == 0:
return md
tz = md["exchangeTimezoneName"]
for k in ["firstTradeDate", "regularMarketTime"]:
if k in md and md[k] is not None:
md[k] = _pd.to_datetime(md[k], unit='s', utc=True).tz_convert(tz)
if "currentTradingPeriod" in md:
for m in ["regular", "pre", "post"]:
if m in md["currentTradingPeriod"]:
for t in ["start", "end"]:
md["currentTradingPeriod"][m][t] = \
_pd.to_datetime(md["currentTradingPeriod"][m][t], unit='s', utc=True).tz_convert(tz)
del md["currentTradingPeriod"][m]["gmtoffset"]
del md["currentTradingPeriod"][m]["timezone"]
if "tradingPeriods" in md:
if md["tradingPeriods"] == {"pre":[], "post":[]}:
del md["tradingPeriods"]
if "tradingPeriods" in md:
tps = md["tradingPeriods"]
if isinstance(tps, list):
# Only regular times
regs_dict = [tps[i][0] for i in range(len(tps))]
pres_dict = None
posts_dict = None
elif isinstance(tps, dict):
# Includes pre- and post-market
pres_dict = [tps["pre"][i][0] for i in range(len(tps["pre"]))]
posts_dict = [tps["post"][i][0] for i in range(len(tps["post"]))]
regs_dict = [tps["regular"][i][0] for i in range(len(tps["regular"]))]
else:
raise Exception()
def _dict_to_table(d):
df = _pd.DataFrame.from_dict(d).drop(["timezone", "gmtoffset"], axis=1)
df["end"] = _pd.to_datetime(df["end"], unit='s', utc=True).dt.tz_convert(tz)
df["start"] = _pd.to_datetime(df["start"], unit='s', utc=True).dt.tz_convert(tz)
df.index = _pd.to_datetime(df["start"].dt.date)
df.index = df.index.tz_localize(tz)
return df
df = _dict_to_table(regs_dict)
df_cols = ["start", "end"]
if pres_dict is not None:
pre_df = _dict_to_table(pres_dict)
df = df.merge(pre_df.rename(columns={"start":"pre_start", "end":"pre_end"}), left_index=True, right_index=True)
df_cols = ["pre_start", "pre_end"]+df_cols
if posts_dict is not None:
post_df = _dict_to_table(posts_dict)
df = df.merge(post_df.rename(columns={"start":"post_start", "end":"post_end"}), left_index=True, right_index=True)
df_cols = df_cols+["post_start", "post_end"]
df = df[df_cols]
df.index.name = "Date"
md["tradingPeriods"] = df
return md
class ProgressBar:
def __init__(self, iterations, text='completed'):
self.text = text
@@ -825,14 +711,7 @@ class _KVStore:
with self._cache_mutex:
self.conn = _sqlite3.connect(filename, timeout=10, check_same_thread=False)
self.conn.execute('pragma journal_mode=wal')
try:
self.conn.execute('create table if not exists "kv" (key TEXT primary key, value TEXT) without rowid')
except Exception as e:
if 'near "without": syntax error' in str(e):
# "without rowid" requires sqlite 3.8.2. Older versions will raise exception
self.conn.execute('create table if not exists "kv" (key TEXT primary key, value TEXT)')
else:
raise
self.conn.execute('create table if not exists "kv" (key TEXT primary key, value TEXT) without rowid')
self.conn.commit()
_atexit.register(self.close)

View File

@@ -1 +1 @@
version = "0.2.15"
version = "0.2.3"